Re: [iText-questions] Slightly OT: Does anyone have a way to determine the language or even codepage of a PDF?

Aaron J Weber

2006-07-20

Excuse me again...
 
I appreciate your correspondence on the matter, but I don't understand your last comment.
 
My point is that if the file is solely a PDF/Image (as I have found examples of), then there are no fonts listed in the PDF at all (as you correctly stated).
 
-- Your initial reply was a recommendation to filter by fonts.
-- Yet you agree that there may not be ANY fonts, so that's not an accurate determination (my reply).
-- Then your last comment tells me I should read the PDF Reference [Manual]. Will this presumably hold the key to determining the codepage/language of a document?
 
The "puzzle" is still in the subject of the email. Are you saying you know the answer but do not want to divulge it, and that I should RTFM because it's most certainly in there? Is understanding all the permutations of how a PDF can be made up going to answer my question, even when all I have is Image [streams] on each page?
 
Sorry, but I'm still searching for an answer, and I just wanted to clarify your "Sage" advice.
 
Thanks again for your time.
-AJ
 
----- Original Message -----
From: Leonard Rosenthol
To: Aaron J Weber ; itext-questions@lists.sourceforge.net
Sent: Thursday, July 20, 2006 10:02 AM
Subject: Re: [iText-questions] Slightly OT: Does anyone have a way to determine the language or even codepage of a PDF?

At 09:51 AM 7/20/2006, Aaron J Weber wrote:
Thanks for the suggestion.  I had thought about that.  But what if the document is PDF-Image (doesn't have a significant text "layer")? 

        Then you won't find any fonts in the document. (Keep in mind that a PDF isn't a single thing: each object on each page is unique.)


Then I'm just going to have a lot of binary stream data in there with very little (if any) notation of fonts, right?

        Nope - you won't be bothering with the images.


Puzzling stuff... :(

        Not really, if you read the PDF Reference to better understand how a PDF is made up.


Leonard

---------------------------------------------------------------------------
Leonard Rosenthol                            <mailto:leonardr@pdfsages.com>
Chief Technical Officer                      <http://www.pdfsages.com>
PDF Sages, Inc.                              215-938-7080 (voice)
                                             215-938-0880 (fax)
-------------------------------------------------------------------------