Excuse me again...
I appreciate your correspondence on the matter, but
I don't understand your last comment.
My point is that if the file is solely a PDF/Image (as I have found examples of),
then there are no fonts listed in the PDF at all (as you correctly stated).
-- Your initial reply was a recommendation to filter by fonts.
-- Yet you agree that there may not be ANY fonts, so that's not an accurate
determination (my reply).
-- Then your last comment tells me I should read the PDF Reference [Manual]...
this will presumably hold the key to determining the codepage/language of a
document?
The "puzzle" is still in the subject of the email. Are you saying you know the
answer, but do not want to divulge it, and I should RTFM, because it's most
certainly in there? Is understanding all the permutations of how a PDF can be
made up going to give me an answer to my question? Even when all I have is
Image [streams] on each page?
Sorry, but I'm still searching for an answer, and I
just wanted to clarify your "Sage" advice.
Thanks again for your time.
-AJ
----- Original Message -----
Sent: Thursday, July 20, 2006 10:02 AM
Subject: Re: [iText-questions] Slightly OT: Does anyone have a way to determine
the language or even codepage of a PDF?
At 09:51 AM 7/20/2006, Aaron J Weber wrote:
Thanks for the suggestion. I had thought about that. But what if the document
is PDF-Image (doesn't have a significant text "layer")?
Then you won't find any fonts in the document. (Keep in mind that a PDF isn't a
single thing - each object on each page is unique.)
Then I'm just going to have a lot of binary stream data in there with very
little (if any) notation of fonts, right?
Nope - you won't be bothering with the images.
Puzzling stuff...
:(
Not really, if you read the PDF Reference to better understand how a PDF is
made up.
Leonard
---------------------------------------------------------------------------
PDF Sages, Inc.
215-938-7080 (voice)
215-938-0880 (fax)