At 02:18 PM 7/20/2006, Aaron J Weber wrote:
My point is that if the file is solely a PDF/Image (as I have found
examples of), then there are no fonts listed in the PDF at all (as you
correctly stated).

If a PDF consists of a collection of pages, where each page contains a
single image, then yes, there will be no fonts. There is no official
name for that kind of PDF, though some folks choose to refer to it as an
"ImageOnly PDF".

-- Your initial reply was a recommendation to filter by fonts.
-- Yet you agree that there may not be ANY fonts, so that's not an
accurate determination (my reply).

So if there are no fonts, then there is no text in the document, and you
can't report on any language... Right?

If you are thinking that you need to figure out what is in the images,
then that is going to require a FIRST PASS with some OCR application to
convert image->text... and THEN you can run your font filtering.
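
A minimal sketch of that two-pass decision in Python, assuming you have
already pulled the font names for each page with whatever PDF tool you
use. The `fonts_per_page` mapping and both function names are
hypothetical, made up for illustration; they are not part of any PDF
library, and the OCR step itself is left out.

```python
from typing import Dict, List


def plan_language_check(fonts_per_page: Dict[int, List[str]]) -> Dict[int, str]:
    """Decide, page by page, which step has to run first.

    fonts_per_page is an assumed input: page number -> font names found
    in that page's resources. A page with no fonts has no text to filter,
    so it is marked for an OCR pass before any font filtering.
    """
    plan = {}
    for page_number, fonts in fonts_per_page.items():
        plan[page_number] = "filter_by_fonts" if fonts else "ocr_first"
    return plan


def is_image_only(fonts_per_page: Dict[int, List[str]]) -> bool:
    """True when the document has pages and none of them lists a font."""
    return bool(fonts_per_page) and all(
        not fonts for fonts in fonts_per_page.values()
    )


if __name__ == "__main__":
    # Hypothetical document: page 1 is a scan, page 2 has real text.
    sample = {1: [], 2: ["Helvetica"]}
    print(plan_language_check(sample))
    print(is_image_only(sample))
```

Pages marked "ocr_first" only become candidates for font filtering once
the OCR pass has produced text for them.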
Leonard
---------------------------------------------------------------------------
PDF Sages, Inc.
215-938-7080 (voice)
215-938-0880 (fax)