Java Mailing List Archive

http://www.junlu.com/


Re: [iText-questions] Slightly OT: Does anyone have a way to
determine the language or even codepage of a PDF?

Leonard Rosenthol

2006-07-20


At 02:18 PM 7/20/2006, Aaron J Weber wrote:
My point is that if the file is solely a PDF/Image (as I have found examples of), then there are no fonts listed in the PDF at all (as you correctly stated).
 

        If a PDF consists of a collection of pages, where each page contains a single image - then yes, there will be no fonts.  That kind of PDF has no official name, though some folks choose to refer to it as an "ImageOnly PDF".


-- Your initial reply was a recommendation to filter by fonts.
-- Yet you agree that there may not be ANY fonts, so that's not an accurate determination (my reply).

        So if there are no fonts, then there is no text in the document, and you can't report on any language... Right?

        If you are thinking that you need to figure out what is in the images - then that is going to require a FIRST PASS with some OCR application to convert image->text...and THEN you can run your font filtering...
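
        The decision described above can be sketched roughly as follows. This is only an illustration, not an iText API: PageFonts, LanguagePipeline, and Step are hypothetical names made up for this sketch, and it assumes the font resource names for each page have already been collected by whatever PDF library you use. It also treats the document as a whole; a file that mixes text pages and image-only pages is not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: the font resource names found on one page.
// How these are read from a real PDF is left to your PDF library.
final class PageFonts {
    final List<String> fontNames;

    PageFonts(String... names) {
        this.fontNames = Arrays.asList(names);
    }
}

// Hypothetical helper that decides which passes a document needs.
final class LanguagePipeline {
    enum Step { OCR, FONT_FILTER }

    // True if at least one page lists at least one font.
    static boolean hasAnyFonts(List<PageFonts> pages) {
        for (PageFonts page : pages) {
            if (!page.fontNames.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    // No fonts at all means no text, so an OCR pass has to come first;
    // the font filtering runs afterwards in either case.
    static List<Step> plan(List<PageFonts> pages) {
        List<Step> steps = new ArrayList<>();
        if (!hasAnyFonts(pages)) {
            steps.add(Step.OCR);
        }
        steps.add(Step.FONT_FILTER);
        return steps;
    }
}
```

        For an "ImageOnly PDF" (every page with an empty font list), plan(...) returns OCR followed by FONT_FILTER; as soon as any page lists a font, it returns FONT_FILTER alone.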


Leonard

---------------------------------------------------------------------------
Leonard Rosenthol                            < mailto:leonardr@pdfsages.com>
Chief Technical Officer                      < http://www.pdfsages.com>
PDF Sages, Inc.                              215-938-7080 (voice)
                                             215-938-0880 (fax)
-------------------------------------------------------------------------