Java Mailing List Archive

http://www.junlu.com/


Re: [iText-questions] Slightly OT: Does anyone have a way to determine the language or even codepage of a PDF?

Aaron J Weber

2006-07-20


Thanks for the suggestion. I had thought about that. But what if the document is an image-only PDF (one without a significant text "layer")? Then I'm just going to have a lot of binary stream data in there with very little (if any) font information, right?
 
Puzzling stuff... :(
 
Thanks again,
AJ
 
----- Original Message -----
From: Leonard Rosenthol
To: Aaron J Weber ; Post all your questions about iText here ; itext-questions@lists.sourceforge.net
Sent: Thursday, July 20, 2006 8:43 AM
Subject: Re: [iText-questions] Slightly OT: Does anyone have a way to determine the language or even codepage of a PDF?

At 10:40 PM 7/19/2006, Aaron J Weber wrote:
I basically am trying to filter PDFs to see if they're a non-Latin-based language (Japanese, Korean, Chinese to name a few).
 
Thanks for any hints/tips/suggestions.
 

        If I were trying to tackle this problem, I would simply find all fonts in the document and examine their glyph sets and encodings to find any non-Roman ones.  Looking at the fonts would only tell you about POSSIBLE usage - you'd need to examine the actual contents to determine REAL usage.
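
A minimal sketch of the two checks described above, in plain Java with no PDF library. It assumes the /Ordering string of each CID font's /CIDSystemInfo dictionary and the page text have already been pulled out of the PDF by some other means (for example with iText); that extraction step is not shown. The class and method names (PdfScriptHints, hintFromOrdering, scriptCounts, hasNonLatin) are hypothetical and are not part of iText. The font check covers only the four common Adobe character collections and only hints at POSSIBLE usage; the text check reflects REAL usage.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not an iText class.
class PdfScriptHints {

    // Maps the /Ordering value of a CID font's /CIDSystemInfo dictionary
    // (Adobe's standard character collections) to a language hint.
    // Assumption: the caller has already read this string from the font.
    static String hintFromOrdering(String ordering) {
        if (ordering == null) {
            return "unknown";
        }
        switch (ordering) {
            case "Japan1": return "Japanese";
            case "Korea1": return "Korean";
            case "GB1":    return "Simplified Chinese";
            case "CNS1":   return "Traditional Chinese";
            default:       return "unknown"; // e.g. "Identity" says nothing about language
        }
    }

    // Counts code points per Unicode script in already-extracted text.
    // Punctuation, digits and spaces (COMMON), combining marks (INHERITED)
    // and unassigned code points (UNKNOWN) are skipped.
    static Map<Character.UnicodeScript, Integer> scriptCounts(String text) {
        Map<Character.UnicodeScript, Integer> counts = new LinkedHashMap<>();
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED
                    && script != Character.UnicodeScript.UNKNOWN) {
                counts.merge(script, 1, Integer::sum);
            }
        });
        return counts;
    }

    // True if the text contains at least one character from a script
    // other than Latin.
    static boolean hasNonLatin(String text) {
        for (Character.UnicodeScript script : scriptCounts(text).keySet()) {
            if (script != Character.UnicodeScript.LATIN) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hintFromOrdering("Japan1"));
        System.out.println(scriptCounts("Hello \u65e5\u672c\u8a9e"));
        System.out.println(hasNonLatin("Hello, world 123"));
    }
}
```

For an image-only PDF both inputs would be empty, so neither check can say anything; such files would need a different approach.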


Leonard

---------------------------------------------------------------------------
Leonard Rosenthol                            <mailto:leonardr@pdfsages.com>
Chief Technical Officer                      <http://www.pdfsages.com>
PDF Sages, Inc.                              215-938-7080 (voice)
                                             215-938-0880 (fax)
-------------------------------------------------------------------------