Slightly OT: Does anyone have a way to determine the langu
2006-07-20 - By Leonard Rosenthol
At 10:40 PM 7/19/2006, Aaron J Weber wrote:
>I basically am trying to filter PDFs to see if they're a
>non-Latin-based language (Japanese, Korean, Chinese to name a few).
>
>Thanks for any hints/tips/suggestions.
>
If I were trying to tackle this problem, I would simply find all fonts in the document and examine their glyph sets and encodings to find any non-Roman ones. Looking at the fonts would only tell you about POSSIBLE usage - you'd need to examine the actual contents to determine REAL usage.
Leonard
--
Leonard Rosenthol <mailto:leonardr@(protected)>
Chief Technical Officer <http://www.pdfsages.com>
PDF Sages, Inc.
215-938-7080 (voice)
215-938-0880 (fax)
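
The second half of the reply (checking actual content rather than just the fonts) might look roughly like the sketch below. It is only an illustration under a big assumption: the page text has already been extracted into a Java String by whatever PDF library is in use, and that extraction step is not shown. The class name `ScriptSniffer` and its methods are made up for this example; the only API relied on is the JDK's `Character.UnicodeScript` (Java 7 and later).

```java
// Sketch only: assumes the text was already extracted from the PDF into a String.
// ScriptSniffer is a hypothetical helper, not part of iText or any PDF library.
final class ScriptSniffer {

    // True for code points in scripts used by Chinese, Japanese, or Korean text.
    static boolean isCjk(int codePoint) {
        switch (Character.UnicodeScript.of(codePoint)) {
            case HAN:
            case HIRAGANA:
            case KATAKANA:
            case HANGUL:
                return true;
            default:
                return false;
        }
    }

    // Fraction of letters in the text that belong to a CJK script.
    // Returns 0.0 when the text contains no letters at all.
    static double cjkRatio(String text) {
        int letters = 0;
        int cjk = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs correctly
            if (!Character.isLetter(cp)) {
                continue; // ignore digits, punctuation, whitespace
            }
            letters++;
            if (isCjk(cp)) {
                cjk++;
            }
        }
        return letters == 0 ? 0.0 : (double) cjk / letters;
    }
}
```

A filter could then treat a document as non-Latin when `ScriptSniffer.cjkRatio(text)` exceeds some cutoff. The cutoff is a judgment call, and this only works if the PDF's fonts carry enough mapping information for the text to be extracted as Unicode in the first place.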
iText-questions mailing list
iText-questions@(protected)
https://lists.sourceforge.net/lists/listinfo/itext-questions