2007-10-06 - By wasegraves@(protected)
Yes; but it is not practicable with iText. You could, however, as long as the PDF is printable, use the following procedure:

1. Print to a PS file.
2. Scan the PS file from step 1 above, dropping all lines that do not end with Tj or TJ.
3. Use a regular expression (together with Substitution or Match) to extract the instances of "text fragment" from within multiple instances of "(text fragment)Tj", printing the resulting text fragments to STDOUT.

Bruno has given an excellent example of why you should not expect the resulting output to make sense, i.e., the text fragments may not appear in the order in which you'd like them to appear.

Cheers,

Bill Segraves
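Steps 2 and 3 of the procedure in the reply above can be sketched in Python roughly as follows. This is only an illustration, assuming the printed file really does contain text-showing lines of the form `(text fragment)Tj` or `[(frag) -15 (ment)] TJ`. The function name `extract_text_fragments` and both regular expressions are hypothetical and are not part of iText or any other library.

```python
import re

# One parenthesised string literal, allowing backslash escapes such as \( and \).
# Assumption: unescaped nested parentheses do not occur; this sketch ignores them.
STRING_LITERAL = re.compile(r"\(((?:\\.|[^\\()])*)\)")

# Only the simple escapes \( \) and \\ are undone; octal escapes are left as-is.
SIMPLE_ESCAPE = re.compile(r"\\([()\\])")


def extract_text_fragments(lines):
    """Yield the text fragments found on lines that end with Tj or TJ."""
    for line in lines:
        stripped = line.rstrip()
        # Step 2: drop every line that does not end with Tj or TJ.
        if not stripped.endswith(("Tj", "TJ")):
            continue
        # Step 3: extract each "(text fragment)" on the remaining lines.
        for match in STRING_LITERAL.finditer(stripped):
            yield SIMPLE_ESCAPE.sub(r"\1", match.group(1))
```

To print the fragments to STDOUT, iterate over an open file object and call `print` on each yielded fragment. The fragments come out in file order, which, as noted above, need not match the reading order on the page.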
-------------- Original message from krammark <wenwen_829@(protected)>: --------------
> so , how we read the data from pdf ?
> i mean , can we read them line by line from the specific pages ?
>
> thanks buddy.
>
> Bruno Lowagie (iText) wrote:
> >
> > krammark wrote:
> >> hey gusy,
> >> do u guys have a idea how to read the data from pdf pages using itext ?
> >> basically, i want to read the data from table and write them into excel
> >> files.
> >> is that possible ?
> >
> > There is no such thing as 'a table' in plain PDF.
> > It's just lines and words painted on a canvas,
> > possible in an arbitrary order.
> >
> > Unless your tables cells are form fields, or your
> > PDF contains specific table structures (Tagged PDF),
> > iText probably won't help you.
> >
> > br,
> > Bruno
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Splunk Inc.
> > Still grepping through log files to find problems? Stop.
> > Now Search log events and configuration files using AJAX and a browser.
> > Download your FREE copy of Splunk now >> http://get.splunk.com/
> > _______________________________________________
> > iText-questions mailing list
> > iText-questions@(protected)
> > https://lists.sourceforge.net/lists/listinfo/itext-questions
> > Buy the iText book: http://itext.ugent.be/itext-in-action/
>
> --
> View this message in context:
> http://www.nabble.com/u-guys-konw-how-to-read-the-data-from-pdf-using-java-itext---tf4572506.html#a13067937
> Sent from the iText - General mailing list archive at Nabble.com.