Java Mailing List Archive

http://www.junlu.com/

2007-10-07       - By Leonard Rosenthol

Why can't you just purchase/license a 3rd party tool that can do this  
for you?  There are a variety of them available for various OS  
platforms.

Leonard

On Oct 4, 2007, at 8:32 PM, Ted Chen wrote:

> I ended up using PdfCopy, adding each imported page twice and then
> defining a crop box on each copy accordingly.  The resulting PDF is
> almost as large as the original one, even though the number of
> pages is doubled.  This is better than my original solution, but it
> still carries unnecessary references to the cropped-out slides.
> It's not as ideal as Mark's proposal, but it works for now.  :)
>
> Thanks again.
> Ted
>
> ----- Original Message ----
> From: Ted Chen <nehcdet@(protected)>
> To: Post all your questions about iText here <itext-questions@(protected)>
> Sent: Thursday, October 4, 2007 3:51:31 PM
> Subject: Re: [iText-questions] remove content from a page
>
> Thanks a bunch, Mark, for the detailed analysis.  It's very clear  
> and helpful for a pdf novice like me.
> To make things worse, I not only need to parse this PDF, but also
> need to run the same code against many docs.  The good news is that
> these docs are all produced by the same PDF printer.  I guess in
> order to reach my goal, I'll need to analyze all the docs and
> figure out the common pattern.  Would be fun.  :)
>
> Although it might be a bit too fun for the current short time  
> period I have, I'll definitely get back to it, as I don't like my  
> current solution of cropping out the invisibles.
>
> I appreciate your help.
>
> Ted
> ----- Original Message ----
> From: Mark Storer <mstorer3772@(protected)>
> To: Post all your questions about iText here <itext-questions@(protected)>
> Sent: Thursday, October 4, 2007 2:15:21 PM
> Subject: Re: [iText-questions] remove content from a page
>
> No bueno.
>
> Each of those slides is made up of a collection of resources &  
> drawing commands.  To determine where one ends and the other starts  
> (and which one is which), you'd need to parse the page's content  
> stream.
>
> !trivial.
>
> iText includes a basic content tokenizer, but beyond that you're in  
> for a medium-large amount of research and work to pull this off.
>
> I'll talk you through this particular page's contents to let you  
> know what you're in for:
>
> q // push the current graphic state onto the state stack
> 0.1 0 0 0.1 0 0 cm // Concatenate Matrix to 1/10th scale (from the  
> default 1 0 0 1 0 0)
> /R7 gs  // use an extended graphic state
> q // push
> 0.101562 0 6119.8 7920 re W n  // clip to the given REctangle
> 1 1 1 rg // set the RGB fill (non-stroking) color to white
> 0.101562 0 6119.8 7920 re // rect
> f // fill
> 0.199951 0.199951 0.199951 rg // new fill color
> 387.102 4136.9 4201.8 8.20312 re // rect
> f // fill
> 387.102 4136.9 7.79688 3155.2 re //yada yada
> f // blah blah
> 387.102 7283.9 4201.8 8.20312 re
> f
> 4581.1 4136.9 7.79688 3155.2 re
> f
> Q // pop the graphic state
> q // push
> 395.102 7284.1 m //move to
> 395.102 4144.9 l // line to
> 4580.9 4144.9 l //line
> 4580.9 7284.1 l //line
> h // close the path
> W n // intersect the clipping region with this path, without painting it
> 1 1 1 rg // white fill
> 395.102 4144.9 4185.8 3139.2 re
> f
> q // push
>  4186 0 0 3139.5 394.5 4144.5 cm // big numbers due to prior 1/10th  
> matrix
> /R8 Do // draw the R8 resource
> Q //pop
> Q //pop
> q //push
> 419.102 7260.1 m // move, line line line, close, clip
> 419.102 4290.9 l
> 2397.9 4290.9 l
> 2397.9 7260.1 l
> h
> W n
> q//push
> 1979.5 0 0 2969 418.5 4291 cm // we'd popped back to the 1/10th again
> /R9 Do // draw R9
> Q // pop
> Q //pop
> q //push
>
> // lots more of the same till we get down to some text operators
> // I'll leave in the intervening content to let you know what
> // you're up against.
> 2333.1 7147.1 m
> 2333.1 5712.9 l
> 3927.9 5712.9 l
> 3927.9 7147.1 l
> h
> W n
> q 1595.5 0 0 1434.5 2332.5 5712.5 cm
> /R10 Do
> Q
> Q
> q
> 1862.1 5769.1 m
> 1862.1 4303.9 l
> 3858.9 4303.9 l
> 3858.9 5769.1 l
> h
> W n
> q 1997 0 0 1465 1861.5 4304 cm
> /R11 Do
> Q
> Q
> q
> 0.101562 0 6119.8 7920 re W n
> 0.199951 0.199951 0.199951 rg
> 387.102 336.898 4201.8 8.20312 re
> f
> 387.102 336.898 7.79688 3156.2 re
> f
> 387.102 3484.9 4201.8 8.20312 re
> f
> 4581.1 336.898 7.79688 3156.2 re
> f
> Q
> q
> 395.102 3485.1 m
> 395.102 344.898 l
> 4580.9 344.898 l
> 4580.9 3485.1 l
> h
> W n
> 1 1 1 rg
> 395.102 344.898 4185.8 3140.2 re
> f
> q 4186 0 0 3139.5 394.5 345 cm
> /R12 Do
> Q
> Q
> q
> 419.102 3461.1 m
> 419.102 1101.9 l
> 1678.9 1101.9 l
> 1678.9 3461.1 l
> h
> W n
> q 1260 0 0 2358.5 418.5 1102 cm
> /R13 Do
> Q
> Q
> q
> 395.102 3485.1 m
> 395.102 344.898 l
> 4580.9 344.898 l
> 4580.9 3485.1 l
> h
> W n
> 1 1 1 RG // set the RGB stroking color to white
> 1 1 1 rg
>
> // ********  And here comes the text ********
> q
> 10 0 0 10 0 0 cm // 1/10th * 10.  An odd way to go about things
> BT // begin a text object
> /R14 17 Tf  // set the current font
> 1 0 0 1 242.9 249.85 Tm // set the Text Matrix
> // my content editor turns odd binary values into {hex hex}
> // so bear with me:
> ({01}{02}{03}{04}{03}{04}{05}{03}{06}{07}{08}    \n{0b}{0c})Tj //  
> draw some characters
> -41.45 -20.8 Td // move to the start of the next line
> (\r{03}{0b}{0e}{07}{0f}{03}{0b}{07}\n{08}{07}\n{0b}{07}{08}    {03}
> {07}{10}\n{08}{08}{10}{03}{07}    {11}{08}{03}{10}{07}\n{12})Tj
> 84.4 -20.8 Td // move the start of the next line
> ({13}{14}{05}{15}\n{16})Tj // draw more text
> ET // end text object
>
> // and then we're back to more of the same
> Q
> Q
> q
> 419.102 344.898 4161.8 779.203 re W n
> q 4186 0 0 802 418.5 321.5 cm
> /R16 Do
> Q
> Q
> Q
>
> -- ---- ---- ---- ---- ----
>
> So given all that STUFF, you need to parse it, interpret it, and
> figure out which resource is used where.  You can then determine
> the resource names of things used outside the page's media box (or
> whichever page box you're altering), and remove them from the
> page's resource dictionary... then do the whole
> removeUnusedObjects() thing.
>
> Have fun.  :\
>
>
> --
> --Mark Storer
> Professional Geek
>
>
> -- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- --
> ---
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems?  Stop.
> Now Search log events and configuration files using AJAX and a  
> browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> __ ____ ____ ____ ____ ____ ____ ____ ____ ____
> iText-questions mailing list
> iText-questions@(protected)
> https://lists.sourceforge.net/lists/listinfo/itext-questions
> Buy the iText book: http://itext.ugent.be/itext-in-action/
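
For anyone who wants to see what "parse it, interpret it, and figure out which resource is used where" might look like, here is a minimal sketch in plain Java. It does not use iText at all (Mark does not name the tokenizer class, so none is assumed here); it simply splits on whitespace, which only works for excerpts like the one above that contain no strings, comments, arrays or inline images. The class and method names (ContentStreamSketch, Placement, findPlacements) are made up for illustration, and the bounding box assumes each /Rn is an image XObject painted onto the unit square.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch only: follows q / Q / cm and records where each "Do" paints.
class ContentStreamSketch {

    // Resource name plus the bounding box it covers in default user space.
    static final class Placement {
        final String name;
        final double llx, lly, urx, ury;

        Placement(String name, double llx, double lly, double urx, double ury) {
            this.name = name;
            this.llx = llx; this.lly = lly; this.urx = urx; this.ury = ury;
        }

        // True if this box overlaps the rectangle (x0, y0)-(x1, y1).
        boolean intersects(double x0, double y0, double x1, double y1) {
            return llx < x1 && urx > x0 && lly < y1 && ury > y0;
        }
    }

    // PDF matrices are [a b c d e f]; "cm" premultiplies: CTM' = m x CTM.
    static double[] concat(double[] m, double[] ctm) {
        return new double[] {
            m[0] * ctm[0] + m[1] * ctm[2],
            m[0] * ctm[1] + m[1] * ctm[3],
            m[2] * ctm[0] + m[3] * ctm[2],
            m[2] * ctm[1] + m[3] * ctm[3],
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]
        };
    }

    // Maps the unit square through the CTM (assumes an image XObject).
    static Placement place(String name, double[] m) {
        double[][] corners = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
        double llx = Double.POSITIVE_INFINITY, lly = Double.POSITIVE_INFINITY;
        double urx = Double.NEGATIVE_INFINITY, ury = Double.NEGATIVE_INFINITY;
        for (double[] c : corners) {
            double x = m[0] * c[0] + m[2] * c[1] + m[4];
            double y = m[1] * c[0] + m[3] * c[1] + m[5];
            llx = Math.min(llx, x); urx = Math.max(urx, x);
            lly = Math.min(lly, y); ury = Math.max(ury, y);
        }
        return new Placement(name, llx, lly, urx, ury);
    }

    // Naive whitespace tokenizer: operands pile up until an operator arrives.
    static List<Placement> findPlacements(String content) {
        List<Placement> found = new ArrayList<>();
        Deque<double[]> stack = new ArrayDeque<>();
        double[] ctm = {1, 0, 0, 1, 0, 0};
        List<String> operands = new ArrayList<>();
        for (String tok : content.trim().split("\\s+")) {
            if (tok.startsWith("/") || tok.matches("[-+]?(\\d+\\.?\\d*|\\.\\d+)")) {
                operands.add(tok);
                continue;
            }
            switch (tok) {
                case "q": stack.push(ctm.clone()); break;
                case "Q": if (!stack.isEmpty()) ctm = stack.pop(); break;
                case "cm":
                    if (operands.size() == 6) {
                        double[] m = new double[6];
                        for (int i = 0; i < 6; i++) m[i] = Double.parseDouble(operands.get(i));
                        ctm = concat(m, ctm);
                    }
                    break;
                case "Do":
                    if (operands.size() == 1) found.add(place(operands.get(0).substring(1), ctm));
                    break;
                default: break; // every other operator is ignored in this sketch
            }
            operands.clear();
        }
        return found;
    }

    public static void main(String[] args) {
        String excerpt =
              "q 0.1 0 0 0.1 0 0 cm "
            + "q q 4186 0 0 3139.5 394.5 4144.5 cm /R8 Do Q Q "
            + "q q 4186 0 0 3139.5 394.5 345 cm /R12 Do Q Q "
            + "Q";
        // Hypothetical crop: the upper half of a 612 x 792 point page.
        for (Placement p : findPlacements(excerpt)) {
            System.out.printf("/%s at (%.2f, %.2f)-(%.2f, %.2f) upper half: %b%n",
                p.name, p.llx, p.lly, p.urx, p.ury, p.intersects(0, 396, 612, 792));
        }
    }
}
```

With a crop rectangle covering the upper half of the page, /R8 falls inside it and /R12 does not, so /R12 would be a candidate for removal from that page's resource dictionary before calling removeUnusedObjects(). A real implementation would also have to handle strings, form XObjects (which carry their own matrix and bounding box), fonts and extended graphics states, none of which this sketch attempts.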

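As a quick check on the "big numbers due to prior 1/10th matrix" comment in the listing, the transform in effect when /R8 is drawn can be worked out by hand. This assumes the usual PDF convention that cm premultiplies the current matrix, with [a b c d e f] written as a 3x3 matrix acting on row vectors:

```latex
\[
\begin{pmatrix} 4186 & 0 & 0 \\ 0 & 3139.5 & 0 \\ 394.5 & 4144.5 & 1 \end{pmatrix}
\begin{pmatrix} 0.1 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 418.6 & 0 & 0 \\ 0 & 313.95 & 0 \\ 39.45 & 414.45 & 1 \end{pmatrix}
\]

\[
\begin{pmatrix} 10 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0.1 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
```

So /R8 ends up 418.6 by 313.95 points with its lower-left corner at (39.45, 414.45), on a page of roughly 612 by 792 points, and the "10 0 0 10 0 0 cm" before the text block just cancels the 1/10th scale so the text matrix works in ordinary points.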


©2008 junlu.com - Jax Systems, LLC, U.S.A.