Java Mailing List Archive

http://www.junlu.com/


[iText-questions] HTMLworker and Word strange behaviour

Paul Jones

2007-07-11


Thanks. The descriptions are stored as separate Word documents (I don't
know why).

So I saved them as HTML. When I parse one with HTMLWorker, the <h1> text is
printed twice in the PDF output: once in a large font and again in a
small font. Is this a bug?



**** Example code:

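 import java.io.FileOutputStream;
 import java.io.FileReader;
 import java.util.ArrayList;

 import com.lowagie.text.Document;
 import com.lowagie.text.Element;
 import com.lowagie.text.html.simpleparser.HTMLWorker;
 import com.lowagie.text.html.simpleparser.StyleSheet;
 import com.lowagie.text.pdf.PdfWriter;

 // The class name is a placeholder; the original post showed only main().
 public class Trial {

   public static void main(String[] args) {
     Document document = new Document();
     try {
       PdfWriter.getInstance(document, new FileOutputStream("trial.pdf"));
       document.open();

       // Set the leading applied to <p> elements.
       StyleSheet styles = new StyleSheet();
       styles.loadTagStyle("p", "leading", "12,0");

       // Parse the HTML saved from Word into a list of iText elements
       // and add each one to the PDF document.
       ArrayList objects = HTMLWorker.parseToList(
           new FileReader("C001.html"), styles);
       for (int k = 0; k < objects.size(); ++k) {
         document.add((Element) objects.get(k));
       }
     } catch (Exception e) {
       e.printStackTrace();
     }
     document.close();
   }
 }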



**** Example C001.html from Word below:

<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
<meta name=Originator content="Microsoft Word 9">
<link rel=File-List href="./C001_files/filelist.xml">
<title>C001 Inspection Warranty

   </title>
<!--[if gte mso 9]><xml>
<o:DocumentProperties>
<o:Author>Barbara tabis</o:Author>
<o:LastAuthor>Paul Jones</o:LastAuthor>
<o:Revision>2</o:Revision>
<o:TotalTime>1</o:TotalTime>
<o:Created>2007-07-11T11:33:00Z</o:Created>
<o:LastSaved>2007-07-11T11:33:00Z</o:LastSaved>
<o:Pages>1</o:Pages>
<o:Words>210</o:Words>
<o:Characters>1201</o:Characters>
<o:Company>Fortis Insurance</o:Company>
<o:Lines>10</o:Lines>
<o:Paragraphs>2</o:Paragraphs>
<o:CharactersWithSpaces>1474</o:CharactersWithSpaces>
<o:Version>9.6926</o:Version>
</o:DocumentProperties>
</xml><![endif]--><!--[if gte mso 9]><xml>
<w:WordDocument>

<w:DisplayHorizontalDrawingGridEvery>0</w:DisplayHorizontalDrawingGridEvery>
<w:DisplayVerticalDrawingGridEvery>0</w:DisplayVerticalDrawingGridEvery>
<w:UseMarginsForDrawingGridOrigin/>
<w:Compatibility>
 <w:FootnoteLayoutLikeWW8/>
 <w:ShapeLayoutLikeWW8/>
 <w:AlignTablesRowByRow/>
 <w:ForgetLastTabAlignment/>
 <w:LayoutRawTableWidth/>
 <w:LayoutTableRowsApart/>
</w:Compatibility>
</w:WordDocument>
</xml><![endif]-->
<style>
<!--
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
 {mso-style-parent:"";
 margin:0cm;
 margin-bottom:.0001pt;
 mso-pagination:widow-orphan;
 font-size:10.0pt;
 font-family:"Times New Roman";
 mso-fareast-font-family:"Times New Roman";}
h1
 {mso-style-next:Normal;
 margin:0cm;
 margin-bottom:.0001pt;
 mso-pagination:widow-orphan;
 page-break-after:avoid;
 mso-outline-level:1;
 font-size:10.0pt;
 font-family:"Times New Roman";
 mso-font-kerning:0pt;}
@page Section1
 {size:595.45pt 841.7pt;
 margin:72.0pt 90.0pt 72.0pt 90.0pt;
 mso-header-margin:36.0pt;
 mso-footer-margin:36.0pt;
 mso-paper-source:0;}
div.Section1
 {page:Section1;}
-->
</style>
</head>

<body lang=EN-GB style='tab-interval:36.0pt'>
<div class=Section1>
<h1><span style='font-family:Arial'>C001 Inspection Warranty<span
style="mso-spacerun:
yes">
</span><o:p></o:p></span></h1>
<p class=MsoNormal><span style='font-family:Arial'><span
style="mso-spacerun:
yes">

</span><o:p></o:p></span></p>
<p class=MsoNormal><span style='font-family:Arial'><span
style="mso-spacerun:
yes"> </span>It is warranted that the <b>Insured </b>or the <b>Insured's
Employees</b> must carry<span style="mso-spacerun: yes">
</span><o:p></o:p></span></p>
<p class=MsoNormal><span style='font-family:Arial'><span
style="mso-spacerun:
yes"> </span>out an examination of the <b>Premises</b> for smouldering
matches
tobacco and<span style="mso-spacerun: yes">
</span><o:p></o:p></span></p>
<p class=MsoNormal><span style='font-family:Arial'><span
style="mso-spacerun:
yes"> </span>other materials at the close of each day's business and for a
signed<span style="mso-spacerun: yes">
</span><o:p></o:p></span></p>
<p class=MsoNormal><span style='font-family:Arial'><span
style="mso-spacerun:
yes"> </span>log to be made daily by the <b>Employee</b> undertaking the
examination which<span style="mso-spacerun: yes">
</span><o:p></o:p></span></p>
<p class=MsoNormal><span style='font-family:Arial'><span
style="mso-spacerun:
yes"> </span>is checked weekly by the <b>Insured's</b> management.<span
style="mso-spacerun: yes">
</span><o:p></o:p></span></p>
<p class=MsoNormal><span style='font-family:Arial'><span
style="mso-spacerun:
yes">

</span><o:p></o:p></span></p>
<p class=MsoNormal><span style='font-family:Arial'><span
style="mso-spacerun:
yes">

</span><o:p></o:p></span></p>
<p class=MsoNormal><span style='font-family:Arial'><span
style="mso-spacerun:
yes">                           </span><span
style="mso-spacerun:
yes">                           </span><o:p></o:p></span></p>
<p class=MsoNormal><span style="mso-spacerun:
yes">

</span></p>
<p class=MsoNormal><span style="mso-spacerun:
yes">

</span></p>
<p class=MsoNormal><span style="mso-spacerun:
yes">                                   </span><span
style="mso-spacerun: yes">                  </span></p>
<p class=MsoNormal><span style="mso-spacerun:
yes">

</span></p>
<p class=MsoNormal><span style="mso-spacerun:
yes">

</span></p>
<p class=MsoNormal><span style="mso-spacerun:
yes">
</span><span
style="mso-spacerun: yes">         </span></p>
<p class=MsoNormal><span style="mso-spacerun:
yes">

</span></p>
<p class=MsoNormal><span style="mso-spacerun:
yes">

</span></p>
</div>
</body>
</html>
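
The file above carries a lot of Word-specific markup: a full <head> with a <title> that repeats the heading text, conditional comments, <o:p> tags, and mso-spacerun spans. One possibility, not confirmed anywhere in this thread, is that a simple parser emits some of that as body text. The sketch below is a hypothetical pre-processing step (WordHtmlCleaner and clean are made-up names, not part of iText) that strips this markup with regular expressions before the HTML is handed to a parser. It uses only the Java standard library and is not a general HTML sanitizer.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of iText.
public class WordHtmlCleaner {

    // The whole <head> element, including <title>, <meta>, and <style>.
    private static final Pattern HEAD =
        Pattern.compile("(?is)<head\\b.*?</head>");

    // HTML comments, which also covers Word's conditional comments.
    private static final Pattern COMMENTS =
        Pattern.compile("(?s)<!--.*?-->");

    // Opening and closing tags in the o: and w: namespaces, e.g. <o:p>.
    private static final Pattern OFFICE_TAGS =
        Pattern.compile("(?is)</?[ow]:[^>]*>");

    // Spans that Word uses only to preserve runs of whitespace.
    private static final Pattern SPACERUN = Pattern.compile(
        "(?is)<span\\s+style=[\"']mso-spacerun:\\s*yes[\"']>(.*?)</span>");

    /** Returns the HTML with the Word-specific markup above removed. */
    public static String clean(String html) {
        String out = HEAD.matcher(html).replaceAll("");
        out = COMMENTS.matcher(out).replaceAll("");
        out = OFFICE_TAGS.matcher(out).replaceAll("");
        // Collapse each whitespace-only span to a single space.
        out = SPACERUN.matcher(out).replaceAll(" ");
        return out;
    }

    public static void main(String[] args) {
        String sample =
            "<html><head><title>C001 Inspection Warranty</title></head>"
            + "<body><h1>C001 Inspection Warranty<o:p></o:p></h1></body></html>";
        System.out.println(clean(sample));
    }
}
```

The cleaned string could then be wrapped in a java.io.StringReader and passed to the parser in place of the FileReader. Whether this removes the duplicated heading is an assumption that would need to be tested against the real file.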



-----Original Message-----
Date: Tue, 10 Jul 2007 17:30:56 +0200
From: Paul Jones <Paul.Jones@(protected)>
Subject: [iText-questions] switching bold on and off
To: "'itext-questions@(protected)'"
 <itext-questions@(protected)>
Message-ID: <EA25B58C4871D911BF760008C791B51F045C2D07@(protected)>
Content-Type: text/plain

Hi. I need to retrieve some text descriptions for some code values. When I
add this text to the PDF, it has to retain the correct format. So some words
need to be bold, some italic, etc.

How would you do this? How would you store the text in the look-up table?
How would you parse the text and switch bold on/off?

I assume the normal way to do this would be to have several chunks or
phrases with different fonts and then add them together to make a paragraph.
But I want to store the text outside of the code so that other programs can
access it.


------------------------------

Date: Tue, 10 Jul 2007 18:23:13 +0200
From: Bruno Lowagie <bruno@(protected)>
Subject: Re: [iText-questions] switching bold on and off
To: Post all your questions about iText here
 <itext-questions@(protected)>
Message-ID: <4693B271.209@(protected)>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Paul Jones wrote:
> I assume the normal way to do this would be to have several chunks or
> phrases with different fonts and then add them together to make a
> paragraph.
> But I want to store the text outside of the code so that other programs
> can access it.

I may have overlooked this, but how do you store the text now?
I would use very simple HTML snippets, and parse it into iText
objects with HTMLWorker.
br,
Bruno
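
To illustrate the idea of storing descriptions as very simple HTML snippets, here is a minimal sketch of what "switching bold on and off" amounts to when such a snippet is parsed. It uses only the Java standard library. SnippetRuns and Run are hypothetical names invented for this example, and it handles only <b> and <i> tags. In practice HTMLWorker does this work and produces iText objects directly, so this only shows the principle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; in real code HTMLWorker does the parsing.
public class SnippetRuns {

    /** One stretch of text that shares a single bold/italic state. */
    public static final class Run {
        public final String text;
        public final boolean bold;
        public final boolean italic;

        Run(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }

        @Override
        public String toString() {
            return (bold ? "B" : "-") + (italic ? "I" : "-") + ":" + text;
        }
    }

    // Matches <b>, </b>, <i>, </i> in either case.
    private static final Pattern TAG = Pattern.compile("(?i)<(/?)([bi])>");

    /** Splits a snippet into runs, toggling the style flags at each tag. */
    public static List<Run> parse(String snippet) {
        List<Run> runs = new ArrayList<>();
        boolean bold = false;
        boolean italic = false;
        int pos = 0;
        Matcher m = TAG.matcher(snippet);
        while (m.find()) {
            // Text before the tag belongs to the style currently in force.
            if (m.start() > pos) {
                runs.add(new Run(snippet.substring(pos, m.start()), bold, italic));
            }
            boolean opening = m.group(1).isEmpty();
            if (m.group(2).equalsIgnoreCase("b")) {
                bold = opening;
            } else {
                italic = opening;
            }
            pos = m.end();
        }
        // Any text after the last tag.
        if (pos < snippet.length()) {
            runs.add(new Run(snippet.substring(pos), bold, italic));
        }
        return runs;
    }

    public static void main(String[] args) {
        String snippet = "It is warranted that the <b>Insured</b> must carry "
            + "out an <i>examination</i> of the Premises.";
        for (Run r : parse(snippet)) {
            System.out.println(r);
        }
    }
}
```

Each run corresponds to one chunk with its own font, which is the "several chunks added together to make a paragraph" approach from the original question. Because the snippet is plain text, other programs can read it as well.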



