Eric,
Try using NekoHTML (http://www.apache.org/~andyc/neko/doc/html/) by Andy
Clark. I use it instead of JTidy now, as I found I had to perform too many
hacks to get JTidy to parse pages with custom tags, etc.
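
For the feature problem in point 2 of the message below, one possible
workaround is to wrap the parser in a SAX filter that ignores feature
requests the parser rejects. This is only a sketch: it uses nothing but the
standard org.xml.sax classes, the class name LenientFeatureFilter is made up
for illustration, and it has not been checked against TagSoup or any
particular JDOM release.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of JDOM, TagSoup, or NekoHTML).
 * Forwards feature requests to the wrapped reader and records,
 * rather than rethrows, the ones the wrapped reader rejects.
 */
class LenientFeatureFilter extends XMLFilterImpl {
    private final List<String> ignored = new ArrayList<>();

    LenientFeatureFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void setFeature(String name, boolean value) {
        try {
            // XMLFilterImpl passes the request on to the parent reader.
            super.setFeature(name, value);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // Assumption: the caller can live without this feature.
            ignored.add(name);
        }
    }

    /** Names of the features the wrapped reader refused. */
    List<String> ignoredFeatures() {
        return ignored;
    }
}
```

The filter is itself an XMLReader, so it can be handed to whatever builds the
document, provided your JDOM version lets you supply your own reader.
Silently dropping a feature is only safe if the output does not depend on it,
so it is worth checking ignoredFeatures() after setup.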
Paul
>From: eric@(protected)
>To: jdom-interest@(protected)
>Subject: [jdom-interest] Trying to use jdom with TagSoup
>Date: Tue, 30 Dec 2003 12:02:32 -0800 (PST)
>
>I also posted this to comp.lang.java.programmer and at
>www.javaworld.com, but I thought you guys might know
>best on this one:
>
>I'm trying to convert HTML pages to XML and I'm having
>some difficulty with the following:
>
>1. I try to use Tidy, but the HTML that I'm trying to
>convert to XHTML has too many errors, so I spend a lot
>of time trying to "fix" the HTML before running it
>through Tidy. I'm using Tidy with -asxml.
>
>2. I've tried using TagSoup with JDOM, but the
>SAXBuilder internally tries to set the namespace
>prefixes and TagSoup does not support that internal
>feature.
>
>I really would appreciate help from someone who has
>dealt with having to crank out lots of XML (XHTML)
>from poorly formatted HTML. I appreciate any help! ;)
>
>-Eric
>_______________________________________________
>To control your jdom-interest membership:
>http://lists.denveronline.net/mailman/options/jdom-interest/youraddr@(protected)