Verbose XHTML 1.1 Doctype
2004-03-24 - By David Dorward
I have a number of XHTML 1.1 documents, all conforming to the same template, from which I want to extract some data and then insert that data into different XHTML 1.1 documents.
As a first step, I am trying to read in a document and print it out again without any modification. I've run into two issues:
1. It appears to be downloading the DTD from the W3C website, which costs time and bandwidth.
2. It seems to be expanding the Doctype line (example below).
Is there any way to stop this? I'd like to leave the Doctype alone and save the time spent reading the DTD (I don't care about validation; that is handled elsewhere). I couldn't find anything in the docs, but I suspect that is because I don't know what to look for.
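For issue 1, one commonly used approach is to give the parser an `org.xml.sax.EntityResolver` that answers the request for the external DTD with an empty document, so nothing is fetched from www.w3.org. The sketch below uses the JDK's bundled SAX parser so it runs without JDOM; the class names `SkipDtdDownload`, `EmptyDtdResolver` and `RootName` are hypothetical. JDOM's `SAXBuilder` has a `setEntityResolver(EntityResolver)` method, so the same resolver should plug into the builder in the program that follows, though that is an assumption to check against your JDOM version. One side effect: any named entities (such as `&nbsp;`) that only the real DTD declares will not be expanded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SkipDtdDownload {

    /** Hypothetical resolver: answers every external entity request with an empty document. */
    static class EmptyDtdResolver implements EntityResolver {
        final List<String> requested = new ArrayList<>();

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            requested.add(systemId); // record what the parser would otherwise have downloaded
            return new InputSource(new StringReader(""));
        }
    }

    /** Remembers the name of the first (root) element. */
    static class RootName extends DefaultHandler {
        String name;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (name == null) {
                name = qName;
            }
        }
    }

    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
        + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\"\n"
        + "    \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\">\n"
        + "<head><title>About</title></head><body></body></html>\n";

    /** Parses xml with the given resolver in charge of the DTD; returns the root element name. */
    static String parse(String xml, EntityResolver resolver) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            RootName handler = new RootName();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.ISO_8859_1))));
            return handler.name;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        EmptyDtdResolver resolver = new EmptyDtdResolver();
        String root = parse(SAMPLE, resolver);
        System.out.println("root element: " + root);
        System.out.println("DTD requests answered locally: " + resolver.requested);
    }
}
```

The resolver also records the system ID it was asked for, which makes it easy to confirm that the DTD request was intercepted rather than sent over the network.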
My code:
import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.io.IOException;

public class Parse {

    public static void main(String[] args) {
        String filename = "/path/to/about.xhtml";
        SAXBuilder builder = new SAXBuilder();
        XMLOutputter outputter = new XMLOutputter();

        try {
            Document doc = builder.build(filename);
            System.out.println(filename + " is well formed.");
            try {
                // Write the parsed document straight back out, unmodified
                outputter.output(doc, System.out);
            } catch (IOException e) {
                System.err.println(e);
            }
        } catch (JDOMException e) {
            // indicates a well-formedness or other error
            System.out.println(filename + " is not well formed: " + e.getMessage());
        } catch (IOException e) {
            System.out.println("Could not check " + filename);
            System.out.println(" because " + e.getMessage());
        }
    }
}
Example: for this input:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>About</title>
etc
It outputs:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
<!NOTATION w3c-xml PUBLIC "ISO 8879//NOTATION Extensible Markup Language (XML) 1.0//EN">
<!NOTATION cdata PUBLIC "-//W3C//NOTATION XML 1.0: CDATA//EN">
<!NOTATION fpi PUBLIC "ISO 8879:1986//NOTATION Formal Public Identifier//EN">
<!NOTATION length PUBLIC "-//W3C//NOTATION XHTML Datatype: Length//EN">
<!NOTATION linkTypes PUBLIC "-//W3C//NOTATION XHTML Datatype: LinkTypes//EN">
<!NOTATION mediaDesc PUBLIC "-//W3C//NOTATION XHTML Datatype: MediaDesc//EN">
<!NOTATION multiLength PUBLIC "-//W3C//NOTATION XHTML Datatype: MultiLength//EN">
<!NOTATION number PUBLIC "-//W3C//NOTATION XHTML Datatype: Number//EN">
<!NOTATION pixels PUBLIC "-//W3C//NOTATION XHTML Datatype: Pixels//EN">
<!NOTATION script PUBLIC "-//W3C//NOTATION XHTML Datatype: Script//EN">
<!NOTATION text PUBLIC "-//W3C//NOTATION XHTML Datatype: Text//EN">
<!NOTATION character PUBLIC "-//W3C//NOTATION XHTML Datatype: Character//EN">
<!NOTATION charset PUBLIC "-//W3C//NOTATION XHTML Datatype: Charset//EN">
<!NOTATION charsets PUBLIC "-//W3C//NOTATION XHTML Datatype: Charsets//EN">
<!NOTATION contentType PUBLIC "-//W3C//NOTATION XHTML Datatype: ContentType//EN">
<!NOTATION contentTypes PUBLIC "-//W3C//NOTATION XHTML Datatype: ContentTypes//EN">
<!NOTATION datetime PUBLIC "-//W3C//NOTATION XHTML Datatype: Datetime//EN">
<!NOTATION languageCode PUBLIC "-//W3C//NOTATION XHTML Datatype: LanguageCode//EN">
<!NOTATION uri PUBLIC "-//W3C//NOTATION XHTML Datatype: URI//EN">
<!NOTATION uris PUBLIC "-//W3C//NOTATION XHTML Datatype: URIs//EN">
]>
<?doc type="doctype" role="title" { XHTML 1.1 } ?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" version="-//W3C//DTD XHTML 1.1//EN">
<head profile="">
<title>About</title>
etc
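The extra material in that output (the NOTATION declarations, the `<?doc ...?>` processing instruction and the `version` attribute on `<html>`, which the XHTML 1.1 DTD fixes to "-//W3C//DTD XHTML 1.1//EN") appears to come from the external DTD being read. A second option is to tell the parser not to load the external DTD at all. The sketch below does that with the JDK's bundled parser and the Xerces feature URI `http://apache.org/xml/features/nonvalidating/load-external-dtd`; the class name `NoExternalDtd` is hypothetical. JDOM's `SAXBuilder` has a `setFeature(String, boolean)` method that should pass the feature through to the underlying parser, but whether your parser recognises this particular feature is an assumption to verify.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NoExternalDtd {

    // Xerces-specific feature; assumed to be supported by the parser in use
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
        + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\"\n"
        + "    \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\">\n"
        + "<head><title>About</title></head><body></body></html>\n";

    /** Captures the root element name and its "version" attribute (null if absent). */
    static class RootCapture extends DefaultHandler {
        String rootName;
        String versionAttr;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (rootName == null) {
                rootName = qName;
                versionAttr = atts.getValue("version"); // a DTD-supplied value would appear here
            }
        }
    }

    /** Parses xml, loading the external DTD only if loadExternalDtd is true. */
    static RootCapture parse(String xml, boolean loadExternalDtd) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(LOAD_EXTERNAL_DTD, loadExternalDtd);
            RootCapture capture = new RootCapture();
            reader.setContentHandler(capture);
            reader.parse(new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.ISO_8859_1))));
            return capture;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RootCapture capture = parse(SAMPLE, false); // the DTD is never read, so no network access
        System.out.println(capture.rootName + ", version=" + capture.versionAttr);
    }
}
```

With the DTD skipped, the root element carries only the attributes actually written in the file. As with the resolver approach, named entities declared only in the DTD will not be expanded.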
-- 
David Dorward <http://dorward.me.uk/>