Java Mailing List Archive

Re: [jdom-interest] Verbose XHTML 1.1 Doctype

David Dorward

2004-03-27

On Thu, 2004-03-25 at 08:07, Stein Erik Berget wrote:
> On Wed, 24 Mar 2004 18:47:47 +0000, David Dorward <david@(protected)>
> wrote:
> > I have a number of XHTML 1.1 documents, all conforming to the same
> > template, which I want to extract some data from and then insert that
> > data into different XHTML 1.1 documents.
> >
> > As a first step I am trying to read in a document and then print it out
> > again without any modification. I've run into two issues:
> >
> > 1. It appears to be downloading the DTD from the w3c website - this
> > takes time and bandwidth.

Thanks to Mr Berget this issue is now resolved, and it's lightning fast
(Thanks!).

> > 2. It seems to be expanding the Doctype line (example below).

This one, unfortunately, is still a problem. Does anybody have a solution?

> > Is there any way to stop this? I'd like to leave the Doctype alone and
> > save time on reading the DTD (I don't care about validation - that is
> > handled elsewhere). I couldn't find anything looking at the docs, but I
> > suspect this is due to not knowing what to look for.

My code now looks like this:

import org.apache.xerces.util.XMLCatalogResolver;
import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.io.IOException;

public class Parse {

  public static void main(String[] args) {
    // Path to the catalog.xml file
    String[] cat = {"file:///home/david/prog/cms/java/catalog.xml"};
    XMLCatalogResolver resolver = new XMLCatalogResolver();
    resolver.setPreferPublic(true);
    resolver.setCatalogList(cat);

    // Passing true asks the underlying parser to validate
    SAXBuilder builder = new SAXBuilder(true);
    builder.setProperty(
        "http://apache.org/xml/properties/internal/entity-resolver",
        resolver);

    String file = "/home/david/prog/cms/dorward.me.uk/about/index.html";
    XMLOutputter outputter = new XMLOutputter();
    try {
      Document doc = builder.build(file);
      outputter.output(doc, System.out);
    } catch (JDOMException e) {
      // indicates a well-formedness or other error
      System.err.println(file + " is not well formed: " + e.getMessage());
    } catch (IOException e) {
      System.err.println("Could not check " + file);
      System.err.println(" because " + e.getMessage());
    }
  }
}

The input document starts:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>

But the output document starts:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
<!NOTATION w3c-xml PUBLIC "ISO 8879//NOTATION Extensible Markup Language (XML) 1.0//EN">
<!NOTATION cdata PUBLIC "-//W3C//NOTATION XML 1.0: CDATA//EN">
<!NOTATION fpi PUBLIC "ISO 8879:1986//NOTATION Formal Public Identifier//EN">
<!NOTATION length PUBLIC "-//W3C//NOTATION XHTML Datatype: Length//EN">
<!NOTATION linkTypes PUBLIC "-//W3C//NOTATION XHTML Datatype: LinkTypes//EN">
<!NOTATION mediaDesc PUBLIC "-//W3C//NOTATION XHTML Datatype: MediaDesc//EN">
<!NOTATION multiLength PUBLIC "-//W3C//NOTATION XHTML Datatype: MultiLength//EN">
<!NOTATION number PUBLIC "-//W3C//NOTATION XHTML Datatype: Number//EN">
<!NOTATION pixels PUBLIC "-//W3C//NOTATION XHTML Datatype: Pixels//EN">
<!NOTATION script PUBLIC "-//W3C//NOTATION XHTML Datatype: Script//EN">
<!NOTATION text PUBLIC "-//W3C//NOTATION XHTML Datatype: Text//EN">
<!NOTATION character PUBLIC "-//W3C//NOTATION XHTML Datatype: Character//EN">
<!NOTATION charset PUBLIC "-//W3C//NOTATION XHTML Datatype: Charset//EN">
<!NOTATION charsets PUBLIC "-//W3C//NOTATION XHTML Datatype: Charsets//EN">
<!NOTATION contentType PUBLIC "-//W3C//NOTATION XHTML Datatype: ContentType//EN">
<!NOTATION contentTypes PUBLIC "-//W3C//NOTATION XHTML Datatype: ContentTypes//EN">
<!NOTATION datetime PUBLIC "-//W3C//NOTATION XHTML Datatype: Datetime//EN">
<!NOTATION languageCode PUBLIC "-//W3C//NOTATION XHTML Datatype: LanguageCode//EN">
<!NOTATION uri PUBLIC "-//W3C//NOTATION XHTML Datatype: URI//EN">
<!NOTATION uris PUBLIC "-//W3C//NOTATION XHTML Datatype: URIs//EN">
]>
<?doc type="doctype" role="title" { XHTML 1.1 } ?><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" version="-//W3C//DTD XHTML 1.1//EN">
<head profile="">
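
One possible workaround, sketched here with the JDK's own parser rather than JDOM: since validation is handled elsewhere, the parser can be given an empty DTD in place of the real one. With nothing to read, it has no notation declarations or attribute defaults (such as version or profile) to report. The class name EmptyDtdDemo, the method parseWithoutDtd and the sample document are made up for illustration. This is an untested assumption as far as JDOM itself goes, and a document that relies on entities declared in the DTD (for example &nbsp;) would probably not survive it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class EmptyDtdDemo {

    // Hypothetical sample input, modelled on the start of the real document.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
        + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\">"
        + "<head><title>t</title></head><body><p>x</p></body></html>";

    // Parses without validation and answers every external entity
    // request (including the DTD) with an empty stream, so nothing
    // is fetched and no DTD-supplied defaults are applied.
    static Document parseWithoutDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        EntityResolver emptyDtd =
            (publicId, systemId) -> new InputSource(new StringReader(""));
        builder.setEntityResolver(emptyDtd);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWithoutDtd(SAMPLE);
        // The DOCTYPE identifiers are kept even though the DTD was not read.
        System.out.println("public id: " + doc.getDoctype().getPublicId());
        System.out.println("version attribute present: "
            + doc.getDocumentElement().hasAttribute("version"));
    }
}
```

JDOM's SAXBuilder has a setEntityResolver method as well, so the same resolver could presumably be handed to a non-validating new SAXBuilder(false) in place of the catalog resolver. Whether that leaves the Doctype line untouched in the XMLOutputter output has not been checked here.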


--
David Dorward                      <http://dorward.me.uk/>