Java Mailing List Archive

[jdom-interest] SAXbuilder and escape sequences

Luke Majewski

2005-10-12

Hi all,

I have scoured the web for a solution to this and I am stumped. I have an XML file with elements like:

<pr type="US">&stress1;r&aelig;bit
</pr>

When I read this in through SAXBuilder, I get question marks and strange characters instead of the actual text.

Here is the code I am currently using. I figured it was an encoding issue, but setting the encoding is not doing the trick:


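        import java.io.IOException;
        import org.jdom.input.SAXBuilder;
        import org.xml.sax.EntityResolver;
        import org.xml.sax.InputSource;
        import org.xml.sax.SAXException;

        SAXBuilder sb = new SAXBuilder("org.apache.crimson.parser.XMLReaderImpl");

        InputSource is = new InputSource("file:///d:/workspace/OACD/OACD_rz.xml");
        is.setEncoding("UTF-8");
        // Every external entity lookup (the DTD included) is answered with the entity file.
        sb.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
                return new InputSource("file:///d:/workspace/oup-character-entities.ent");
            }
        });
        // document is an org.jdom.Document declared elsewhere.
        document = sb.build(is);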

The XML header is:

<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl' href="http://somestyle.xsl"?>
<!DOCTYPE dictionary SYSTEM "dictionary.dtd">
<dictionary xml:space='preserve'>

What I get back when I call getText() on the pr element is "?r?bit".
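
To narrow down whether the question marks come from parsing or from printing, a small self-contained check along these lines might help. It is only a sketch under several assumptions: it uses the JDK's built-in JAXP parser rather than JDOM and Crimson, it declares the two entities inline instead of loading oup-character-entities.ent, and it assumes &stress1; stands for U+02C8 (the real mapping in the entity file may differ). The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JDOM or of the original code.
class EntityCheck {

    // Render each character as U+XXXX so the result does not depend on
    // the console's encoding.
    static String toCodePoints(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Parse a tiny document whose entities are declared in an internal
    // DTD subset. The stress1 mapping is an assumption for illustration.
    static String parsePr() {
        String xml =
            "<!DOCTYPE pr [\n"
            + "  <!ENTITY stress1 \"&#x2C8;\">\n"
            + "  <!ENTITY aelig \"&#xE6;\">\n"
            + "]>\n"
            + "<pr type=\"US\">&stress1;r&aelig;bit</pr>";
        try {
            DocumentBuilder db =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // prints: U+02C8 U+0072 U+00E6 U+0062 U+0069 U+0074
        System.out.println(toCodePoints(parsePr()));
    }
}
```

Running the same toCodePoints call on the string returned by getText() would show whether the text already contains U+003F (a literal question mark, so the damage happened during parsing) or the expected code points (so the characters are only lost when they are printed).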

I assume I am missing something obvious; a pointer to the right section of the documentation would be sufficient.

Thank you,

Luke Majewski