When I read this in through the SAXBuilder, I get question marks and strange characters instead of the actual text.
Here is the code I am currently using. I figured it was an encoding issue, but setting the encoding is not doing the trick:
SAXBuilder sb = new SAXBuilder("org.apache.crimson.parser.XMLReaderImpl");
InputSource is = new InputSource("file:///d:/workspace/OACD/OACD_rz.xml");
is.setEncoding("UTF-8");
sb.setEntityResolver(new EntityResolver() {
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        return new InputSource("file:///d:/workspace/oup-character-entities.ent");
    }
});
document = sb.build(is);
The XML header is:
<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl' href="http://somestyle.xsl"?>
<!DOCTYPE dictionary SYSTEM "dictionary.dtd">
<dictionary xml:space='preserve'>
When I call getText() on the element pr, what I get back is "?r?bit".
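To narrow down whether the characters are lost while parsing or only when the text is printed, a small check like the one below might help. It is only a sketch: the class name EncodingCheck, both helper methods, and the sample string are hypothetical stand-ins, not part of my actual code or data. The first helper shows the code points a string really holds, using ASCII-only output. The second simulates a console that cannot represent non-ASCII characters.

```java
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Render every code point as U+XXXX so the result is pure ASCII and
    // cannot be altered by the console's own encoding.
    static String toCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Simulate output through an ASCII-only channel: String.getBytes(Charset)
    // substitutes '?' for every character that US-ASCII cannot encode.
    static String throughAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the value returned by getText().
        String sample = "\u0251r\u0259bit";
        System.out.println(toCodePoints(sample));
        System.out.println(throughAscii(sample));
    }
}
```

If toCodePoints applied to the real getText() result shows the expected code points, the parse is probably fine and the question marks come from the output encoding. If it shows U+003F, the characters were already lost earlier.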
I assume I am missing something obvious; pointing me to the
right section of the documentation would be sufficient.
Thank you,
Luke Majewski