[jdom-interest] SAXBuilder: How to handle non UTF-8 characters? (JDOMParseException)

Matthias Klein

2004-11-10

I have several files containing the results of an ItemSearchRequest at
Amazon.com.
Most are 100-200 kB XML files which, according to their XML declaration,
are UTF-8 encoded.

I read each response from Amazon's REST interface (url is the URL that
responds with the XML document):

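  // No charset is passed here, so the platform default encoding is used
  Reader reader = new InputStreamReader(url.openStream());
  BufferedReader bufferedreader = new BufferedReader(reader);
  StringBuffer sb = new StringBuffer();
  int c;
  // Read character by character until end of stream or a NUL character
  while (((c = bufferedreader.read()) != -1) && (c != 0)) {
      sb.append((char) c);
  }
  result = sb.toString();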

The string "result" will then be written into a RandomAccessFile.
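
One possible explanation, not confirmed anywhere in this post: InputStreamReader without an explicit charset decodes with the platform default encoding, so UTF-8 bytes can be altered on their way through the String and back into the file. The sketch below avoids decoding altogether and copies the response bytes straight to disk, so the file stays exactly as the encoding in its XML declaration describes it. The class name RawResponseSaver and the method save are hypothetical, and the code assumes Java 7 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original code.
final class RawResponseSaver {

    /**
     * Copies the response body to target byte for byte. No Reader and no
     * charset are involved, so the bytes on disk equal the bytes received.
     * Closes the stream and returns the number of bytes written.
     */
    static long save(InputStream body, Path target) throws IOException {
        try (InputStream in = body) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With the variables from the snippet above, a call could look like RawResponseSaver.save(url.openStream(), Paths.get("result.xml")).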

Yet when I try to build a JDOM Document from the file using

 Document doc = builder.build(file);

I keep getting a JDOMParseException for some of the files. The reason given
is that the file apparently contains bytes that are not valid UTF-8.
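
To find out where a saved file stops being valid UTF-8, a strict decoder can report the offset of the first offending byte. This is only a diagnostic sketch using the standard java.nio.charset API; the class name Utf8Check and the method firstMalformedOffset are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper, not part of the original code.
final class Utf8Check {

    /**
     * Returns the byte offset of the first sequence that is not valid UTF-8,
     * or -1 if the whole array decodes cleanly.
     */
    static int firstMalformedOffset(byte[] data) {
        // REPORT makes the decoder stop at bad input instead of replacing it.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never yields more chars than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length);
        CoderResult result = decoder.decode(in, out, true);
        // On error the input position points at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }
}
```

Calling Utf8Check.firstMalformedOffset(Files.readAllBytes(Paths.get("result.xml"))) and looking at the bytes around the returned offset should show whether the file was damaged while it was being saved.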

Question: How can I get the SAXBuilder to ignore those characters? Does
anybody know why those characters appear every once in a while?
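
One way to make a parser skip over malformed bytes is to decode the file leniently first and hand the parser characters instead of bytes. The sketch below relies on the fact that new String(bytes, UTF_8) replaces each malformed sequence with U+FFFD. It uses the JDK's own DOM parser as a stand-in so that it runs without extra libraries; with JDOM the same StringReader would presumably be passed to the SAXBuilder.build(Reader) overload. The class name LenientXml and the method parseLeniently are hypothetical. Note that this hides the damage rather than repairing it, so the affected characters are lost.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper; the JDK DOM parser stands in for JDOM's SAXBuilder.
final class LenientXml {

    /**
     * Decodes data as UTF-8, replacing malformed sequences with U+FFFD,
     * and parses the resulting characters.
     */
    static Document parseLeniently(byte[] data)
            throws ParserConfigurationException, SAXException, IOException {
        // This String constructor never throws on bad input; it substitutes U+FFFD.
        String text = new String(data, StandardCharsets.UTF_8);
        // A character stream is used, so the parser does no byte decoding itself.
        InputSource source = new InputSource(new StringReader(text));
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }
}
```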

Below is the first part of the exception I mentioned.

Thanks

Matt


org.jdom.input.JDOMParseException: Error on line 1 of document file:/d:/JavaCode/result.xml: Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (line number may be too low)
    at org.jdom.input.SAXBuilder.build (SAXBuilder.java:465)
    at org.jdom.input.SAXBuilder.build (SAXBuilder.java:810)
    at org.jdom.input.SAXBuilder.build (SAXBuilder.java:789)
    at AmazonConnector.doItemSearch(AmazonConnector.java:98)
    at MidTermProjectMain.main(MidTermProjectMain.java:41)
Caused by: org.xml.sax.SAXParseException: Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (line number may be too low)
    at org.apache.crimson.parser.InputEntity.fatal(InputEntity.java:1100)
    at org.apache.crimson.parser.InputEntity.fillbuf(InputEntity.java:1072)
    at org.apache.crimson.parser.InputEntity.isEOF(InputEntity.java:262)
    at org.apache.crimson.parser.InputEntity.parsedContent(InputEntity.java:472)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1871)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
    at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:534)
    at org.apache.crimson.parser.Parser2.parse(Parser2.java:318)
    at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
    at org.jdom.input.SAXBuilder.build (SAXBuilder.java:453)
    ... 4 more


