I have several files containing the results of an ItemSearchRequest at
Amazon.com.
Most files are 100-200 kB XML files which, according to the XML
declaration, are UTF-8 encoded.
I read each file from Amazon's REST interface (target is the URL that
responds with the XML file):
    Reader reader = new InputStreamReader(url.openStream());
    BufferedReader bufferedreader = new BufferedReader(reader);
    StringBuffer sb = new StringBuffer();
    int c;
    // read char by char until end of stream or a NUL character
    while (((c = bufferedreader.read()) != -1) && (c != 0)) {
        sb.append((char) c);
    }
    result = sb.toString();
The string "result" is then written to a RandomAccessFile.
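For comparison, here is a minimal sketch of a variant that copies the response byte for byte instead of going through a Reader. An InputStreamReader created without a charset argument decodes with the platform default charset, so skipping the Reader avoids any conversion between the download and the file. The class and method names (RawXmlDownload, copy, download) are made up for illustration, the sketch assumes Java 7 or later, and unlike the loop above it does not stop at a NUL byte. Whether the conversion is really the cause of the problem is only an assumption.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URL;

// Hypothetical helper for illustration; not part of the project or of JDOM.
class RawXmlDownload {

    // Copies every byte from in to out without decoding it into chars.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Saves the response of the given URL to a file, byte for byte.
    static void download(URL url, File file) throws IOException {
        try (InputStream in = url.openStream();
             OutputStream out = new FileOutputStream(file)) {
            copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java RawXmlDownload <url> <output-file>");
            return;
        }
        download(URI.create(args[0]).toURL(), new File(args[1]));
    }
}
```

Because the bytes on disk are then exactly what the server sent, the parser can apply the encoding named in the XML declaration to them directly.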
Yet when I try to build a JDOM Document from the file using
    Document doc = builder.build(file);
I keep getting a JDOMParseException for some of the files. Reason: the file
apparently contains bytes that are not valid UTF-8.
Question: How can I get the SAXBuilder to ignore those characters? Does
anybody know why those characters show up at all, and only in some of the
files?
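To narrow down which files are affected, a check along the following lines might help. It is only a sketch: the class name Utf8Check and the method isValidUtf8 are invented for illustration, and it assumes Java 7 or later. It asks a strict UTF-8 decoder to report malformed input instead of silently replacing it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of JDOM.
class Utf8Check {

    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Each argument is the path of a file to check.
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": "
                    + (isValidUtf8(bytes) ? "valid UTF-8" : "malformed UTF-8"));
        }
    }
}
```

Running it over the saved result files would show whether the bad bytes are already present on disk before JDOM ever sees them.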
Below is the first part of the exception I mentioned.
Thanks
Matt
org.jdom.input.JDOMParseException: Error on line 1 of document
file:/d:/JavaCode/result.xml: Character conversion error: "Malformed UTF-8
char -- is an XML encoding declaration missing?" (line number may be too low)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:465)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:810)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:789)
at AmazonConnector.doItemSearch(AmazonConnector.java:98)
at MidTermProjectMain.main(MidTermProjectMain.java:41)
Caused by: org.xml.sax.SAXParseException: Character conversion error:
"Malformed UTF-8 char -- is an XML encoding declaration missing?"
(line number may be too low)
at org.apache.crimson.parser.InputEntity.fatal(InputEntity.java:1100)
at org.apache.crimson.parser.InputEntity.fillbuf(InputEntity.java:1072)
at org.apache.crimson.parser.InputEntity.isEOF(InputEntity.java:262)
at org.apache.crimson.parser.InputEntity.parsedContent(InputEntity.java:472)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1871)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:534)
at org.apache.crimson.parser.Parser2.parse(Parser2.java:318)
at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:453)
... 4 more