Charset conversion problem
2003-10-03 - By Eric VERGNAUD
Hi JDOM,
I don't know if this is a bug or a setting I cannot find. I'm running JDOM b-9 on MacOSX.
From a byte stream, I receive an XML document with some accented characters in it. For example:
<record> États-unis </record>
The character É above is properly encoded in UTF-8 as the bytes C3 89, which decode to C9 (201 in decimal), which is indeed the Unicode value for that character.
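The decoding step described above can be checked with a few lines of plain Java. This is only a minimal sketch using the JDK's own UTF-8 decoder, not JDOM; the class and method names (Utf8DecodeCheck, firstCodePoint, decodeTwoByteSequence) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeCheck {

    // Decodes the bytes as UTF-8 with the JDK decoder and returns the first code point.
    static int firstCodePoint(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.codePointAt(0);
    }

    // Manual decoding of a two-byte UTF-8 sequence (110xxxxx 10yyyyyy):
    // keep the low 5 bits of the lead byte and the low 6 bits of the trail byte.
    static int decodeTwoByteSequence(int lead, int trail) {
        return ((lead & 0x1F) << 6) | (trail & 0x3F);
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xC3, (byte) 0x89 };
        System.out.println(firstCodePoint(data));              // prints 201
        System.out.println(decodeTwoByteSequence(0xC3, 0x89)); // prints 201
    }
}
```

Both paths give 201 (U+00C9), so the incoming bytes themselves are consistent with the description.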
However, when I parse the document and then get the text from the element, it appears that the 201 has turned into 131, which happens to be the code for É in the MacOS Latin charset.
So it looks like the element data is converted to the platform charset rather than Unicode.
I hope I'm simply missing something. Here is how I parse the data:
byte[] data; // comes from elsewhere (at this point the bytes are C389)
InputStream is = new ByteArrayInputStream(data);
SAXBuilder sax = new SAXBuilder();
sax.setIgnoringElementContentWhitespace(true);
Document received = sax.build(is);
// when I get there the character is 131 instead of 201
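One way to narrow this down might be to look at the numeric code point of the parsed text directly, rather than at anything printed or written out, since printing can itself convert to the platform charset. The sketch below does that with the JDK's built-in DOM parser (javax.xml.parsers) instead of JDOM, so it only shows what an XML parser is expected to return for these bytes; it does not reproduce JDOM's behaviour. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParsedTextCheck {

    // Parses the XML bytes and returns the first code point of the root
    // element's trimmed text. With no encoding declaration and no byte order
    // mark, an XML parser is expected to read the bytes as UTF-8.
    static int firstCodePointOfRootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            String text = doc.getDocumentElement().getTextContent().trim();
            return text.codePointAt(0);
        } catch (Exception e) {
            // Wrapped so callers do not have to handle the checked parser exceptions.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // \u00C9 is encoded as the bytes C3 89 in UTF-8.
        byte[] data = "<record> \u00C9tats-unis </record>".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstCodePointOfRootText(data)); // prints 201
    }
}
```

If the same numeric check on the JDOM element text also gives 201, the 131 is probably introduced later, when the string is turned back into bytes or displayed using the platform default charset.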
Any advice will be appreciated,
Eric