Java Mailing List Archive

http://www.junlu.com/


Re: [jdom-interest] Re: Getting original Encoding and changing the
 default UTF-8

Jason Hunter

2004-09-10


Young Matthew wrote:

> hi,
>
> Regarding the default encoding, I'm thinking more of the front end than of
> printing. In other words, before parsing a document it would be cool if I
> could switch the encoding to something other than UTF-8 to handle Swedish
> characters.

XML files generally have their encoding listed in the declaration if
they're not UTF-8, so the parser can automatically determine the proper
encoding to use. Getting the data in correctly isn't an issue; the
issue arises if you want to encode the document the same way on output
instead of using the universal UTF-8 encoding. SAX doesn't report what
the original encoding was, just returns the already-decoded characters.
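To illustrate the point that getting the data in is not the problem, here is a minimal sketch using only the JDK's built-in SAX parser (not JDOM). It assumes a small Latin-1 document with Swedish characters. The parser is handed raw bytes, picks up the encoding from the XML declaration on its own, and the handler only ever sees already-decoded Java characters. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class DeclaredEncodingDemo {

    // Collects all character data the SAX parser reports (already decoded to Java chars).
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Parses raw bytes; the parser reads the encoding from the XML declaration itself.
    static String parseText(byte[] xmlBytes) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        parser.parse(new ByteArrayInputStream(xmlBytes), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // "\u00C5sa \u00D6berg" is written with escapes so the source file stays ASCII.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<name>\u00C5sa \u00D6berg</name>";
        // Encode the document as Latin-1 bytes, matching what its declaration says.
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(latin1));
    }
}
```

Nothing in the handler receives the name "ISO-8859-1"; it gets decoded text only, which is why the original encoding is lost by the time a builder sees the data.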

Another builder, like an XNI builder, could report the encoding. The
Document class doesn't currently have an encoding property but we could
add one if we had a parser that reported it. That is, assuming it's a
document-level notion. The story's less clear when pulling together
elements from multiple documents. If the original Document node was
Latin-1 but you included an Element from a Shift_JIS document, you can't
reliably assume Latin-1 for the new document.
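The mixed-document problem can be made concrete with a small check. This is a hypothetical policy, not anything JDOM provides: keep the first document's encoding only if every piece of text in the merged result can be represented in it, and otherwise fall back to UTF-8. It uses the standard `java.nio.charset.CharsetEncoder.canEncode` check; `chooseOutputEncoding` is an illustrative name.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class OutputEncodingChooser {

    // Hypothetical policy: keep the original encoding only if all text fits in it,
    // otherwise fall back to UTF-8, which can represent any character.
    static String chooseOutputEncoding(String originalEncoding, List<String> texts) {
        CharsetEncoder encoder = Charset.forName(originalEncoding).newEncoder();
        for (String text : texts) {
            if (!encoder.canEncode(text)) {
                return StandardCharsets.UTF_8.name();
            }
        }
        return originalEncoding;
    }

    public static void main(String[] args) {
        // Text from a Latin-1 document only: Latin-1 still works.
        System.out.println(chooseOutputEncoding("ISO-8859-1", List.of("\u00C5sa")));
        // Add text taken from a Shift_JIS document (U+65E5 U+672C): Latin-1 no longer fits.
        System.out.println(chooseOutputEncoding("ISO-8859-1",
                List.of("\u00C5sa", "\u65E5\u672C")));
    }
}
```

This prints `ISO-8859-1` and then `UTF-8`. An XML serializer could instead keep Latin-1 and write the unencodable characters as numeric character references; the check above is just one way to avoid assuming the first document's encoding.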

-jh-
