Hi All,
A few weeks back I posted some questions about CDATA in XMLOutputter. Could
someone tell me whether there are plans to address this?
1. I have two questions. The first is about outputting CDATA using
XMLOutputter. (I'm using b10.)
In XMLOutputter, there is this method:
    protected void printCDATA(Writer out, CDATA cdata) throws IOException {
        String str = (currentFormat.mode == Format.TextMode.NORMALIZE)
                     ? cdata.getTextNormalize()
                     : ((currentFormat.mode == Format.TextMode.TRIM) ?
                            cdata.getText().trim() : cdata.getText());
        out.write("<![CDATA[");
        out.write(str);
        out.write("]]>");
    }
According to my (perhaps naive) understanding of what a CDATA is, it's
just 'stuff' and it shouldn't be formatted or normalised at all. If my
understanding is right, then the method should be simplified to
    protected void printCDATA(Writer out, CDATA cdata) throws IOException {
        out.write("<![CDATA[");
        out.write(cdata.getText());
        out.write("]]>");
    }
I came across this while trying to generate XHTML with Javascript
nodes. The Javascript may happen to work when normalised. However, if it
contains '//' comments (which are terminated by the line ending, so that
whitespace is significant), then the normalisation will quite likely
break the Javascript and is clearly undesirable. Formatters shouldn't
attempt to pretty-up something that is inherently opaque.
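To illustrate the '//' problem, here is a minimal, self-contained sketch.
It does not use JDOM at all: normalize() is a hypothetical stand-in that
assumes normalisation trims the ends and collapses each run of whitespace
(newlines included) to a single space. The script string is made up for
the example.

```java
// Illustration only: normalize() is a hypothetical stand-in, not a JDOM method.
final class CdataNormalizeDemo {

    // Trim both ends and collapse every whitespace run (including
    // newlines) to a single space.
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Three statements; the first line ends with a '//' comment.
        String script = "var a = 1; // set a\nvar b = 2;\nalert(a + b);";

        // After normalisation the newline that ended the comment is gone,
        // so 'var b = 2;' and 'alert(a + b);' become part of the comment.
        System.out.println(normalize(script));
    }
}
```

With the line endings gone, everything after the '//' sits on the same
line as the comment, so the remaining statements are never executed.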
2. My more general question is this: I noticed that org.jdom.CDATA
extends org.jdom.Text. What is the rationale for CDATA extending Text?
org.jdom.Text represents XML 'character data'. There isn't really an
'is-a' relationship between this and CDATA. Indeed, the spec
(http://www.w3.org/TR/2004/REC-xml-20040204/#syntax) specifically
distinguishes between CDATA and general (parsed) character data.
[For background, two relevant sentences are:
* Definition: Markup takes the form of start-tags, end-tags,
empty-element tags, entity references, character references, comments,
CDATA section delimiters, document type declarations, processing
instructions, XML declarations, text declarations, and any white space
that is at the top level of the document entity (that is, outside the
document element and not inside any other markup).
* Definition: All text that is not markup constitutes the character
data of the document.]
In an ideal world, CDATA and Text would be peer classes, both extending
Content as their common base class. Clearly it would be an intrusive
upheaval to change this now. However, would it be possible to document
the CDATA class to note the differences from the Text class? Also, would
it be possible to override getTextNormalize() and getTextTrim() in CDATA
so that they have non-normalizing behaviour?
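As a rough sketch of the kind of override I mean (using stand-in classes,
not the real org.jdom ones; SketchText and SketchCData are hypothetical
names, and the normalising logic in SketchText is only an assumption about
how the base class behaves):

```java
// Stand-in classes for illustration only; these are NOT org.jdom.Text / org.jdom.CDATA.
final class CdataOverrideDemo {

    // Plays the role of Text: trimming and normalising accessors.
    static class SketchText {
        protected final String value;

        SketchText(String value) { this.value = value; }

        String getText() { return value; }

        String getTextTrim() { return value.trim(); }

        // Assumed behaviour: trim, then collapse whitespace runs to one space.
        String getTextNormalize() {
            return value.trim().replaceAll("\\s+", " ");
        }
    }

    // Plays the role of CDATA: the content is opaque, so both accessors
    // return the text untouched.
    static class SketchCData extends SketchText {
        SketchCData(String value) { super(value); }

        @Override
        String getTextTrim() { return getText(); }

        @Override
        String getTextNormalize() { return getText(); }
    }

    public static void main(String[] args) {
        String raw = "  x = 1; // note\n  y = 2;  ";
        System.out.println("[" + new SketchText(raw).getTextNormalize() + "]");
        System.out.println("[" + new SketchCData(raw).getTextNormalize() + "]");
    }
}
```

With overrides along these lines, an outputter that calls
getTextNormalize() on a CDATA node would still write the original
content, whatever the text mode.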
See attached suggested patched versions (CDATA.java attached,
XMLOutputter to follow).
Regards,
Rick :-)