Java Mailing List Archive

http://www.junlu.com/


[jdom-interest] CDATA and XMLOutputter - problem with normalization

Rick Beton

2004-04-22


Hi All,

A few weeks back I posted some questions about CDATA in XMLOutputter.
Could someone tell me whether there are plans to address them?

1. I have two questions. The first is about outputting CDATA using
XMLOutputter. (I'm using b10.)

In XMLOutputter, there is this method:

protected void printCDATA(Writer out, CDATA cdata) throws IOException {
  String str = (currentFormat.mode == Format.TextMode.NORMALIZE)
     ? cdata.getTextNormalize()
     : ((currentFormat.mode == Format.TextMode.TRIM) ?
        cdata.getText().trim() : cdata.getText());
  out.write("<![CDATA[");
  out.write(str);
  out.write("]]>");
}

According to my (perhaps naive) understanding of what a CDATA section is,
it's just 'stuff' and shouldn't be formatted or normalised at all. If my
understanding is right, then the method should be simplified to

protected void printCDATA(Writer out, CDATA cdata) throws IOException {
  out.write("<![CDATA[");
  out.write(cdata.getText());
  out.write("]]>");
}
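To make the difference concrete, here is a standalone sketch that puts the b10 mode logic next to the proposed verbatim version. It does not use JDOM. `Mode` is a hypothetical stand-in for `Format.TextMode`, and `normalize()` only approximates JDOM's normalization (trim, then collapse each whitespace run to one space).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical standalone comparison; Mode stands in for Format.TextMode.
final class CdataPrintSketch {

    enum Mode { PRESERVE, TRIM, NORMALIZE }

    // Approximation of JDOM-style normalization: trim the ends, then
    // collapse every internal run of whitespace into a single space.
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // Mirrors the b10 printCDATA logic: the section body depends on the mode.
    static void printCdataB10(Writer out, String text, Mode mode) throws IOException {
        String str = (mode == Mode.NORMALIZE) ? normalize(text)
                : (mode == Mode.TRIM) ? text.trim() : text;
        out.write("<![CDATA[");
        out.write(str);
        out.write("]]>");
    }

    // The proposed version: the body is written verbatim whatever the mode.
    static void printCdataProposed(Writer out, String text) throws IOException {
        out.write("<![CDATA[");
        out.write(text);
        out.write("]]>");
    }

    static String render(String text, Mode mode, boolean proposed) {
        StringWriter sw = new StringWriter();
        try {
            if (proposed) {
                printCdataProposed(sw, text);
            } else {
                printCdataB10(sw, text, mode);
            }
        } catch (IOException e) {
            // StringWriter never throws, but Writer's signature declares it.
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        String body = "\n  line one\n  line two\n";
        for (Mode m : Mode.values()) {
            System.out.println(m + " b10:      " + render(body, m, false));
            System.out.println(m + " proposed: " + render(body, m, true));
        }
    }
}
```

With the proposed method the output is identical for all three modes; with the b10 logic, TRIM and NORMALIZE both change the bytes inside the section.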

I came across this while trying to generate XHTML containing JavaScript
nodes. The JavaScript may happen to work when normalised. However, if it
contains '//' comments (where the line ending that terminates the comment
is significant), normalisation will quite likely break it: collapsing the
newline joins the next line onto the comment, so that code is silently
commented out. That is clearly undesirable. Formatters shouldn't attempt
to pretty-up something that is inherently opaque.
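A minimal illustration of the '//' problem. It assumes normalization behaves roughly like "trim, then collapse whitespace runs to one space". The `normalize()` helper is a hypothetical stand-in, not JDOM's own method.

```java
// Hypothetical demo of what whitespace normalization does to inline script.
final class JsNormalizeDemo {

    // Trim, then collapse each whitespace run (newlines included) to one space.
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String script = "var a = 1; // first value\nvar b = 2;\n";
        String flattened = normalize(script);
        // The newline that ended the comment is gone, so "var b = 2;" is now
        // inside the comment and b is never declared.
        System.out.println(flattened); // prints: var a = 1; // first value var b = 2;
    }
}
```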


2. My more general question is this: I noticed that org.jdom.CDATA
extends org.jdom.Text. What is the rationale for CDATA extending Text?

org.jdom.Text represents XML 'character data'. There isn't really an
'is-a' relationship between this and CDATA. Indeed, the spec
(http://www.w3.org/TR/2004/REC-xml-20040204/#syntax) specifically
distinguishes between CDATA and general (parsed) character data.

[For background, two relevant sentences are:

  * Definition: Markup takes the form of start-tags, end-tags,
empty-element tags, entity references, character references, comments,
CDATA section delimiters, document type declarations, processing
instructions, XML declarations, text declarations, and any white space
that is at the top level of the document entity (that is, outside the
document element and not inside any other markup).

  * Definition: All text that is not markup constitutes the character
data of the document.]
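A sketch of why the is-a relationship matters in practice. The classes below are a hypothetical mini-model of the b10 layout, not JDOM's own. `SimpleCData` inherits `getTextNormalize()` from `SimpleText`, just as `CDATA` inherits it from `Text`, so any routine written to handle "all text" also normalizes CDATA without noticing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mini-model: SimpleCData extends SimpleText, mirroring
// org.jdom.CDATA extends org.jdom.Text. These are not JDOM classes.
final class IsARelationDemo {

    static abstract class SimpleContent { }

    static class SimpleText extends SimpleContent {
        final String value;
        SimpleText(String value) { this.value = value; }
        String getText() { return value; }
        // Trims and collapses whitespace, in the spirit of Text.getTextNormalize().
        String getTextNormalize() { return value.trim().replaceAll("\\s+", " "); }
    }

    // Inherits getTextNormalize() unchanged, as CDATA does in b10.
    static class SimpleCData extends SimpleText {
        SimpleCData(String value) { super(value); }
    }

    // A typical "handle all text" routine written against the base class.
    static List<String> normalizedText(List<SimpleContent> nodes) {
        List<String> out = new ArrayList<>();
        for (SimpleContent c : nodes) {
            if (c instanceof SimpleText) { // also true for SimpleCData
                out.add(((SimpleText) c).getTextNormalize());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleContent> nodes = Arrays.asList(
                new SimpleText("  hello   world "),
                new SimpleCData("if (x)\n  y(); // keep lines\nz();"));
        System.out.println(normalizedText(nodes));
        // prints: [hello world, if (x) y(); // keep lines z();]
    }
}
```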

In an ideal world, CDATA and Text would be peer classes, both extending
Content as their common base class. Clearly it would be an intrusive
upheaval to change this now. However, would it be possible to document
the CDATA class to note the differences from the Text class? Also, would
it be possible to override getTextNormalize() and getTextTrim() in CDATA
so that they don't normalize?
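For comparison, here is a rough sketch of the "peer classes" layout described above. All names (`Content`, `TextNode`, `CDataNode`, `serialize`) are hypothetical and are not JDOM's API. Because `CDataNode` is not a `TextNode`, an `instanceof TextNode` check can never pick up CDATA and route it through normalization.

```java
// Hypothetical peer-class layout: TextNode and CDataNode both extend Content,
// and neither is a subtype of the other. Not JDOM's API.
final class PeerClassesSketch {

    static abstract class Content {
        // The raw, unmodified value of the node.
        abstract String getValue();
    }

    static final class TextNode extends Content {
        private final String text;
        TextNode(String text) { this.text = text; }
        @Override String getValue() { return text; }
        // Whitespace handling exists only on parsed character data.
        String getTextNormalize() { return text.trim().replaceAll("\\s+", " "); }
    }

    static final class CDataNode extends Content {
        private final String data;
        CDataNode(String data) { this.data = data; }
        // Opaque content: only the verbatim value is offered.
        @Override String getValue() { return data; }
    }

    // Minimal escaping for character data; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // The serializer picks a branch per type; CDATA is always written verbatim.
    static String serialize(Content c, boolean normalizeText) {
        if (c instanceof CDataNode) {
            return "<![CDATA[" + c.getValue() + "]]>";
        } else if (c instanceof TextNode) {
            TextNode t = (TextNode) c;
            return escape(normalizeText ? t.getTextNormalize() : t.getValue());
        }
        throw new IllegalArgumentException("unsupported node: " + c);
    }

    public static void main(String[] args) {
        System.out.println(serialize(new CDataNode("if (x)\n  y(); // keep\n"), true));
        System.out.println(serialize(new TextNode(" a <  b "), true)); // prints: a &lt; b
    }
}
```

The trade-off is that code which genuinely wants "all character content" has to handle both types explicitly, which is arguably the point.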

See attached suggested patched versions (CDATA.java attached,
XMLOutputter to follow).

Regards,
Rick :-)



/*--

$Id: Sent,v 1.152 2004/04/19 23:08:25 rick Exp $

Copyright (C) 2000-2004 Jason Hunter & Brett McLaughlin.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
  notice, this list of conditions, and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions, and the disclaimer that follows
  these conditions in the documentation and/or other materials
  provided with the distribution.

3. The name "JDOM" must not be used to endorse or promote products
  derived from this software without prior written permission. For
  written permission, please contact <request_AT_jdom_DOT_org>.

4. Products derived from this software may not be called "JDOM", nor
  may "JDOM" appear in their name, without prior written permission
  from the JDOM Project Management <request_AT_jdom_DOT_org>.

In addition, we request (but do not require) that you include in the
end-user documentation provided with the redistribution and/or in the
software itself an acknowledgement equivalent to the following:
  "This product includes software developed by the
   JDOM Project (http://www.jdom.org/)."
Alternatively, the acknowledgment may be graphical using the logos
available at http://www.jdom.org/images/logos.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE JDOM AUTHORS OR THE PROJECT
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.

This software consists of voluntary contributions made by many
individuals on behalf of the JDOM Project and was originally
created by Jason Hunter <jhunter_AT_jdom_DOT_org> and
Brett McLaughlin <brett_AT_jdom_DOT_org>. For more information
on the JDOM Project, please see <http://www.jdom.org/>.

*/

package org.jdom;

/**
* An XML CDATA section. Represents character-based content within an XML
* document that should be output within special CDATA tags. Semantically it's
* similar to a simple {@link Text} node.
* CDATA makes no guarantees about the underlying textual representation of
* character data, but does expose that data as a Java String.
* <p>
* The XML spec (http://www.w3.org/TR/2004/REC-xml-20040204/#syntax) specifically
* distinguishes between CDATA and general (parsed) character data. The
* two relevant sentences are:
* <ul>
* <li>Definition: Markup takes the form of start-tags, end-tags, empty-element
* tags, entity references, character references, comments, CDATA section delimiters,
* document type declarations, processing instructions, XML declarations, text
* declarations, and any white space that is at the top level of the document entity
* (that is, outside the document element and not inside any other markup).</li>
*
* <li>Definition: All text that is not markup constitutes the character data of
* the document.</li>
* </ul>
*
* @version $Revision: 1.152 $, $Date: 2004/04/19 23:08:25 $
* @author Dan Schaffer
* @author Brett McLaughlin
* @author Jason Hunter
* @author Bradley S. Huffman
*/
public class CDATA extends Text {

  private static final String CVS_ID =
   "@(protected): $";

  /**
  * This is the protected, no-args constructor standard in all JDOM
  * classes. It allows subclassers to get a raw instance with no
  * initialization.
  */
  protected CDATA() { }

  /**
  * This constructor creates a new <code>CDATA</code> node, with the
  * supplied string value as its character content.
  *
  * @param str the node's character content.
  * @throws IllegalDataException if <code>str</code> contains an
  *      illegal character such as a vertical tab (as determined
  *       by {@link org.jdom.Verifier#checkCharacterData})
  *      or the CDATA end delimiter <code>]]&gt;</code>.
  */
  public CDATA(String str) {
    setText(str);
  }

  /**
  * This returns the textual content verbatim. Unlike
  * {@link Text#getTextTrim()}, this leaves
  * the CDATA whitespace intact.
  *
  * @return text content or empty string
  */
  public String getTextTrim() {
    // Overrides Text.getTextTrim() because CDATA shouldn't be
    // altered by trimming or normalization.
    return getText();
  }

  /**
  * This returns the textual content verbatim. Unlike
  * {@link Text#getTextNormalize()}, this
  * leaves the CDATA whitespace intact.
  *
  * @return text content or empty string
  */
  public String getTextNormalize() {
    // Overrides Text.getTextNormalize() because CDATA shouldn't be
    // altered by trimming or normalization.
    return getText();
  }

  /**
  * This will set the value of this <code>CDATA</code> node.
  *
  * @param str value for node's content.
  * @return the object on which the method was invoked
  * @throws IllegalDataException if <code>str</code> contains an
  *      illegal character such as a vertical tab (as determined
  *       by {@link org.jdom.Verifier#checkCharacterData})
  *      or the CDATA end delimiter <code>]]&gt;</code>.
  */
  public Text setText(String str) {
    // Overrides Text.setText() because this needs to ensure CDATA
    // rules are enforced. We could have a separate Verifier check
    // for CDATA beyond Text and call that alone before super.setText().

    String reason;

    if (str == null) {
       value = EMPTY_STRING;
       return this;
    }

    if ((reason = Verifier.checkCDATASection(str)) != null) {
       throw new IllegalDataException(str, "CDATA section", reason);
    }
    value = str;
    return this;
  }

  /**
  * This will append character content to whatever content already
  * exists within this <code>CDATA</code> node.
  *
  * @param str character content to append.
  * @throws IllegalDataException if <code>str</code> contains an
  *      illegal character such as a vertical tab (as determined
  *       by {@link org.jdom.Verifier#checkCharacterData})
  *      or the CDATA end delimiter <code>]]&gt;</code>.
  */
  public void append(String str) {
    // Overrides Text.append() because this needs to ensure CDATA
    // rules are enforced. The combined value is checked rather than
    // str alone, because a "]]>" delimiter could otherwise be formed
    // across the boundary (e.g. appending "]>" to "a]").

    String reason;

    if (str == null) {
       return;
    }

    String newValue = (value == EMPTY_STRING) ? str : value + str;
    if ((reason = Verifier.checkCDATASection(newValue)) != null) {
       throw new IllegalDataException(newValue, "CDATA section", reason);
    }
    value = newValue;
  }

  /**
  * This returns a <code>String</code> representation of the
  * <code>CDATA</code> node, suitable for debugging. If the XML
  * representation of the <code>CDATA</code> node is desired,
  * either <code>{@link #getText}</code> or
  * <code>{@link org.jdom.output.XMLOutputter#output(CDATA, java.io.Writer)}</code>
  * should be used.
  *
  * @return <code>String</code> - information about this node.
  */
  public String toString() {
    return new StringBuffer(64)
       .append("[CDATA: ")
       .append(getText())
       .append("]")
       .toString();
  }
}
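The setText() and append() overrides above delegate their checks to Verifier.checkCDATASection(). The following is a hypothetical standalone stand-in for roughly those rules: every code point must match the XML 1.0 Char production, and the text must not contain "]]>". The class name, messages, and exception type are illustrative only. This version validates the combined value on append, so a delimiter split across two calls is still caught. Re-checking the whole value on each append is quadratic in the worst case, which is acceptable for a sketch.

```java
// Hypothetical stand-in for the CDATA checks; not JDOM's Verifier.
final class CDataValidationSketch {

    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns null if s may appear inside a CDATA section, else a reason.
    static String checkCDataSection(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return String.format("0x%X is not a legal XML character", cp);
            }
            i += Character.charCount(cp);
        }
        if (s.contains("]]>")) {
            return "CDATA cannot contain the end delimiter \"]]>\"";
        }
        return null;
    }

    private String value = "";

    // null means empty; anything else is validated before it is stored.
    void setText(String str) {
        if (str == null) { value = ""; return; }
        String reason = checkCDataSection(str);
        if (reason != null) throw new IllegalArgumentException(reason);
        value = str;
    }

    // null is ignored; the combined value is validated, not just str.
    void append(String str) {
        if (str == null) return;
        String newValue = value + str;
        String reason = checkCDataSection(newValue);
        if (reason != null) throw new IllegalArgumentException(reason);
        value = newValue;
    }

    String getText() { return value; }

    public static void main(String[] args) {
        CDataValidationSketch node = new CDataValidationSketch();
        node.setText("a]");
        try {
            node.append("]>"); // would form "a]]>"
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(node.getText()); // still "a]"
    }
}
```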


©2008 junlu.com - Jax Systems, LLC, U.S.A.