Re: [jdom-interest] ElementScanner - causing SAXHandler to mistake
nonroot element for root element

Laurent Bihanic

2004-05-05


Hi Richard,

Sorry for the long delay. I had a chance to look at your problem. Indeed, this
is a problem in ElementScanner and your analysis is correct.

> To fix, I removed the if (this.activeRules.size() != 0) test that contained
> the startElement() call to XMLScanner, so that it always propagates the
> event to the SAXHandler.

Your proposal to always propagate the startElement events to SAXHandler is
quite dangerous, as it forces SAXHandler to build a full JDOM document from the
parser output (which is precisely what ElementScanner aims to avoid).
Thus, I think we should keep the "if (this.activeRules.size() != 0)" test to
support extracting some nodes from huge documents while using as little memory
as possible.

Attached is another patch proposal: instead of using SAXHandler directly, it
relies on a subclass (FragmentHandler, borrowed from JDOMResult) that inserts
a dummy root element into the document SAXHandler is building.
This guarantees that, whatever your matching rules, SAXHandler will always
see a single root element.
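
To illustrate why the dummy root avoids the exception, here is a minimal,
self-contained sketch. It does not use JDOM at all: Node, Builder and
FragmentBuilder are made-up stand-ins for Element, SAXHandler and the
FragmentHandler in the patch, and feed() only imitates the scanner forwarding
the matched /blah/huh and /blah/blam elements. Treat it as an approximation of
the idea, not as the actual SAXHandler logic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class DummyRootSketch {

    /** Toy node; a hypothetical stand-in for a JDOM Element. */
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Toy builder that, like SAXHandler, accepts only one root element. */
    static class Builder {
        protected final Deque<Node> stack = new ArrayDeque<>();
        protected Node root;

        void startElement(String name) {
            Node node = new Node(name);
            if (stack.isEmpty()) {
                // No open element: this node would have to be the root.
                if (root != null) {
                    throw new IllegalStateException("multiple root elements detected");
                }
                root = node;
            } else {
                stack.peek().children.add(node);
            }
            stack.push(node);
        }

        void endElement() {
            stack.pop();
        }
    }

    /** Variant that pre-installs a dummy root, so every forwarded element becomes its child. */
    static class FragmentBuilder extends Builder {
        FragmentBuilder() {
            root = new Node("root");
            stack.push(root);
        }
    }

    /** Forwards only the matched elements; returns true if the builder accepted them all. */
    static boolean feed(Builder builder) {
        try {
            for (String name : new String[] {"huh", "blam", "blam"}) {
                builder.startElement(name);
                builder.endElement();
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("plain builder accepted all:    " + feed(new Builder()));
        System.out.println("fragment builder accepted all: " + feed(new FragmentBuilder()));
    }
}
```

The plain builder takes the first forwarded element as its root and rejects
the next one, which is the failure Richard reported. The fragment variant
never sees an empty stack, so the matched elements simply become siblings
under the dummy root.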

What do you think?

Laurent


Richard Allen wrote:
> Hi All,
>
> With the following XML:
> <blah>
>     <huh>1234</huh>
>     <blam>
>         <yay>woohoo</yay>
>     </blam>
>     <blam>
>         <yay>mwuhahaha</yay>
>     </blam>
>     <nah>5678</nah>
> </blah>
>
> And listeners on the following:
> /blah/huh
> /blah/blam
>
> The /blah/huh element is processed sweet as..
> But when the /blah/blam element is being processed, the
> SAXHandler.startElement() throws the following exception:
>
> org.xml.sax.SAXException: Ill-formed XML document (multiple root elements
> detected)
>     at org.jdom.input.SAXHandler.getCurrentElement (SAXHandler.java:906)
>     at org.jdom.input.SAXHandler.startElement (SAXHandler.java:553)
>     at
> org.jdom.contrib.input.scanner.ElementScanner.startElement(ElementScanner.java:554)
>
> This is a bit weird, given that the //blam element isn't the root element
> ;-)
>
> The problem is that the XMLScanner is not being notified until after the
> first element that contains active rules has been found.
> This causes SAXHandler to think that the /blah/huh element is actually the
> root.
> When the ElementScanner notifies SAXHandler of the /blah/blam element it
> throws a hissy fit as it has already ended what it thinks is the root
> element ;-)
>
> To fix, I removed the if (this.activeRules.size() != 0) test that contained
> the startElement() call to XMLScanner, so that it always propagates the
> event to the SAXHandler.
>
> Comments appreciated as to whether this fix is the ideal fix, or if there
> is a better way to fix this problem.
> cheers,
> Rich
Index: ElementScanner.java
===================================================================
RCS file: /home/cvspublic/jdom-contrib/src/java/org/jdom/contrib/input/scanner/ElementScanner.java,v
retrieving revision 1.11
diff -u -r1.11 ElementScanner.java
--- ElementScanner.java  28 Feb 2004 03:47:08 -0000  1.11
+++ ElementScanner.java  5 May 2004 12:53:26 -0000
@@(protected) @@
    //----------------------------------------------------------------------

    protected SAXHandler createContentHandler() {
-      return (new SAXHandler(new EmptyDocumentFactory(getFactory())));
+      return (new FragmentHandler(new EmptyDocumentFactory(getFactory())));
    }

    //----------------------------------------------------------------------
@@(protected) @@
  }

  //-------------------------------------------------------------------------
+  // FragmentHandler nested class
+  //-------------------------------------------------------------------------
+
+  /**
+   * FragmentHandler extends SAXHandler to support matching nodes
+   * without a common ancestor. This class inserts a dummy root
+   * element in the document being built. This prevents the document
+   * from having, from SAXHandler's point of view, multiple root
+   * elements (which would cause the parse to fail).
+   */
+  private static class FragmentHandler extends SAXHandler {
+    /**
+     * Public constructor.
+     */
+    public FragmentHandler(JDOMFactory factory) {
+      super(factory);
+
+      // Add a dummy root element to the document being built, as the
+      // matching rules can select several elements that have no
+      // common ancestor among the events passed to this handler.
+      this.pushElement(new Element("root", null, null));
+    }
+  }
+
+  //-------------------------------------------------------------------------
  // EmptyDocumentFactory nested class
  //-------------------------------------------------------------------------
