ElementScanner and Memory
2006-11-13 - By Brian Nahas
I have a 1.2 GB XML file I need to parse. Since it's nicely partitioned, I planned on using ElementScanner from the contrib package to load only one item at a time. Here's an equivalent structure:
<data>
  <item>...</item>
  <item>...</item>
  <item>...</item>
  ...
</data>
The path I'm using for my listener is "/data/item".
I assumed any previous items would be released by the parser once they had been processed. ElementScanner was very simple to set up for this, but I ran into an OutOfMemory error on my first try. I was a little confused, as I thought ElementScanner was specifically designed to prevent this.

Upon investigation, I found that the SAXHandler used by ElementScanner was holding onto the previous items after I was done with them. It adds them to the default root element that FragmentHandler creates, and nothing removes them after the listeners are called. This seems to be in direct conflict with the following message, which states that ElementScanner doesn't build a document (the message is fairly old, though):
http://www.servlets.com/archive/servlet/ReadMsg?msgId=350607&listName=jdom-interest
I worked around this by explicitly detaching the element in my listener when I was done with it. Since this seems like a common pattern and a subtle trap, I thought I'd ask whether I'm missing some setting or using ElementScanner improperly. There's a namespace declared on the data element, so I don't know if that has something to do with it.
Thanks, -Brian
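
For anyone trying to reproduce the retention issue and the detach workaround described above, here is a minimal sketch. It does not use JDOM or ElementScanner. It is a stand-in built only on the JDK's SAX and DOM classes, and the names ItemScanSketch, FragmentBuilder and retainedItems are hypothetical. The handler appends each completed item to a shared root element (mimicking what is described for the default root element), and the listener optionally removes the item from its parent, which plays the role of detaching the element in JDOM.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ItemScanSketch {

    /** Builds each /data/item subtree under one shared root and hands it to a listener. */
    static class FragmentBuilder extends DefaultHandler {
        final Document doc;
        final Element root;                               // stand-in for the default root element
        private final Deque<Element> open = new ArrayDeque<>();
        private final Consumer<Element> listener;
        private int depth = 0;

        FragmentBuilder(Consumer<Element> listener) throws Exception {
            this.doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            this.root = doc.createElement("root");
            this.doc.appendChild(root);
            this.listener = listener;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            depth++;
            boolean startsItem = depth == 2 && "item".equals(qName);
            if (open.isEmpty() && !startsItem) {
                return;                                   // outside any item: build nothing
            }
            Element e = doc.createElement(qName);
            (open.isEmpty() ? root : open.peek()).appendChild(e);
            open.push(e);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!open.isEmpty()) {
                open.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            depth--;
            if (open.isEmpty()) {
                return;
            }
            Element done = open.pop();
            if (open.isEmpty()) {
                listener.accept(done);                    // a complete item is ready
            }
        }
    }

    /** Parses the XML and returns how many items are still attached to the root afterwards. */
    static int retainedItems(String xml, boolean detach) throws Exception {
        FragmentBuilder builder = new FragmentBuilder(item -> {
            // ... process the item here ...
            if (detach) {
                item.getParentNode().removeChild(item);   // the workaround: release the item
            }
        });
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), builder);
        return builder.root.getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<data xmlns=\"urn:example\">"
                + "<item>a</item><item>b</item><item>c</item></data>";
        System.out.println("without detach: " + retainedItems(xml, false));
        System.out.println("with detach: " + retainedItems(xml, true));
    }
}
```

Without the removal, every processed item stays reachable from the root, so memory grows with the size of the file. With it, the root is empty after each callback. The parser here is not namespace-aware, so elements are matched by qualified name and the namespace declaration on data makes no difference in this sketch.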