Hello all,
I am having trouble figuring out how to resolve entities when an XML
file has no DOCTYPE declaration (no DTD attached to it) but contains
'non-standard' entities (such as '&nbsp;', etc.). I need to do this
without changing the XML file itself (that is, without adding a
DOCTYPE declaration to it).
Here is what I am doing:
SAXBuilder builder = new SAXBuilder();
....
fulltextXML = builder.build(new FileInputStream(filename));
This fails with the following exception:
C:\HTMLs\00063185_200_1_67\00063185_200_1_67_Document.xml is not well-formed.
org.jdom.input.JDOMParseException: Error on line 5: The entity "nbsp"
was referenced, but not declared.
Is there a way to resolve such entities, without having to declare the DOCTYPE in the XML file?
Thanks in advance!
Vish
Sample XML file:
<?xml version="1.0" encoding="UTF-8"?>
<object_document>
<art_title>
Muscular Alteration of Gill Geometry in vitro: Implications for
Bivalve Pumping Processes -- Medler and Silverman 200 (1): 77 -- The
Biological Bulletin</art_title>
<converted_from type='HTML'>BiolBull V 200 I 1 P 77 Fulltext 00063185.htm</converted_from>
<fulltext> Biol. Bull. 200:
77-86. (February 2001)© 2001 Marine Biological
LaboratoryMuscular Alteration of Gill Geometry in vitro: Implications
for Bivalve Pumping ProcessesScott Medler* and Harold
SilvermanLouisiana State University, Baton Rouge, Louisiana 70803*
Author to whom correspondence should be addressed. Current address:
Department of Biology, Colorado State University, Ft. Collins, CO
80523. E-mail: Skmedler{at}aol.com<!-- var u = "Skmedler", d =
"
aol.com"; document.getElementById("em0").innerHTML = "" + u + "@" + d
+ ""//-->
Received 23 March 2000; accepted 19 October 2000.
</fulltext>
<jrnl_title>BiolBull</jrnl_title>
<issn>00063185</issn>
<volume>200</volume>
<issue>1</issue>
<fpage>77</fpage>
</object_document>
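
One possible direction, sketched here under assumptions: leave the file on disk untouched, read it into a string, and insert a DOCTYPE with an internal subset declaring the missing entity into that in-memory copy before parsing. The class name EntityWorkaround and the helpers withEntityDeclarations and parse are hypothetical names made up for this illustration. The sketch uses the JDK's built-in DOM parser so that it runs on its own; with JDOM, the same patched text could presumably be wrapped in a StringReader and passed to SAXBuilder.build(Reader) instead. Only nbsp is declared; any other undeclared entity would need its own <!ENTITY ...> line.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JDOM or the JDK.
class EntityWorkaround {

    // Returns a copy of the XML text with a DOCTYPE inserted in memory.
    // The internal subset declares nbsp as the no-break space (U+00A0).
    static String withEntityDeclarations(String xml, String rootName) {
        String doctype = "<!DOCTYPE " + rootName + " [\n"
                + "  <!ENTITY nbsp \"&#160;\">\n"
                + "]>\n";
        // The XML declaration, if present, must stay at the very start,
        // so the DOCTYPE goes right after it.
        int insertAt = 0;
        if (xml.startsWith("<?xml")) {
            insertAt = xml.indexOf("?>") + 2;
        }
        return xml.substring(0, insertAt) + "\n" + doctype + xml.substring(insertAt);
    }

    // Parses the patched text from a character stream; nothing is written to disk.
    static Document parse(String patchedXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(patchedXml)));
    }

    public static void main(String[] args) throws Exception {
        // Inline stand-in for the file contents; where &nbsp; sits is assumed.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<object_document>\n"
                + "<fulltext>Biol. Bull.&nbsp;200: 77-86.</fulltext>\n"
                + "</object_document>\n";
        Document doc = parse(withEntityDeclarations(xml, "object_document"));
        String text = doc.getElementsByTagName("fulltext").item(0).getTextContent();
        System.out.println(text.indexOf('\u00A0') >= 0 ? "nbsp resolved" : "nbsp missing");
    }
}
```

In real use the string would come from reading the file with its declared encoding (UTF-8 in the sample), and the root element name would have to match the document being parsed.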