The task is pretty simple, but I can't get anything working.
I want to parse an XHTML document containing MathML, with all the
entities (such as alpha and beta) defined. The document should then be
transformed using XSLT into another XML document.
The test XHTML document is shown below:
----
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
"http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd"
[<!ENTITY mathml "http://www.w3.org/1998/Math/MathML">]>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>
<mi>&zeta;</mi>
</mrow>
</math>
</body>
</html>
----
Here is what I've tried:
Parsing the document using a SAXBuilder with the default settings:
-----
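import java.io.FileInputStream;
import java.io.InputStreamReader;
import org.jdom.input.SAXBuilder;

// 'file' (a java.io.File) and 'doc' (an org.jdom.Document) are declared elsewhere.
SAXBuilder builder = new SAXBuilder();
builder.setValidation(false); // non-validating, which is also the default
if (file.exists()) {
    // Note: this Reader decodes with the platform default charset,
    // not the encoding named in the XML declaration.
    try (InputStreamReader reader = new InputStreamReader(new FileInputStream(file))) {
        doc = builder.build(reader);
    } catch (Exception e) {
        e.printStackTrace();
    }
}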
---
This results in this error:
"org.jdom.IllegalTargetException: The target "IS10744:arch" is not legal
for JDOM/XML Processing Instructions: Processing instruction targets
cannot contain colons."
Then I tried to trick the SAXBuilder into not using the DTDs by setting
its entity resolver to one that doesn't do anything.
---
SAXBuilder builder = new SAXBuilder();
builder.setEntityResolver(new NoOpEntityResolver());
---
This results in some output to System.err:
"[Fatal Error] :1:66: White spaces are required between publicId and
systemId."
But the transformation seems to occur.
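For reference, the NoOpEntityResolver class itself is not shown above. A minimal sketch of what such a resolver might look like, assuming it simply hands the parser an empty input for every external entity (the class and method names here are hypothetical, and plain JAXP is used instead of JDOM to keep the example self-contained):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class NoOpResolverSketch {

    // Hypothetical stand-in for NoOpEntityResolver: every external entity,
    // including the DTD, resolves to an empty stream, so nothing is fetched.
    static final EntityResolver NO_OP =
            (publicId, systemId) -> new InputSource(new StringReader(""));

    // Parses the given XML with the no-op resolver and returns the root element name.
    static String rootName(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.setEntityResolver(NO_OP);
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN\" "
                + "\"http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";
        System.out.println(rootName(xml)); // prints "html"
    }
}
```

Note that with an empty DTD none of the named entities such as &zeta; are declared any more, so a resolver like this cannot be expected to resolve them.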
I tried writing the parsed document to a file. This file doesn't contain
the entity &zeta;:
---
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
"http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body bgcolor="white">
Hello world
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>
<mi>?</mi>
</mrow>
</math>
</body>
</html>
---
That could be due to an encoding mistake somewhere.
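One way a "?" like this can arise, assuming the parser did expand the entity to the character ζ (U+03B6) and the text was then encoded with a charset that has no ζ, such as ISO-8859-1 (for instance through a Writer using the platform default charset). This is only a sketch of that hypothesis using the standard library; the class and method names are made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLossDemo {

    // Encodes the text to bytes in the given charset and decodes it again.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement byte, which is '?' for ISO-8859-1.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String zeta = "\u03B6"; // Greek small letter zeta

        // ISO-8859-1 cannot represent zeta, so the character is lost.
        System.out.println(roundTrip(zeta, StandardCharsets.ISO_8859_1)); // prints "?"

        // UTF-8 can represent it, so the round trip preserves the character.
        System.out.println(roundTrip(zeta, StandardCharsets.UTF_8).equals(zeta)); // prints "true"
    }
}
```

If this is what happens, the output file would declare UTF-8 but have been written through a lossy charset, which would match the result shown above.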
So, as you can tell, I've been struggling with this issue for quite some
time and getting nowhere. Is it really that difficult to parse an XHTML
document and transform it using XSLT?
How can I transform an XHTML document containing MathML into another XML
document using XSLT?
Regards
Morten Andersen
Master of applied mathematics and computer science
Associate professor
The Maersk Institute of Production technology at Southern Danish
University
www.mip.sdu.dk
Campusvej 55
DK-5230 Odense M
Denmark
+45 65 50 36 54
+45 61 71 11 03
Jabber id: hat@jabber.dk