Entity resolving - design problem 2003-10-23 - By Todd O'Bryan
On Thursday, October 23, 2003, at 07:01 AM, Robert J Munro wrote:
> Todd O'Bryan wrote:
>
>> There is, in fact, a way to do this. You can subclass a Reader and
>> intercept the character stream on the way into the Parser. If you get
>> an ampersand followed by one of the entities you don't want to
>> expand, you pass it on as &amp;entity; instead. If not, you just pass
>> the characters on.
>>
>> When you write the file back through the Writer you'll have to be
>> sure that you intercept again and change &amp;entity; back to
>> &entity; on the way out.
>>
>> All in all, it's about twenty lines of code overriding read() and
>> write() in subclasses of Reader and Writer.
>>
>> Email me if you need more specifics,
>> Todd
>
> That sounds like a horrendously bad idea. It goes completely against
> the whole principle of JDOM (i.e. that you deal with the data, not
> with the XML).

Until XML can do a round-trip with entities, this will continue to be a problem. I was dealing with XML documents created by a client that included entities which were nowhere defined. Yes, I realize undefined entities lead to malformed XML (not just invalid), but the funny thing is, the client was not terribly open to the idea that they should have to fix up their bad XML before I would process it. And I could not afford to wait and see which new undefined entity would crash my program in a new batch of data they hadn't sent me yet. Got a less horrendously bad idea now?
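For anyone who wants to see the shape of the Reader/Writer trick quoted above, here is a rough sketch, not the code I actually used. Everything in it is an assumption made for illustration: the class name EntityGuard, the protect() and restore() helpers, and the idea of passing the entity names to keep as a Set. To stay short it also reads the whole input into memory instead of overriding read() to work on the stream as it goes by. One known weakness: a document that already contains a literal &amp;date; gets turned into &date; on the way out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JDOM or the JDK.
final class EntityGuard {
    // Matches general entity references such as &date; (not character
    // references such as &#160;, which start with '#').
    private static final Pattern ENTITY =
            Pattern.compile("&([A-Za-z_][A-Za-z0-9._-]*);");

    // On the way in: escape the ampersand of each entity named in `keep`,
    // so the parser sees plain text: &date; becomes &amp;date;
    static String protect(String xml, Set<String> keep) {
        Matcher m = ENTITY.matcher(xml);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(xml, last, m.start());
            String name = m.group(1);
            out.append(keep.contains(name) ? "&amp;" + name + ";" : m.group());
            last = m.end();
        }
        return out.append(xml, last, xml.length()).toString();
    }

    // On the way out: turn &amp;date; back into &date; for the same names.
    static String restore(String xml, Set<String> keep) {
        String out = xml;
        for (String name : keep) {
            out = out.replace("&amp;" + name + ";", "&" + name + ";");
        }
        return out;
    }

    // Wraps a Reader so the parser receives the protected text.
    // Simplification: buffers the whole document before returning.
    static Reader protectingReader(Reader in, Set<String> keep) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        for (int n; (n = in.read(buf)) != -1; ) {
            sb.append(buf, 0, n);
        }
        return new StringReader(protect(sb.toString(), keep));
    }

    public static void main(String[] args) {
        Set<String> keep = Set.of("date");
        String src = "<p>Today is &date; &amp; fine</p>";
        String safe = protect(src, keep);
        System.out.println(safe);                // <p>Today is &amp;date; &amp; fine</p>
        System.out.println(restore(safe, keep)); // <p>Today is &date; &amp; fine</p>
    }
}
```

The Reader from protectingReader() is what you would hand to the parser, and restore() is what you would run over the serialized output before it hits the disk.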
> I think the best solution in this case is to use an extra attribute in
> your own namespace (something like <img my:file="name.jpg" />) to say
> what the image filename is without a directory while it is XML, then
> generate the real src attribute with a URL later.

You're probably right. When you're defining the format, a hack like the one above is not the best choice. It is, however, doable. And if the things people wrote as entities are really data and not just shorthand, then you have to deal with them.
A good example of this would be something like &date; which presumably prints out the current date. If you resolve that on your parse, fiddle with it and then want to re-write the original document with your changes, you're screwed. The fact that "October 23, 2003" was once "&date;" is just lost information. Fine if XML were only intended to go one way, but it's not.
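To make the lost-information point concrete, here is a small sketch using the DOM parser that ships with the JDK rather than JDOM. The internal DTD that defines &date; and the class name EntityLossDemo are assumptions for the sake of the example; the undefined entities I was actually dealing with would not even get this far.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class.
final class EntityLossDemo {
    // Parses the XML with default settings and returns the root element's text.
    static String parsedText(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            Document doc = f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed declaration: &date; is defined in the internal subset.
        String xml = "<!DOCTYPE note [<!ENTITY date \"October 23, 2003\">]>"
                   + "<note>Written on &date;</note>";
        System.out.println(parsedText(xml)); // Written on October 23, 2003
    }
}
```

Once you are holding that string, nothing in it says the date was ever written as &date;.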
In the spec, they made it possible to do things with entities that are just a really bad idea, and some of the documentation even suggested doing these things. Then people do them, and tie themselves in knots, and get annoyed.
> Javascript sections could be fixed by defining an image directory in
> .js files on each location, then changing:
>     document.blah.src="/path/another.gif"
> to
>     document.blah.src= imagedirectory + "another.gif"
>
> The solution I would use, however, is to put the images in the same
> location on both servers, either relative to the root of the server,
> or relative to the documents that reference them. If both those
> options really are impossible, then I'd put the images on a public
> server, and have them both point to them with absolute URLs.

Umm, how would you do this if you don't have access rights to the same directory structures on the two servers? And wouldn't it be a horrendously bad idea to make someone viewing a file on a local server wait while the images are fetched from another server just so you don't have to deal with resolving different file prefixes?
Todd