Swanson, Brion writes:
 > If I recall correctly, a conforming XML parser such as Xerces is required to
 > resolve all entities it encounters in an XML document regardless of whether
 > or not you are validating that document. 

Why?

 > In your case, (I believe) you're
 > telling the parser not to load any external DTDs even for the purposes of
 > entity resolution.  So when it encounters your &reg; entity it checks its
 > well-known entities (&lt; &gt; &amp; &apos; &quot;) and failing that,
 > attempts to find the declaration of the entity in the DOCTYPE line or in an
 > external DTD.
 > 

It only throws an exception if my document does not contain a

<!DOCTYPE ...>

declaration. The parser is quite happy to leave the entities unresolved
as long as there is a DOCTYPE declaration in the document and external
DTD loading is turned off.

Why is this a problem? My employer is using Arbortext's Epic editor to
edit software manuals conforming to the DocBook DTD. Each manual
consists of a book file (BOOK_TITLE_book.xml) that lists all the
chapters of the book named BOOK_TITLE as external entities. The book
file contains the DOCTYPE declaration for the book. The chapter files
are fragments of the larger XML document and thus have no DOCTYPE
declaration. (Actually, there is a declaration, but it is commented
out. Epic uses the commented-out declaration to determine the doctype
when a user opens a chapter file.)

I am developing an independent Java application that needs to extract 
information from our software manuals. My Java app uses Xerces to
parse the manuals, including the chapter files. However, I have
discovered that Xerces won't parse any chapter file that contains
entity references other than the five built-in XML entities.

I have found that I can work around the problem by inserting a dummy
DOCTYPE declaration in the chapter files. Epic seems to ignore the
dummy declaration and the declaration makes Xerces 2.2.0 happy as
long as I turn off external DTD loading.
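
For reference, the behaviour described above can be reproduced with a
small sketch along these lines. It is only an illustration under some
assumptions: the class name ChapterParseSketch, the <chapter> sample text
and the system identifier "dummy.dtd" are all made up, and the dummy
DOCTYPE is assumed to name an external subset. It goes through JAXP, so
it uses whichever parser is on the classpath (Xerces, or the Xerces-derived
parser bundled with the JDK).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ChapterParseSketch {
    // Feature name taken from the original message.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Returns true if the text parses without a fatal error. */
    static boolean parses(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Non-validating parse that never fetches the external DTD.
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // DefaultHandler rethrows fatal errors and ignores everything else.
        DefaultHandler handler = new DefaultHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical chapter fragment with no DOCTYPE declaration.
        String bare = "<chapter>Acme&reg; Widget</chapter>";
        // Same fragment with a dummy DOCTYPE naming a DTD that is never loaded.
        String withDummy =
            "<!DOCTYPE chapter SYSTEM \"dummy.dtd\">" + bare;

        System.out.println("no DOCTYPE:    " + parses(bare));
        System.out.println("dummy DOCTYPE: " + parses(withDummy));
    }
}
```

The handler here ignores the skipped entity altogether; an application
that needs the character could override skippedEntity() in its
ContentHandler to see which references were left unresolved.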

My concern is that this workaround seems to depend on possible bugs in Xerces
and/or in Epic that may someday be fixed, thereby breaking my application.

- Paul

 > It doesn't surprise me then that your parser dies if you prevent it from
 > finding the entity declaration it needs to continue parsing.
 > 
 > Please correct me if I'm wrong.
 > 
 > Cheers!
 > Brion Swanson
 > 
 > -----Original Message-----
 > From: Paul Kinnucan [mailto:[EMAIL PROTECTED]
 > Sent: Wednesday, October 23, 2002 4:40 PM
 > To: [EMAIL PROTECTED]
 > Subject: Entity resolution problem
 > 
 > 
 > Hi,
 > 
 > Why does Xerces 2.2 throw an exception and quit when
 > it encounters an entity reference (e.g., &reg;) even
 > though I have specified 
 > 
 > parser.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
 > 
 > The parser throws the exception
 > 
 > org.xml.sax.SAXParseException: The entity "reg" was referenced, but not
 > declared.
 > 
 > If I put a DOCTYPE declaration at the head of the file, Xerces
 > parses the file without any problem.
 > 
 > - Paul
 > 
 > 
 > 
 > 
 > ---------------------------------------------------------------------
 > To unsubscribe, e-mail: [EMAIL PROTECTED]
 > For additional commands, e-mail: [EMAIL PROTECTED]
 > 

