Quoting Wendy Smoak <[EMAIL PROTECTED]>:

> A long time ago, Craig McClanahan wrote:
> > It is a common misconception that the public identifiers of a DTD like
> > this *must* actually be working URLs [...].
> > They are just unique strings of characters that
> > (often) happen to look like URLs.  
> > Blame the XML community for that :-).
> 
> 
> And then Yuan Saul asked:
> > If a local copy of the DTD is not available, then an Internet connection
> > is required. In this case, does the URI have to point to a working URL
> > where the DTD file can be retrieved?
> 
> Which is also my question, but there is no reply in the archives.
> Anyone?
> 
> Today we got a note from campus IT saying that they believed some
> problems in their J2EE apps were related to "code that connects to
> http://java.sun.com behind-the-scenes to download various DTD files
> related to parsing XML documents."
> 
> In addition to whether it happens at all (going out to the internet to
> retrieve the DTD) I'm also curious if it's the XML parser, or the
> Servlet container, etc.  What component would make the call out to get
> the DTD?
> 
> I've always wondered...
> 

Since Wendy spends quite a bit of time answering questions for users, it's only
fair that I answer this one for her :-).

The answer depends on your XML parser, and you can get involved in the process
if you want to, but for simple use cases the answer is "yes".  If you're
interested, here are a few details about how Struts (and Tomcat, for that
matter) use the commons-digester module to parse configuration files:

* Your XML document includes a DOCTYPE header defining the DTD.  For a
  Struts config file, it would look like:

  <!DOCTYPE struts-config PUBLIC
   "-//Apache Software Foundation//DTD Struts Configuration 1.1//EN"
   "http://jakarta.apache.org/struts/dtds/struts-config_1_1.dtd">

* This header includes two identifiers for the DTD: the *public* identifier
  "-//Apache ..." and the *system* identifier (in this case, a URL).

* Commons Digester uses the SAX parsing APIs provided by the parser.
  Included in these APIs is the org.xml.sax.EntityResolver interface,
  which Digester implements.

* The parser calls the resolveEntity() method of the EntityResolver
  (i.e. the Digester instance), asking it to return an InputSource
  so the parser can read the DTD's contents.

* If no EntityResolver is registered (or resolveEntity() returns null),
  the default behavior defined by SAX is simply to treat the system id
  as a URL and attempt to retrieve it.  With the system identifier
  given above, that means going across the Internet to the
  jakarta.apache.org site.

* Digester, however, is "smarter than the average bear" (if you remember
  Yogi Bear from your childhood :-).  It allows you to register a
  mapping from a public identifier ("-//Apache ...") to an *internal*
  resource inside the JAR file.  If resolveEntity() is called with
  one of the registered public ids, Digester ignores the system id and
  returns a stream to the internal resource.  If the public id is not
  recognized, it will do the usual thing (using the system id instead).

* Struts pre-registers the public ids for the various versions of the
  DTD, pointing at internal resources (see the initConfigDigester() method
  of ActionServlet), so that it will never need to use the system id.
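The resolution steps above can be sketched with nothing but the JDK's built-in SAX parser. This is not Digester's or Struts' actual code: the names `LocalDtdDemo`, `PublicIdResolver`, `register` and `parse` are hypothetical, and to keep the example self-contained the "internal resource" is a one-line stand-in DTD held in a map rather than a file loaded from the JAR with `getResourceAsStream()`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class LocalDtdDemo {

    static final String STRUTS_1_1_PUBLIC_ID =
        "-//Apache Software Foundation//DTD Struts Configuration 1.1//EN";

    /**
     * Hypothetical resolver in the spirit of Digester: registered public ids
     * resolve locally; anything else returns null, which tells the SAX parser
     * to fall back to fetching the system id (possibly over the Internet).
     * DefaultHandler implements org.xml.sax.EntityResolver.
     */
    static class PublicIdResolver extends DefaultHandler {
        // public id -> DTD text; a stand-in for a resource inside the JAR
        private final Map<String, String> registrations = new HashMap<>();
        // public ids that were served locally, recorded for illustration
        final List<String> resolvedLocally = new ArrayList<>();

        void register(String publicId, String dtdText) {
            registrations.put(publicId, dtdText);
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            String dtd = (publicId == null) ? null : registrations.get(publicId);
            if (dtd == null) {
                return null; // unknown public id: let the parser use the system id
            }
            resolvedLocally.add(publicId);
            // The DTD is read from the character stream; the system id is
            // kept only as metadata and is never fetched.
            InputSource source = new InputSource(new StringReader(dtd));
            source.setPublicId(publicId);
            source.setSystemId(systemId);
            return source;
        }
    }

    static String sampleDocument() {
        return "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE struts-config PUBLIC\n"
            + " \"" + STRUTS_1_1_PUBLIC_ID + "\"\n"
            + " \"http://jakarta.apache.org/struts/dtds/struts-config_1_1.dtd\">\n"
            + "<struts-config/>";
    }

    static PublicIdResolver parse(String xml) {
        PublicIdResolver resolver = new PublicIdResolver();
        resolver.register(STRUTS_1_1_PUBLIC_ID, "<!ELEMENT struts-config ANY>");
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Passing a DefaultHandler installs it as the EntityResolver too.
            parser.parse(new InputSource(new StringReader(xml)), resolver);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return resolver;
    }

    public static void main(String[] args) {
        PublicIdResolver resolver = parse(sampleDocument());
        System.out.println("Served locally: " + resolver.resolvedLocally);
    }
}
```

Running `main` should show the Struts 1.1 public id being served locally even though the system id still points at jakarta.apache.org. For an unregistered public id, `resolveEntity()` returns null and the parser would then try to fetch the system id on its own.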

In this way, you can run Struts based applications (and Tomcat, which does the
same thing for the DTDs for web.xml files) completely disconnected from the
Internet, without changing the system ids in your XML documents.

If you find that your application is attempting to go to the Internet for DTDs
anyway, the most likely explanation is that you have a typo in the public
identifier in your config file.
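One way to track down such a typo, assuming you can run the parse yourself, is a resolver that refuses to fall back to the system id at all. The sketch below uses hypothetical names (`StrictDtdCheck`, `StrictResolver`, `check`, `doc`), the JDK's SAX parser, and an in-memory stand-in DTD; it throws a SAXException naming the unrecognized public id, so a misspelling shows up as an explicit error instead of a silent network request.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StrictDtdCheck {

    static final String STRUTS_1_1_PUBLIC_ID =
        "-//Apache Software Foundation//DTD Struts Configuration 1.1//EN";

    /** Hypothetical resolver that never falls back to the system id. */
    static class StrictResolver extends DefaultHandler {
        // public id -> DTD text; a stand-in for a resource inside the JAR
        private final Map<String, String> registrations = new HashMap<>();

        StrictResolver register(String publicId, String dtdText) {
            registrations.put(publicId, dtdText);
            return this;
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException {
            String dtd = (publicId == null) ? null : registrations.get(publicId);
            if (dtd == null) {
                // Returning null would let the parser fetch systemId; instead,
                // fail loudly and report exactly which public id did not match.
                throw new SAXException("Unregistered public id \"" + publicId
                    + "\" (system id " + systemId + "); check for a typo");
            }
            InputSource source = new InputSource(new StringReader(dtd));
            source.setPublicId(publicId);
            source.setSystemId(systemId);
            return source;
        }
    }

    /** Returns null if the DTD resolved locally, otherwise the error message. */
    static String check(String xml) {
        StrictResolver resolver = new StrictResolver()
            .register(STRUTS_1_1_PUBLIC_ID, "<!ELEMENT struts-config ANY>");
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), resolver);
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static String doc(String publicId) {
        return "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE struts-config PUBLIC \"" + publicId + "\"\n"
            + " \"http://jakarta.apache.org/struts/dtds/struts-config_1_1.dtd\">\n"
            + "<struts-config/>";
    }

    public static void main(String[] args) {
        String ok = check(doc(STRUTS_1_1_PUBLIC_ID));
        System.out.println(ok == null ? "resolved locally" : ok);
        // "Configuraton" is deliberately misspelled here.
        System.out.println(check(doc(
            "-//Apache Software Foundation//DTD Struts Configuraton 1.1//EN")));
    }
}
```

The correctly spelled id parses without any network access, while the misspelled one is reported by name. This is only a diagnostic aid; it does not change how Struts itself resolves DTDs.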

> -- 
> Wendy Smoak
> Application Systems Analyst, Sr.
> ASU IA Information Resources Management 
> 

Craig

