Title: RE: DTD Reading Urgent

Glenn,

Thanks for that pointer - I'll give it a bash.

The only problem is the same as with the standard SAX callbacks - the startEntity event tends to arrive not at the start of the entity reference, but at the start of the declaration which contains it.

What I'm really looking for is a way to get the unresolved string sent in the parameters for each callback.

However, this could certainly save some work.

Cheers

John

-----Original Message-----
From: Glenn Marcy [mailto:[EMAIL PROTECTED]]
Sent: 06 August 2001 21:40
To: [email protected]
Cc: [EMAIL PROTECTED]
Subject: RE: DTD Reading Urgent



John,

Actually, it is not as bad as you make out...

Here is a note that I sent to the dev list several months ago:

Subject: Re: [xerces2] Note to current XNI state

I have been looking into another approach.  The documentation of the
SAX2 property "http://xml.org/sax/properties/xml-string" says:

     data type: java.lang.String
     description: The literal string of characters that was the source
                  for the current event.
     access: read-only

I have been looking at the current Xerces2 APIs and how one might add
support for this property during DTD parsing.

So, using your two examples, I get (from a DTD writer prototype):

Input:
<!ENTITY % ent "ANY">
<!ELEMENT e %ent;>

Output:
<!ENTITY % ent "ANY">
<!ELEMENT e %ent;>

Input:
<!ENTITY % ent " ">
%ent;
<!ELEMENT e ANY>

Output:
<!ENTITY % ent " ">
%ent;
<!ELEMENT e ANY>

So this matches one of my goals of being able to take any DTD and parse
it and be able to spit back out exactly what was parsed.  In this case
all I do is get the value of the "xml-string" property during every event
and write it out.  Now to get a feel for the event/xml-string relationship,
I can run the program with debugging turned on, which just adds an
"[event-name]" to the output stream.  Note that I have also added a new
"startOfMarkupDecl" event because it made my application (the simple DTD
writer) easier to code, but it is not "strictly" necessary.  The debug
code looks like:

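// SAX2 property holding the literal source text of the current event
String propString = "http://xml.org/sax/properties/xml-string";

void printEvent(String eventName) throws SAXException {
    // Ask the parser for the text consumed since the previous event
    String xmlstring = (String) parser.getProperty(propString);
    if (DEBUG)
        System.out.println("[" + eventName + "]" + xmlstring);
    else
        System.out.print(xmlstring);
}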
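
Since support for "xml-string" is a prototype here and not something every SAX parser offers, a caller may want to guard the lookup. The following is only a sketch: the class and method names (XmlStringProbe, literalOrEmpty, newReader) are made up for illustration, and it assumes nothing beyond the standard org.xml.sax and JAXP APIs. A parser that does not recognise or support the property throws a SAXException subclass, which is turned into an empty string.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

final class XmlStringProbe {
    static final String XML_STRING = "http://xml.org/sax/properties/xml-string";

    /** Returns the literal text for the current event, or "" if the parser cannot supply it. */
    static String literalOrEmpty(XMLReader reader) {
        try {
            Object value = reader.getProperty(XML_STRING);
            return value instanceof String ? (String) value : "";
        } catch (SAXException e) {
            // SAXNotRecognizedException and SAXNotSupportedException both extend SAXException.
            return "";
        }
    }

    /** Creates whatever XMLReader the JAXP factory provides on this platform. */
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        String literal = literalOrEmpty(newReader());
        System.out.println(literal.isEmpty() ? "xml-string not available here" : literal);
    }
}
```

With a guard like this, printEvent could fall back to printing only the event name when the property is missing.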

So again for the same two cases:

Output (with DEBUG printing added):
[startDTD]
[startEntity "[dtd]"]
[startOfMarkupDecl]<!ENTITY
[internalEntityDecl] % ent "ANY">
[startOfMarkupDecl]
<!ELEMENT
[startEntity "%ent"] e %ent;
[endEntity "%ent"]
[elementDecl]>
[endEntity "[dtd]"]
[endDTD]

Output (with DEBUG printing added):
[startDTD]
[startEntity "[dtd]"]
[startOfMarkupDecl]<!ENTITY
[internalEntityDecl] % ent " ">
[startEntity "%ent"]
%ent;
[endEntity "%ent"]
[startOfMarkupDecl]
<!ELEMENT
[elementDecl] e ANY>
[endEntity "[dtd]"]
[endDTD]
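
To make the event/xml-string relationship concrete, the first trace above can be replayed by hand. This sketch does not call a parser at all: the event/text pairs are transcribed from that trace (events that print no text carry an empty string, and any trailing newline is ignored), and the class name TraceReplay is invented for the example. Concatenating the texts in event order gives back the original two declarations.

```java
final class TraceReplay {
    // {event label, literal text reported with it}, copied from the first debug trace.
    static final String[][] FIRST_TRACE = {
        {"startDTD", ""},
        {"startEntity \"[dtd]\"", ""},
        {"startOfMarkupDecl", "<!ENTITY"},
        {"internalEntityDecl", " % ent \"ANY\">"},
        {"startOfMarkupDecl", "\n<!ELEMENT"},
        {"startEntity \"%ent\"", " e %ent;"},
        {"endEntity \"%ent\"", ""},
        {"elementDecl", ">"},
        {"endEntity \"[dtd]\"", ""},
        {"endDTD", ""},
    };

    /** Joins the literal text of every event, which is what the non-debug writer prints. */
    static String replay(String[][] trace) {
        StringBuilder out = new StringBuilder();
        for (String[] event : trace) {
            out.append(event[1]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replay(FIRST_TRACE));
    }
}
```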

Obviously this requires some additional "reparsing" of some of the
simple constructs in the declarations, but it isn't too hard to keep
straight.  The nice advantage is that you do not need to add lots of
methods to the handler APIs and you can avoid doing the work of
creating the String until you get the getProperty call from the
application within the handler callback.  Since the parser only needs
to be able to return the unparsed stream back as far as the last event,
and not since the last call to getProperty, the amount of information
that needs to be available is small.  There is a little overhead on the
edge cases at low-level reader I/O buffer boundaries, but since most
events will occur within the same buffer as the previous event a simple
lastXMLStringOffset variable handles the common case.
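
The bookkeeping described above might look roughly like the sketch below. It is not the Xerces2 code: only the name lastXMLStringOffset comes from the note, and the class LiteralTracker with its methods is hypothetical. It models the common case of a single buffer and leaves out the buffer-boundary edge cases. The String is built only when getXMLString is called, and repeated calls within one event return the same text.

```java
final class LiteralTracker {
    private final char[] buffer;          // stands in for the reader's current I/O buffer
    private int position = 0;             // how far the scanner has consumed
    private int lastXMLStringOffset = 0;  // where the previous event's text ended
    private int eventStart = 0;           // span frozen for the current event
    private int eventEnd = 0;

    LiteralTracker(String input) {
        buffer = input.toCharArray();
    }

    /** The scanner consumed n more characters. */
    void advance(int n) {
        position = Math.min(buffer.length, position + n);
    }

    /** Called just before a handler callback: freezes the text span for this event. */
    void markEvent() {
        eventStart = lastXMLStringOffset;
        eventEnd = position;
        lastXMLStringOffset = position;
    }

    /** Builds the String lazily, only if the application asks for it. */
    String getXMLString() {
        return new String(buffer, eventStart, eventEnd - eventStart);
    }

    public static void main(String[] args) {
        LiteralTracker t = new LiteralTracker("<!ENTITY % ent \"ANY\">");
        t.advance(8);
        t.markEvent();
        System.out.println("[startOfMarkupDecl]" + t.getXMLString());
        t.advance(13);
        t.markEvent();
        System.out.println("[internalEntityDecl]" + t.getXMLString());
    }
}
```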

Regards,
Glenn

<<<end enclosure>>>

John wrote:

For example, what would you like reported for this:
<!ENTITY % someOtherFile SYSTEM "aardvark.mod">
<!ENTITY % prefix "foo:">
<!ENTITY % elementName "%prefix;bar">
<!ENTITY % cmBit "a,b,c">
<!ENTITY % fullDecl "<!ELEMENT %elementName; (%cmBit;)>" >
%fullDecl;
%someOtherFile;

This is what I get with my DTDWriter:

[startDocument]
[startDTD]
[startEntity "[dtd]"]
[startOfEntityDecl]<!ENTITY
[externalEntityDecl] % someOtherFile SYSTEM "aardvark.mod">
[startOfEntityDecl]
<!ENTITY
[internalEntityDecl] % prefix "foo:">
[startOfEntityDecl]
<!ENTITY
[startEntity "%prefix"] % elementName "%prefix;
[endEntity "%prefix"]
[internalEntityDecl]bar">
[startOfEntityDecl]
<!ENTITY
[internalEntityDecl] % cmBit "a,b,c">
[startOfEntityDecl]
<!ENTITY
[startEntity "%elementName"] % fullDecl "<!ELEMENT %elementName;
[endEntity "%elementName"]
[startEntity "%cmBit"] (%cmBit;
[endEntity "%cmBit"]
[internalEntityDecl])>" >
[startEntity "%fullDecl"]
%fullDecl;
[startOfElementDecl]
[elementDecl]
[endEntity "%fullDecl"]
[startEntity "%someOtherFile"]
%someOtherFile;
[endEntity "%someOtherFile"]
[endEntity "[dtd]"]
[endDTD]

Regards,
Glenn




From:    "Anderson, John" <[EMAIL PROTECTED]oft.com>
Date:    08/06/2001 01:09 PM
To:      "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>, "'[email protected]'" <[email protected]>
cc:
Subject: RE: DTD Reading Urgent

Please respond to xerces-j-dev



Ho ho!


Found someone who wants this as well!


The XML spec says that resolving is what a validating parser should do, and
neither SAX nor DOM APIs expose this information. I would also like to be
able to access this information, but most people have put it in the too
hard basket. The problem is complicated (to put it mildly) by the various
ways in which PEs can be used, including as element names, content models
(or parts thereof), entire declarations, includes, bits of attribute
declarations, etc etc. Plus some local PEs to override those in the
external DTD. I am no kind of parser expert, but I guess it would be pretty
difficult for a parser to not resolve this and still guarantee validity
without doing a double pass.


For example, what would you like reported for this:


<!ENTITY % someOtherFile SYSTEM "aardvark.mod">
<!ENTITY % prefix "foo:">
<!ENTITY % elementName "%prefix;bar">
<!ENTITY % cmBit "a,b,c">
<!ENTITY % fullDecl "<!ELEMENT %elementName; (%cmBit;)>" >
%fullDecl;
%someOtherFile;


Now add in a few general entities just to make it interesting, and perhaps
some attributes and then remember it all when you create and modify a DOM3
AS Model of it.


As far as I know, the only way to do it would be to do some hacking in the
source code after determining which entities you want resolved and which
ones you'd like passed through. Unfortunately I haven't yet had the time to
work out exactly how I might do this. If I do, I'll let you know.


If anyone else has done it, I'd also be very grateful to know where and how
I should begin.


John


-----Original Message-----
From: chandru [mailto:[EMAIL PROTECTED]]
Sent: 25 July 2001 05:40
To: [EMAIL PROTECTED]
Subject: DTD Reading Urgent







Hi friends,
  While reading a DTD with the DTDReader (using the parser's
decl-handler), the element's content model is reported as a
normalised definition. The model is normalised so that all parameter
entities are fully resolved and all whitespace is removed, and it includes
the enclosing parentheses. How can I stop this? I want the un-normalised
definition of the element.
e.g:
<!ELEMENT NOE (%_NOE_;)>
<!ENTITY % _NOE_ "(Msgfun,getInfo,(Reader))">


What I am expecting is:
  element name : NOE
  content model: (%_NOE_;)


What the parser reports in DeclHandler is:
  element name : NOE
  content model: (Msgfun,getInfo,(Reader))


How can I recover the original declarations, i.e. without the normalisation?
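
The behaviour described here can be reproduced with the standard SAX2 DeclHandler. The sketch below is an illustration under stated assumptions, not a fix: the class name NormalizedModelDemo, the system identifier "noe.dtd" and the in-memory DTD are all made up, the entity is declared before it is used, and its value is quoted. The DTD is served from a string through an EntityResolver because parameter-entity references inside a declaration are only allowed in the external subset. It also assumes the parser loads the external DTD, which non-validating parsers are not required to do.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

final class NormalizedModelDemo {
    // Hypothetical external subset, modelled on the declarations in the question.
    static final String DTD =
        "<!ENTITY % _NOE_ \"(Msgfun,getInfo,(Reader))\">\n"
        + "<!ELEMENT NOE (%_NOE_;)>\n";
    static final String DOC = "<!DOCTYPE NOE SYSTEM \"noe.dtd\"><NOE/>";

    /** Parses DOC and returns the content model that elementDecl reports for NOE. */
    static String reportedModel() {
        final String[] model = new String[1];
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setProperty("http://xml.org/sax/properties/declaration-handler",
                new DefaultHandler2() {
                    @Override
                    public void elementDecl(String name, String contentModel) {
                        if ("NOE".equals(name)) {
                            model[0] = contentModel;
                        }
                    }
                });
            // Serve the DTD text for any external entity the parser asks for.
            reader.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader(DTD)));
            reader.parse(new InputSource(new StringReader(DOC)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return model[0];
    }

    public static void main(String[] args) {
        System.out.println("content model: " + reportedModel());
    }
}
```

The string handed to elementDecl is the expanded model; the %_NOE_; reference itself is not visible through this callback.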


Expecting your mail soon....





from
chandra sekhar








---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
