Re: ambigous parsing behaviour/whitespace prob

neilg Mon, 07 Jan 2002 12:17:48 -0800

Hello Olaf,

Firstly, you probably want to post questions like this to xerces-j-user;
you're much more likely to get an answer there.  (Besides, this list is
intended for developers of Xerces itself, requests for enhancement etc.,
rather than for folks who develop with the product.)


At any rate:  Your servlet environment is playing tricks on you.  Your
standalone code is using Xerces2, whereas your servlet code is using
Xerces1.  So it looks like your servlet comes with an older version of
Xerces, and you're picking that up in your testing.
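
One way to confirm this is to print where the parser class is actually
loaded from, once from the standalone program and once from inside the
servlet. The following is only a minimal sketch: the class name ParserOrigin
and the method describeOrigin are made up for illustration, and the default
class name is simply the driver named in the original message.

```java
import java.net.URL;
import java.security.CodeSource;

public class ParserOrigin {

    /** Describes where a class was loaded from (jar or directory), if known. */
    static String describeOrigin(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader have no code source.
        URL location = (source == null) ? null : source.getLocation();
        return cls.getName() + " loaded from "
                + (location == null ? "<bootstrap or unknown>" : location.toString());
    }

    public static void main(String[] args) {
        // Assumption: the class of interest is the Xerces SAX driver.
        String name = args.length > 0 ? args[0] : "org.apache.xerces.parsers.SAXParser";
        try {
            System.out.println(describeOrigin(Class.forName(name)));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

Calling describeOrigin(reader.getClass()) on the XMLReader inside the servlet
should show which jar the container is handing you; if it differs from the
jar used by the standalone test, that would explain the different stacks.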

Cheers,
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  [EMAIL PROTECTED]



"Olaf Kittelmann" <[EMAIL PROTECTED]> on 01/07/2002 12:09:40 PM

Please respond to [EMAIL PROTECTED]

To:   [EMAIL PROTECTED]
cc:
Subject:  ambigous parsing behaviour/whitespace prob


Hi Everybody,
I have a problem with Xerces when parsing my own markup language: the way
whitespace is reported.
I am trying to read init data for a complex object structure from XML,
parsing with Xerces.
I have a class "StructureParser" that uses an XMLReader with an inner class
as its ContentHandler.
In a static initializer I specify org.apache.xerces.parsers.SAXParser as my
SAX driver, set up my debugging and set validation to false.

static {
    // Select Xerces as the SAX2 driver (the class name must be on one line).
    System.setProperty("org.xml.sax.driver",
                       "org.apache.xerces.parsers.SAXParser");

    System.setProperty("debug", "false");

    // "DEBUG" takes precedence over "debug" when both are set.
    String strDebug = System.getProperty("DEBUG");
    if (strDebug == null)
        strDebug = System.getProperty("debug");

    debug = strDebug != null && strDebug.equalsIgnoreCase("true");
}

I wrote a main method for testing that takes the path to my XML file as
an argument, passes it to my XMLReader and parses it.

Everything works fine: characters() is called when there are characters, and
ignorableWhitespace() is called when there are none.

The call stack in the debugger looks like this:

de.elmedia.StructureParser$AbmlHandler.ignorableWhitespace(char[], int, int) line: 415
org.apache.xerces.parsers.SAXParser(org.apache.xerces.parsers.AbstractSAXParser).ignorableWhitespace(org.apache.xerces.xni.XMLString) line: 404
org.apache.xerces.impl.xs.XMLSchemaValidator.ignorableWhitespace(org.apache.xerces.xni.XMLString) line: 479
org.apache.xerces.impl.XMLNamespaceBinder.ignorableWhitespace(org.apache.xerces.xni.XMLString) line: 612
org.apache.xerces.impl.dtd.XMLDTDValidator.characters(org.apache.xerces.xni.XMLString) line: 836
org.apache.xerces.impl.XMLDocumentScannerImpl(org.apache.xerces.impl.XMLDocumentFragmentScannerImpl).scanContent() line: 836
org.apache.xerces.impl.XMLDocumentScannerImpl$ContentDispatcher(org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher).dispatch(boolean) line: 1379
org.apache.xerces.impl.XMLDocumentScannerImpl(org.apache.xerces.impl.XMLDocumentFragmentScannerImpl).scanDocument(boolean) line: 328
org.apache.xerces.parsers.DTDXSParserConfiguration(org.apache.xerces.parsers.StandardParserConfiguration).parse(boolean) line: 479
org.apache.xerces.parsers.DTDXSParserConfiguration(org.apache.xerces.parsers.StandardParserConfiguration).parse(org.apache.xerces.xni.parser.XMLInputSource) line: 521
org.apache.xerces.parsers.SAXParser(org.apache.xerces.parsers.XMLParser).parse(org.apache.xerces.xni.parser.XMLInputSource) line: 148
org.apache.xerces.parsers.SAXParser(org.apache.xerces.parsers.AbstractSAXParser).parse(org.xml.sax.InputSource) line: 972


Now, for my real purpose, I am using a servlet that does pretty much the same
thing. It creates a StructureParser, the static initializer is executed, and
it passes the same XML document to StructureParser, with the XMLReader set
not to validate. But now the SAX parser only triggers the characters()
method, with strings that look like
"| ".

Now, I can still trim the strings and only process the ones that really
contain characters. But the thing I am interested in is: why the heck does
Xerces show this different behaviour when I use exactly the same steps to
set it up?
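
The trimming workaround mentioned above could look roughly like the sketch
below. It is not the StructureParser code: the class TrimmedText and the
method collectText are invented for illustration, and it uses whatever JAXP
parser the JVM provides rather than a specific Xerces version. Text is
buffered because characters() may deliver one run of text in several chunks;
whitespace reported through ignorableWhitespace() is dropped by
DefaultHandler's empty default implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TrimmedText {

    /** Returns each non-blank run of text between tags, trimmed, in document order. */
    static List<String> collectText(String xml) {
        final List<String> texts = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder buffer = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                flush();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                buffer.append(ch, start, length); // may be called several times per text run
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flush();
            }

            // Keep the buffered text only if something is left after trimming.
            private void flush() {
                String text = buffer.toString().trim();
                if (!text.isEmpty()) {
                    texts.add(text);
                }
                buffer.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        String xml = "<Struktur>\n<Kategorie RootTemplate=\"Services.html\">\n"
                + "<LinkObjekt ID=\"\" pic=\"\">Services</LinkObjekt>\n"
                + "</Kategorie>\n</Struktur>";
        System.out.println(collectText(xml)); // prints [Services]
    }
}
```

With this approach the handler gives the same result whether the parser
reports inter-element whitespace through characters() or through
ignorableWhitespace().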

The call stack this time looks like:

de.elmedia.StructureParser$AbmlHandler.characters(char[], int, int) line: 87
org.apache.xerces.parsers.SAXParser.characters(char[], int, int) line: 1574
org.apache.xerces.validators.common.XMLValidator.processWhitespace(char[], int, int) line: 654
org.apache.xerces.readers.UTF8Reader.scanContent(org.apache.xerces.utils.QName) line: 2246
org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(boolean) line: 1145
org.apache.xerces.framework.XMLDocumentScanner.parseSome(boolean) line: 380
org.apache.xerces.parsers.SAXParser(org.apache.xerces.framework.XMLParser).parse(org.xml.sax.InputSource) line: 908



So how can this be? Why are the classes from the .framework package used
for the servlet, and the .impl ones for the application?


My XML source looks like this (nothing fancy):

<?xml version ="1.0"?>
<!DOCTYPE Struktur SYSTEM "Struktur.dtd">
<Struktur>
<Kategorie RootTemplate="Services.html">
<LinkObjekt ID="" pic="">Services</LinkObjekt>
<Kategorie RootTemplate="seach.html">
<LinkObjekt ID="" pic="">search</LinkObjekt>
</Kategorie>
<Kategorie RootTemplate="cart.html"></Kategorie>

...........

.......
</Struktur>













---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




