Hi Andy,

FWIW, I think your analysis is entirely correct, and I've been meaning to
look into this for the last couple of months, ever since I put down the
rewindableInputStream.  :-)  It's just that other things kept rising higher
on my priority queue, and for some reason I had thought this only
manifested itself in XNI callbacks.

So hopefully one of us will get to this soon!

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  [EMAIL PROTECTED]



"Andy Clark" <[EMAIL PROTECTED]> on 02/23/2002 09:04:11 PM

Please respond to [EMAIL PROTECTED]

To:   <[EMAIL PROTECTED]>
cc:
Subject:  RE: External ENTITY Reference Behavior with X1/X2/SAX


Anthony,

Yes, you can just attach a patch to a posting to the xerces-j-dev
mailing list. Some hints: use a standard UNIX-like diff program to make
the patch (e.g. the "cvs diff" command if you're working on a checked-out
copy of the codebase from the CVS repository), and send the resulting
diff file as an attachment. That last part makes applying the patch much
easier.

I haven't had time to look into this, but I have a guess as to where the
problem is. When an external parsed entity is scanned, it may start with
a TextDecl that specifies the encoding of that entity. But for better
performance the parser buffers chunks of the document, and that buffering
reads past the TextDecl using a possibly incorrect encoding. So in the
original Xerces2 betas I used a one-character reader that forced the
scanner to read only one character at a time from the underlying input
stream until either I had changed the encoding or I knew that there was
no TextDecl and could continue processing as normal.

Before the Xerces 2.0.0 release, Sandy (I think) committed a change to
this code. The reason for the change was that the Java decoders cannot be
trusted to read only the bytes necessary to decode a single character. In
other words, many of the decoders buffer the stream internally, so they
read too many bytes and we have the same problem all over again.
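The decoder over-read is easy to observe with the standard JDK classes. In this small sketch (the class name `DecoderOverreadDemo` is just for illustration), a single `read()` on an `InputStreamReader` typically pulls a whole block of bytes from the underlying stream, far more than the one byte that the first character needs; the exact count depends on the JDK's internal buffer size, so none is claimed here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: measure how many bytes a JDK decoder takes to return ONE char.
public class DecoderOverreadDemo {
    /** Returns the number of bytes taken from the underlying stream by a single read(). */
    static int bytesConsumedForOneChar(byte[] data) {
        ByteArrayInputStream bytes = new ByteArrayInputStream(data);
        InputStreamReader reader = new InputStreamReader(bytes, StandardCharsets.UTF_8);
        try {
            reader.read();  // ask the decoder for exactly one character
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // For a ByteArrayInputStream, available() is exactly the number of unread bytes.
        return data.length - bytes.available();
    }

    public static void main(String[] args) {
        byte[] entity = "<?xml encoding='ISO-8859-1'?><e>text</e>"
            .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(entity.length + " bytes total, "
            + bytesConsumedForOneChar(entity) + " consumed for one char");
    }
}
```

Once those extra bytes are inside the decoder, they have already been interpreted with the wrong charset, which is why a byte-level rewindable stream is needed instead.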

So what is the problem? My guess is that even though we're now using a
kind of rewindable input stream to avoid the input stream problem, it's
never being switched back to regular buffering mode. Therefore, it
continues reading the entire entity one character at a time, resulting
in the excessive number of characters() callbacks.
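A toy model of the suspected bug follows. `ModalReader` and `setBufferedMode()` are hypothetical names, not the actual Xerces reader classes: the wrapper returns at most one character per read until it is switched to buffered mode. In this simplified scan loop each read becomes one chunk (think one characters() callback), so forgetting the switch turns an 11-character body into 11 chunks instead of 2.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch, not the actual Xerces reader classes.
public class ModalReaderDemo {

    /** Wraps a Reader; in one-char mode each read() returns at most one character. */
    static final class ModalReader extends Reader {
        private final Reader in;
        private boolean oneCharMode = true;

        ModalReader(Reader in) { this.in = in; }

        /** Hypothetical switch, to be called once the TextDecl question is settled. */
        void setBufferedMode() { oneCharMode = false; }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            return in.read(buf, off, oneCharMode ? 1 : len);
        }

        @Override
        public void close() throws IOException { in.close(); }
    }

    /** Counts the chunks a simple scan loop would hand to characters(). */
    static int countChunks(String entity, boolean switchBack) {
        try (ModalReader reader = new ModalReader(new StringReader(entity))) {
            char[] buf = new char[64];
            int chunks = 0;
            int n = reader.read(buf, 0, buf.length);  // probe read, still in one-char mode
            if (n == -1) return 0;
            chunks++;
            if (switchBack) reader.setBufferedMode();  // the step suspected to be missing
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                chunks++;
            }
            return chunks;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String body = "hello world";
        System.out.println("switched back: " + countChunks(body, true));    // 2
        System.out.println("never switched: " + countChunks(body, false));  // 11
    }
}
```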

I don't have proper CVS access at the moment, otherwise I'd look into it
(provided I have the time). But I would suggest looking in the
impl/XMLEntityManager.java code.

-AndyC

           -----Original Message-----
           From: Anthony W. Marino
           Sent: 2002/02/21 (Thu) 9:26
           To: [EMAIL PROTECTED]
           Cc:
           Subject: Re: External ENTITY Reference Behavior with X1/X2/SAX



           Andy,
           Does this need to be submitted to the X2 group?  Should I do
           it... what are the steps?

           Thank You,
           Anthony


---------------------------------------------------------------------
           To unsubscribe, e-mail: [EMAIL PROTECTED]
           For additional commands, e-mail: [EMAIL PROTECTED]
