Hi Bob,

I'm not sure what Xerces 1.4.2 was doing, but Xerces2 has its own
specialized UTF-8 reader which is used instead of the one provided by
Java. It will throw an exception when it encounters malformed UTF-8 byte
sequences.
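
On the question of whether Java's own converters can be asked to throw:
the following is a minimal sketch, assuming the java.nio.charset API
(available since Java 1.4). It is not how Xerces2's reader is implemented.
The class and method names are hypothetical, chosen only for illustration.
A CharsetDecoder configured with CodingErrorAction.REPORT raises a
CharacterCodingException on malformed input instead of silently
substituting a replacement character. The test bytes are the comment
from the example quoted below.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class, not part of Xerces or the JDK.
final class StrictUtf8Check {

    /** Returns true only if the given bytes are a well-formed UTF-8 sequence. */
    public static boolean isWellFormedUtf8(byte[] bytes) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                // REPORT makes the decoder fail rather than replace bad input.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown for malformed input such as a lone 0xFC byte.
            return false;
        }
    }

    public static void main(String[] args) {
        // The comment from the example: 0xFC is "ü" in ISO-8859-1, not UTF-8.
        byte[] latin1Comment = {
            0x3C, 0x21, 0x2D, 0x2D, 0x20, 0x66, (byte) 0xFC, 0x72,
            0x20, 0x4F, 0x4E, 0x43, 0x20, 0x2D, 0x2D, 0x3E
        };
        // The same comment with "ü" correctly encoded as C3 BC.
        byte[] utf8Comment = {
            0x3C, 0x21, 0x2D, 0x2D, 0x20, 0x66, (byte) 0xC3, (byte) 0xBC, 0x72,
            0x20, 0x4F, 0x4E, 0x43, 0x20, 0x2D, 0x2D, 0x3E
        };
        System.out.println(isWellFormedUtf8(latin1Comment)); // false
        System.out.println(isWellFormedUtf8(utf8Comment));   // true
    }
}
```

A check like this only validates the byte stream. It does not say where
in the document the bad byte sits, and it does not depend on whether the
byte is inside a comment or an attribute value.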

On Fri, 20 Feb 2004, Bob Foster wrote:

> Encoding detection happens when the document is opened; after that, a
> conversion error may have caused a well-formedness error, but it cannot be
> identified as a charset problem.
>
> Most likely the parser isn't detecting the non-UTF-8 characters because
> Java isn't. I have seen mention that you can ask Java's encoding
> converters to throw if they encounter invalid character sequences. Does
> anyone know if this is true? And if so, why doesn't Xerces do it?
>
> Bob
>
> [EMAIL PROTECTED] wrote:
> > Maybe because the bad character is in the comment. I suspect the parser
> > skips everything until the closing comment tag. What happens when the bad
> > character is in an attribute value for example?
> >
> > Ringo
> >
> > -----Original Message-----
> > From: Berchner Matthias ICM Berlin
> > [mailto:[EMAIL PROTECTED]
> > Sent: Friday, 20 February 2004 15:15
> > To: '[EMAIL PROTECTED]'
> > Subject: UTF-8 encoding errors are not always detected
> >
> >
> > Hi,
> >
> > I'm using Xerces 1.4.2. Unfortunately, UTF-8 encoding errors are not
> > always detected:
> >
> > Example:
> >
> > --------------------------------------------
> > <?xml version="1.0" encoding="UTF-8"?>
> > <Project>
> >     <!-- für ONC -->
> > </Project>
> > --------------------------------------------
> >
> > <!-- für ONC --> corresponds to
> >     hex 3C 21 2D 2D 20 66 FC 72 20 4F 4E 43 20 2D 2D 3E
> >
> > Non-UTF-8 character: ü <-> FC
> >
> >
> > Kind Regards,
> > Matthias
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]