Hi Bob,

I'm not sure what Xerces 1.4.2 was doing, but Xerces2 has its own specialized UTF-8 reader, which is used instead of the one provided by Java. It will throw an exception when it encounters malformed UTF-8 byte sequences.
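
On the question of whether Java's own converters can be asked to throw: as far as I know, the java.nio.charset API (JDK 1.4 and later) allows this by setting CodingErrorAction.REPORT on a CharsetDecoder. The following is only a minimal sketch of that idea, not what Xerces does internally. The class name StrictUtf8Check and the method isWellFormedUtf8 are made up for illustration, and StandardCharsets assumes Java 7 or later. The bytes are the ones from Matthias's comment line.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class StrictUtf8Check {

    // Returns true if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isWellFormedUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The comment line from the report: 3C 21 2D 2D 20 66 FC 72 20 4F 4E 43 20 2D 2D 3E
        byte[] comment = {
            0x3C, 0x21, 0x2D, 0x2D, 0x20, 0x66, (byte) 0xFC, 0x72,
            0x20, 0x4F, 0x4E, 0x43, 0x20, 0x2D, 0x2D, 0x3E
        };
        System.out.println("well-formed UTF-8: " + isWellFormedUtf8(comment));
    }
}
```

A decoder left at its defaults (for example, the one behind String or InputStreamReader) replaces the bad byte silently, which would be consistent with the error going unnoticed.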

On Fri, 20 Feb 2004, Bob Foster wrote:

> Encoding detection happens when the document is opened; after that, a
> conversion error may have caused a well-formedness error, but it cannot be
> identified as a charset problem.
>
> Most likely the parser isn't detecting the non-UTF-8 characters because
> Java isn't. I have seen mention that you can ask Java's encoding
> converters to throw if they encounter invalid character sequences. Does
> anyone know if this is true? And if so, why doesn't Xerces do it?
>
> Bob
>
> [EMAIL PROTECTED] wrote:
> > Maybe because the bad character is in the comment. I suspect the parser
> > skips everything until the closing comment tag. What happens when the bad
> > character is in an attribute value, for example?
> >
> > Ringo
> >
> > -----Original Message-----
> > From: Berchner Matthias ICM Berlin
> > [mailto:[EMAIL PROTECTED]
> > Sent: Friday, 20 February 2004 15:15
> > To: '[EMAIL PROTECTED]'
> > Subject: UTF-8 encoding errors are not always detected
> >
> >
> > Hi,
> >
> > I'm using Xerces 1.4.2; unfortunately, UTF-8 encoding errors are not always
> > detected:
> >
> > Example:
> >
> > --------------------------------------------
> > <?xml version="1.0" encoding="UTF-8"?>
> > <Project>
> >   <!-- für ONC -->
> > </Project>
> > --------------------------------------------
> >
> > <!-- für ONC --> corresponds to
> > hex 3C 21 2D 2D 20 66 FC 72 20 4F 4E 43 20 2D 2D 3E
> >
> > Non-UTF-8 character: ü <-> FC
> >
> >
> > Kind Regards,
> > Matthias

---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
