Hi,
When Java converts between bytes and characters and encounters a character
the encoding can't represent (e.g. one outside the ISO-8859-1 character set),
it replaces it with a '?'. I believe this behaviour is configurable, but I
don't know exactly how (you might have to register your own converter). By the
time Xerces (or Xalan) sees the character, it's too late. Looking at the source
code, it's controlled by a 'substitution mode' flag: there are
setSubstitutionMode(boolean) methods on ByteToCharConverter (for bytes to
characters) and CharToByteConverter (for characters to bytes), but I'm not sure
how you can set it in your case. If you set it to false, the converter will
throw an exception when it encounters an unmappable byte sequence (or
character) instead of substituting.
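For what it's worth, on JDK 1.4 and later the public way to get the same
"throw instead of substitute" behaviour is java.nio.charset rather than the
internal sun.io converters. A CharsetEncoder or CharsetDecoder configured with
CodingErrorAction.REPORT raises a CharacterCodingException where it would
otherwise substitute. Here's a minimal sketch. The class and method names
(StrictCharsets, strictEncode, strictReader, readAll) are made up for
illustration. ISO-8859-1 maps every one of the 256 byte values, so decoding
from it can never fail, and the decoding half therefore uses UTF-8 to have
something that can.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictCharsets {

    // Default behaviour: String.getBytes always substitutes the charset's
    // replacement bytes ('?' for ISO-8859-1) for unmappable characters.
    public static byte[] lenientEncode(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Strict encoder: REPORT makes encode() throw instead of substituting.
    public static byte[] strictEncode(String s) throws CharacterCodingException {
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = enc.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Strict reader: an InputStreamReader built from a reporting decoder
    // throws a CharacterCodingException (an IOException subclass) on bad
    // input instead of inserting replacement characters.
    public static Reader strictReader(byte[] bytes) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new InputStreamReader(new ByteArrayInputStream(bytes), dec);
    }

    // Drain a Reader into a String.
    public static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // e-acute is in ISO-8859-1, the euro sign is not.
        String text = "caf\u00e9 \u20ac5";

        // The euro sign silently becomes '?' (byte 63).
        byte[] lenient = lenientEncode(text);
        System.out.println("lenient byte for euro sign: " + lenient[5]);

        try {
            strictEncode(text);
        } catch (CharacterCodingException e) {
            System.out.println("encode rejected: " + e);
        }

        byte[] badUtf8 = {'a', (byte) 0xC3}; // truncated two-byte sequence
        try {
            readAll(strictReader(badUtf8));
        } catch (CharacterCodingException e) {
            System.out.println("decode rejected: " + e);
        }
    }
}
```

To get this in front of the parser, you could hand it a Reader like the one
from strictReader, wrapped in an org.xml.sax.InputSource, instead of a raw
InputStream. That way the parser never does the decoding itself, although it
also means the encoding declaration in the XML is ignored. I haven't tried that
with Xerces specifically.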
Chris