Hi,

        When Java converts between bytes and characters and encounters a
character outside of the encoding (e.g. not in the ISO-8859-1 character set), it
replaces the character with a '?'. By the time Xerces (or Xalan) sees the
character, it's too late. I believe this behaviour is configurable: looking at
the source code, there is a 'substitution mode' flag, with methods on
CharToByteConverter (and ByteToCharConverter if you're going the other way) to
set it. I don't know how you would set it in your case (you might have to
register your own converter). If you set it to 'false', the converter will throw
an exception when it encounters an unmappable character (or byte sequence)
instead of substituting.
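
For illustration, here is a rough sketch of the same idea using the public java.nio.charset API rather than the CharToByteConverter class mentioned above. This is a swapped-in technique, not the substitution-mode flag itself, and it only helps if you control the code that does the conversion. The class and method names (StrictEncodingSketch, encodeLenient, encodeStrict) are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class, for illustration only.
class StrictEncodingSketch {

    // Lenient path: String.getBytes substitutes the charset's replacement
    // bytes for any character the charset cannot represent.
    static byte[] encodeLenient(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    // Strict path: ask the encoder to report unmappable characters, so
    // encode() throws instead of substituting.
    static byte[] encodeStrict(String text, String charsetName)
            throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        String text = "price: \u20ac5"; // the euro sign is not in ISO-8859-1

        System.out.println(new String(encodeLenient(text, "ISO-8859-1"), latin1));

        try {
            encodeStrict(text, "ISO-8859-1");
        } catch (CharacterCodingException e) {
            System.out.println("strict encoder refused: " + e);
        }
    }
}
```

With the lenient call the euro sign comes out as '?', while the strict call throws a CharacterCodingException, which at least tells you where the data is being lost before the parser ever sees it.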

        Chris
