Just had another look at the class: in 2.13, the regex for matching the encoding string was:

Pattern.compile("<\\?xml.*encoding[\\s]*=[\\s]*((?:\".[^\"]*\")|(?:'.[^']*'))", Pattern.MULTILINE);
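For reference, here is a minimal sketch of how that 2.13 pattern behaves on an XML declaration. The class name, the helper method, and the sample declarations are made up for illustration; only the pattern itself is taken from the message. It shows that the 2.13 pattern accepts any quoted value, including one that starts with a digit, and that group 1 still includes the surrounding quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Commons IO.
final class Pattern213Check {
    // The 2.13 pattern, copied from the message above.
    static final Pattern ENCODING_2_13 = Pattern.compile(
            "<\\?xml.*encoding[\\s]*=[\\s]*((?:\".[^\"]*\")|(?:'.[^']*'))",
            Pattern.MULTILINE);

    /** Returns the quoted value captured by group 1, or null if nothing matches. */
    static String capture(String prolog) {
        Matcher m = ENCODING_2_13.matcher(prolog);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints '437' (quotes included)
        System.out.println(capture("<?xml version='1.0' encoding='437'?>"));
        // prints "UTF-8" (quotes included)
        System.out.println(capture("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"));
    }
}
```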
In 2.14, the pattern includes the following matching for the encoding:

"encoding\\s*=\\s*((?:\"[A-Za-z]([A-Za-z0-9\\._]|-)*\")|(?:'[A-Za-z]([A-Za-z0-9\\\\._]|-)*'))"

This does not allow for an encoding that starts with a digit; i.e. it won't match encoding='437'.

AFAICT, no supported encodings start with a digit.
The '437' encoding is actually known as 'Cp437':

https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
https://docs.oracle.com/en/java/javase/17/intl/supported-encodings.html

Try using 'Cp437' as the encoding.

On Tue, 3 Oct 2023 at 20:01, sebb <seb...@gmail.com> wrote:
>
> On Tue, 3 Oct 2023 at 18:05, Laurence Gonsalves
> <laure...@xenomachina.com> wrote:
> >
> > On Tue, Oct 3, 2023 at 1:39 AM sebb <seb...@gmail.com> wrote:
> > >
> > > The byte input stream does not carry any encoding information, so the
> > > XmlStreamReader has to guess what encoding was used.
> >
> > Determining what encoding to use when reading XML from a byte stream
> > is the purpose of XmlStreamReader. From its documentation: "Character
> > stream that handles all the necessary Voodoo to figure out the charset
> > encoding of the XML document within the stream."
> >
> > What it's supposed to do in this case is use the "encoding='437'" from
> > the input to determine that the Charset to use when decoding the byte
> > stream is "437" (aka "code page 437").
>
> Sorry, I completely overlooked that.
>
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
> > For additional commands, e-mail: user-h...@commons.apache.org
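
Going back to the 2.14 encoding fragment quoted at the top of this message, here is a rough sketch for trying it against a few encoding names. The class and method names are invented for illustration, and the pattern here is only the fragment quoted in this thread, so the full pattern in the library may contain more. The sketch also asks the running JVM whether it resolves each name. That result depends on the runtime's charset aliases, so no output is shown for it.

```java
import java.nio.charset.Charset;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Commons IO.
final class Encoding214Check {
    // Only the encoding part of the 2.14 pattern, as quoted in this thread.
    static final Pattern ENCODING_2_14_FRAGMENT = Pattern.compile(
            "encoding\\s*=\\s*((?:\"[A-Za-z]([A-Za-z0-9\\._]|-)*\")|(?:'[A-Za-z]([A-Za-z0-9\\\\._]|-)*'))");

    /** True if the fragment finds an encoding pseudo-attribute it accepts. */
    static boolean matches(String declaration) {
        return ENCODING_2_14_FRAGMENT.matcher(declaration).find();
    }

    /** True if this JVM resolves the name to a Charset; false for illegal names. */
    static boolean jvmSupports(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) { // includes IllegalCharsetNameException
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"437", "Cp437", "UTF-8"}) {
            System.out.println(name
                    + ": fragment matches=" + matches("encoding='" + name + "'")
                    + ", JVM supports=" + jvmSupports(name));
        }
    }
}
```

The fragment requires the first character inside the quotes to be a letter, so encoding='437' is rejected by the regex before any charset lookup happens, while encoding='Cp437' gets through.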