The byte input stream does not carry any encoding information, so the XmlStreamReader has to guess what encoding was used.
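
As a minimal sketch of why guessing is needed: the same byte decodes to different characters depending on which charset the reader assumes. The helper name `decodeAs` is hypothetical, and the sketch assumes the running JDK provides the IBM437 charset (it lives in the extended charsets, so a stripped-down runtime may lack it). It only illustrates charset ambiguity in general and is not a claim about what XmlStreamReader does internally in either version.

```kotlin
import java.nio.charset.Charset

// Hypothetical helper: decode raw bytes under an explicitly named charset.
fun decodeAs(bytes: ByteArray, charsetName: String): String =
    String(bytes, Charset.forName(charsetName))

fun main() {
    // In code page 437, "Ç" is encoded as the single byte 0x80.
    val bytes = "Ç".toByteArray(Charset.forName("IBM437"))
    println(bytes.joinToString { "%02X".format(it) })

    // Nothing in the bytes says which charset produced them, so each
    // candidate charset gives a different decoding of the same input.
    for (name in listOf("IBM437", "ISO-8859-1", "UTF-8")) {
        val codePoints = decodeAs(bytes, name).map { "U+%04X".format(it.code) }
        println("$name -> $codePoints")
    }
}
```

A lone 0x80 byte is not valid UTF-8, so a reader that falls back to UTF-8 decodes it as the replacement character, while IBM437 gives C-cedilla and ISO-8859-1 gives a control character.
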
I'm surprised that it ever worked reliably.

On Tue, 3 Oct 2023 at 09:13, Laurence Gonsalves <laure...@gonsalv.es> wrote:
>
> Hello,
>
> It looks like XmlStreamReader is not correctly handling several encodings
> in Commons IO 2.14.0 that previously worked in version 2.13.0.
>
> Here's a self-contained snippet (Kotlin) that demonstrates the problem:
>
>     val xml = "<?xml version='1.0' encoding='437'?><root>Ç</root>"
>
>     val stream = xml.byteInputStream(Charset.forName("437"))
>
>     val reader = XmlStreamReader.builder()
>         .setInputStream(stream)
>         .setLenient(false)
>         .get()
>
>     reader.readText() shouldBe xml
>
> With 2.13.0 this code works fine, but in 2.14.0 the "Ç" (C-cedilla) becomes
> a "�" (Unicode replacement character).
>
> We're seeing similar issues with all of the other code page encodings we've
> tried (850, 852, 855, 857, 860, 861, 862, 863, 865, and 866).