A byte InputStream carries no encoding metadata of its own, so
XmlStreamReader has to guess from the bytes which encoding was used.

I'm surprised that it ever worked reliably.
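
Here is a rough Kotlin sketch of what that guessing can look like. It is not the Commons IO code: `sniffEncoding` and the `strictEncName` flag are hypothetical names used only for illustration. The sketch checks for a UTF-8 BOM, then reads the encoding name from the XML declaration, then falls back to UTF-8. In strict mode it also validates the declared name against the XML 1.0 EncName production, which requires a leading letter, so a name like "437" is rejected. Whether 2.14.0 does anything like this is an assumption, not something established in this thread, but it would reproduce the reported symptom.

```kotlin
import java.nio.charset.Charset

// Illustrative sketch only: a simplified encoding sniffer, NOT the
// Commons IO XmlStreamReader implementation. Assumed detection order:
// UTF-8 byte order mark, then the XML declaration's encoding, then UTF-8.

// XML 1.0 EncName production: [A-Za-z] ([A-Za-z0-9._] | '-')*
val ENC_NAME = Regex("[A-Za-z][A-Za-z0-9._-]*")

// Captures the value of encoding='...' or encoding="..." in the declaration.
val DECLARED_ENCODING = Regex("""<\?xml[^>]*\sencoding\s*=\s*["']([^"']+)["']""")

fun sniffEncoding(head: ByteArray, strictEncName: Boolean): Charset {
    // A UTF-8 BOM (EF BB BF) settles the question without guessing.
    if (head.size >= 3 &&
        head[0] == 0xEF.toByte() && head[1] == 0xBB.toByte() && head[2] == 0xBF.toByte()
    ) {
        return Charsets.UTF_8
    }
    // ISO-8859-1 maps every byte to one char, so the ASCII markup of the
    // declaration survives whatever the real encoding turns out to be.
    val prolog = String(head, Charsets.ISO_8859_1)
    val declared = DECLARED_ENCODING.find(prolog)?.groupValues?.get(1)
    val charset = declared
        // In strict mode, names such as "437" fail EncName (leading digit).
        ?.takeIf { !strictEncName || ENC_NAME.matches(it) }
        // Unknown or illegal names are treated as "no usable declaration".
        ?.let { runCatching { Charset.forName(it) }.getOrNull() }
    // With no BOM and no usable declaration, fall back to XML's default, UTF-8.
    return charset ?: Charsets.UTF_8
}

fun main() {
    val xml = "<?xml version='1.0' encoding='437'?><root>Ç</root>"
    val bytes = xml.toByteArray(Charset.forName("437"))

    // Lenient: the declared "437" is honoured and the text round-trips (true).
    println(String(bytes, sniffEncoding(bytes, strictEncName = false)) == xml)

    // Strict: "437" is rejected, UTF-8 is used, and the single byte for
    // "Ç" decodes to U+FFFD, matching the symptom in the report (true).
    println(String(bytes, sniffEncoding(bytes, strictEncName = true)).contains('\uFFFD'))
}
```

Decoding the prolog as ISO-8859-1 is a convenient trick because it never fails and keeps ASCII markup intact. It does not help with UTF-16 input that lacks a BOM, which this sketch ignores.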

On Tue, 3 Oct 2023 at 09:13, Laurence Gonsalves <laure...@gonsalv.es> wrote:
>
> Hello,
>
> It looks like XmlStreamReader is not correctly handling several encodings
> in Commons IO 2.14.0 that previously worked in version 2.13.0.
>
> Here's a self-contained snippet (Kotlin) that demonstrates the problem:
>
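>     import io.kotest.matchers.shouldBe // assumed: Kotest assertions
>     import java.nio.charset.Charset
>     import org.apache.commons.io.input.XmlStreamReader
>
>     val xml = "<?xml version='1.0' encoding='437'?><root>Ç</root>"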
>
>     val stream = xml.byteInputStream(Charset.forName("437"))
>
>     val reader = XmlStreamReader.builder()
>         .setInputStream(stream)
>         .setLenient(false)
>         .get()
>
>     reader.readText() shouldBe xml
>
> With 2.13.0 this code works fine, but in 2.14.0 the "Ç" (C-cedilla) becomes
> a "�" (Unicode replacement character).
>
> We're seeing similar issues with all of the other code page encodings we've
> tried (850, 852, 855, 857, 860, 861, 862, 863, 865, and 866).

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org
