Although the pages I linked don't mention it, it turns out that '437'
is in fact a valid alias, and there are many other numeric aliases too.
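
For anyone who wants to check this on their own JVM, here is a rough
sketch that lists every charset alias starting with a digit. The class
and method names are made up for illustration; it only uses
java.nio.charset.Charset, and the result depends on which charsets the
runtime ships with.

```java
import java.nio.charset.Charset;
import java.util.Set;
import java.util.TreeSet;

public class NumericAliases {

    /** Collects the aliases of all installed charsets that start with a digit. */
    static Set<String> numericAliases() {
        Set<String> result = new TreeSet<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            for (String alias : cs.aliases()) {
                if (!alias.isEmpty() && Character.isDigit(alias.charAt(0))) {
                    result.add(alias);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Whether '437' resolves depends on the charsets installed in this JVM.
        System.out.println("437 supported: " + Charset.isSupported("437"));
        numericAliases().forEach(System.out::println);
    }
}
```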

There are also aliases that start with a letter but still don't match
the RE, e.g. ISO_8859-1:1987 (the RE does not allow ':').
So the updated RE does seem to be too restrictive.
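
A quick way to see the effect is to run the 2.14 encoding fragment
(copied from my earlier mail quoted below, without the rest of the
pattern) against a few declarations. This is only a sketch: the class
and method names are invented, and it tests the fragment in isolation
rather than XmlStreamReader itself.

```java
import java.util.regex.Pattern;

public class EncodingRegexCheck {

    // Encoding fragment of the 2.14 pattern, as quoted in this thread.
    static final Pattern ENCODING_2_14 = Pattern.compile(
        "encoding\\s*=\\s*((?:\"[A-Za-z]([A-Za-z0-9\\._]|-)*\")|(?:'[A-Za-z]([A-Za-z0-9\\\\._]|-)*'))",
        Pattern.MULTILINE);

    /** Returns true if the fragment finds an encoding declaration in the text. */
    static boolean declares(String xmlDecl) {
        return ENCODING_2_14.matcher(xmlDecl).find();
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "Cp437", "437", "ISO_8859-1:1987"};
        for (String name : names) {
            String decl = "<?xml version='1.0' encoding='" + name + "'?>";
            System.out.println(name + " -> " + declares(decl));
        }
    }
}
```

Names starting with a digit fail on the leading [A-Za-z], and
ISO_8859-1:1987 fails because ':' is not in the character class.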

Sorry for the confusion.

On Tue, 3 Oct 2023 at 20:22, sebb <seb...@gmail.com> wrote:
>
> Just had another look at the class: in 2.13, the regex for matching
> the encoding string was
> Pattern.compile("<\\?xml.*encoding[\\s]*=[\\s]*((?:\".[^\"]*\")|(?:'.[^']*'))",
> Pattern.MULTILINE);
>
> In 2.14, the pattern includes the following matching for the encoding:
> "encoding\\s*=\\s*((?:\"[A-Za-z]([A-Za-z0-9\\._]|-)*\")|(?:'[A-Za-z]([A-Za-z0-9\\\\._]|-)*'))",
>
> This does not allow for an encoding that starts with a digit; i.e. it
> won't match encoding='437'
>
> AFAICT, no supported encodings start with a digit.
>
> The '437' encoding is actually known as 'Cp437':
> https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
> https://docs.oracle.com/en/java/javase/17/intl/supported-encodings.html
>
> Try using 'Cp437' as the encoding.
>
> On Tue, 3 Oct 2023 at 20:01, sebb <seb...@gmail.com> wrote:
> >
> > On Tue, 3 Oct 2023 at 18:05, Laurence Gonsalves
> > <laure...@xenomachina.com> wrote:
> > >
> > > On Tue, Oct 3, 2023 at 1:39 AM sebb <seb...@gmail.com> wrote:
> > > >
> > > > The byte input stream does not carry any encoding information, so the
> > > > XmlStreamReader has to guess what encoding was used.
> > >
> > > Determining what encoding to use when reading XML from a byte stream
> > > is the purpose of XmlStreamReader. From its documentation: "Character
> > > stream that handles all the necessary Voodoo to figure out the charset
> > > encoding of the XML document within the stream."
> > >
> > > What it's supposed to do in this case is use the "encoding='437'" from
> > > the input to determine that the Charset to use when decoding the byte
> > > stream is "437" (aka "code page 437").
> >
> > Sorry, I completely overlooked that.
> >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
> > > For additional commands, e-mail: user-h...@commons.apache.org
> > >
