On Fri, May 18, 2001 at 12:40:04PM -0700, Forrest R. Girouard wrote:
>
> It is my understanding that '8859_1' is an alias for a Java encoding
> which maps to the 'ISO-8859-1' character set. The Java encoding and
> the character set name are not always the same.
>
> Furthermore, although it's not readily apparent, using 'ISO8859_1' for
> the Java encoding is far preferable to using '8859_1' (or anything
> else) under Java 2.
>
> Look at the private getBTCConverter() method in the String.java source
> and note the use of the following:
>
> !encoding.equals(btc.getCharacterEncoding())
>
> The ByteToCharConverter instance for ISO-8859-1 always returns 'ISO8859_1'
> from its getCharacterEncoding() method, which means that while other
> names may work, the ThreadLocal caching will be subverted. Since the
> ByteToCharConverter.getConverter() method involves synchronization, it
> is not a good thing to subvert the ThreadLocal cache.
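A simplified model of the cache check quoted above may make the cost concrete. This is a sketch, not the JDK source: Converter, ConverterCache, lookup() and slowLookupsFor() are hypothetical stand-ins for ByteToCharConverter, String's private getBTCConverter() and the synchronized ByteToCharConverter.getConverter(), and the slow path simply assumes that every name it is given is a Latin-1 alias.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for sun.io.ByteToCharConverter: it always reports its
// canonical Java name, whichever alias was used to obtain it.
final class Converter {
    private final String canonicalName;

    Converter(String canonicalName) {
        this.canonicalName = canonicalName;
    }

    String getCharacterEncoding() {
        return canonicalName;
    }
}

final class ConverterCache {
    // Counts trips through the synchronized slow path.
    private static final AtomicInteger slowLookups = new AtomicInteger();

    // One cached converter per thread, like String's private cache.
    private static final ThreadLocal<Converter> cached = new ThreadLocal<>();

    // Stand-in for the synchronized ByteToCharConverter.getConverter().
    // Assumption: every name passed in is a Latin-1 alias.
    private static synchronized Converter lookup(String encoding) {
        slowLookups.incrementAndGet();
        return new Converter("ISO8859_1");
    }

    // Mirrors the quoted check: the cached converter is reused only when the
    // requested name equals the converter's own getCharacterEncoding().
    static Converter get(String encoding) {
        Converter c = cached.get();
        if (c == null || !encoding.equals(c.getCharacterEncoding())) {
            c = lookup(encoding);
            cached.set(c);
        }
        return c;
    }

    // Starts from an empty cache on the current thread, makes `calls`
    // requests for `encoding`, and returns how many hit the slow path.
    static int slowLookupsFor(String encoding, int calls) {
        cached.remove();
        slowLookups.set(0);
        for (int i = 0; i < calls; i++) {
            get(encoding);
        }
        return slowLookups.get();
    }
}
```

In this model, slowLookupsFor("8859_1", 3) goes through the synchronized slow path on all three calls, while slowLookupsFor("ISO8859_1", 3) does so only on the first.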

Thanks for pointing this out. AFAICS, using 'iso-8859-1' instead of
'8859_1' (my patch) makes this situation neither better nor worse in the
Tomcat code. <g>

The Tomcat 3.x code doesn't appear to take this into account at all. I
wonder whether looking up the Java encoding name associated with the
charset name supplied by user-agents etc. is an optimisation worth making.
I'll look into that.

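One way to sketch that optimisation, assuming a modern JDK (ConcurrentHashMap and the diamond syntax postdate Java 2): resolve each user-agent charset name once to the name the JDK itself reports, and cache the result. EncodingNames and toJavaName() are hypothetical names; the resolution relies on InputStreamReader.getEncoding(), which returns the historical Java name of the reader's encoding (for example "ISO8859_1" for a reader opened with "iso-8859-1").

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: maps a user-agent charset name to the JDK's
// historical Java encoding name, doing the lookup once per distinct name.
final class EncodingNames {
    // User-agent charset name (trimmed, lower-cased) -> historical Java name.
    private static final Map<String, String> cache = new ConcurrentHashMap<>();

    // Returns e.g. "ISO8859_1" for "iso-8859-1"; throws
    // IllegalArgumentException if the JDK does not support the charset.
    static String toJavaName(String requested) {
        // Charset names are case-insensitive, so normalise the cache key.
        String key = requested.trim().toLowerCase(Locale.ROOT);
        String javaName = cache.get(key);
        if (javaName == null) {
            try {
                // getEncoding() reports the historical Java name of the
                // encoding the reader was opened with. The reader wraps an
                // empty in-memory stream, so there is nothing to close.
                InputStreamReader probe = new InputStreamReader(
                        new ByteArrayInputStream(new byte[0]), key);
                javaName = probe.getEncoding();
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("unsupported charset: " + requested, e);
            }
            cache.put(key, javaName);
        }
        return javaName;
    }
}
```

Under the JDK 1.3 logic quoted above, code that passes a raw header value to String(byte[], String) could pass toJavaName(value) instead, so that the requested name matches what the cached converter reports.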
Vince.