On Fri, May 18, 2001 at 12:40:04PM -0700, Forrest R. Girouard wrote:
> 
> It is my understanding that '8859_1' is an alias for a Java encoding 
> which maps to the 'ISO-8859-1' character set.  The Java encoding and
> the character set name are not always the same.
> 
> Furthermore, while it's not readily apparent, using 'ISO8859_1' for
> the Java encoding is far preferable to using '8859_1' (or anything 
> else) under Java 2.  
> 
> Look at the private getBTCConverter() method in the String.java source
> and note the use of the following:
> 
>       !encoding.equals(btc.getCharacterEncoding())
> 
> The ByteToCharConverter instance for ISO-8859-1 always returns 'ISO8859_1'
> for the getCharacterEncoding() method and this means that while other
> names may work the ThreadLocal caching will be subverted.  Since the
> ByteToCharConverter.getConverter() method involves synchronization it
> is not a good thing to subvert the ThreadLocal cache.

Thanks for pointing this out. AFAICS, the use of 'iso-8859-1' instead of
'8859_1' (my patch) does not make this situation any better or worse in the
tomcat code. <g>

The tomcat 3.x code doesn't look like it takes this into account at all. I
wonder if looking up the Java Encoding name associated with the encoding
name supplied by user-agents etc. is an optimisation worth making. I'll look
into that.
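
A rough sketch of what such a lookup might look like, not actual tomcat
code. The class and method names (EncodingNames, javaName) are made up for
illustration. It relies on InputStreamReader.getEncoding(), which is
documented to return the historical Java name of the encoding where one
exists, and it caches results so the lookup is paid once per distinct name.
It is written against a current JDK (ConcurrentHashMap), so it would need
adapting for older runtimes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: maps a charset name as sent by a user-agent
// (e.g. "iso-8859-1") to the name the Java runtime reports for it.
public class EncodingNames {
    // Cache keyed by the lower-cased supplied name.
    private static final Map<String, String> CACHE =
            new ConcurrentHashMap<String, String>();

    public static String javaName(String charset)
            throws UnsupportedEncodingException {
        String key = charset.toLowerCase(Locale.ROOT);
        String name = CACHE.get(key);
        if (name == null) {
            // Open a reader over an empty stream just to ask the runtime
            // which encoding name it resolved the supplied name to.
            InputStreamReader reader = new InputStreamReader(
                    new ByteArrayInputStream(new byte[0]), charset);
            // getEncoding() must be called before close(); it returns
            // null once the reader is closed.
            name = reader.getEncoding();
            try {
                reader.close();
            } catch (IOException ignored) {
                // Nothing to release for an in-memory stream.
            }
            CACHE.put(key, name);
        }
        return name;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(javaName("iso-8859-1"));
        System.out.println(javaName("8859_1"));
    }
}
```

On the Sun JDKs I would expect both calls in main() to print 'ISO8859_1',
which is the name that matches btc.getCharacterEncoding() in the check
quoted above. An unknown name raises UnsupportedEncodingException, so a
caller would still need a fallback for that case.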



Vince.
