Re: Tomcat 8.5.19 corrupts static text files encoded with UTF-8

Mark Thomas Sun, 30 Jul 2017 03:40:20 -0700

On 30/07/17 10:50, Mark Thomas wrote:
> On 30/07/17 10:21, Rémy Maucherat wrote:
>> On Sun, Jul 30, 2017 at 10:59 AM, Konstantin Preißer <kpreis...@apache.org>
>> wrote:


<snip/>

>>> I honestly don't understand that change. As a web developer, I expect a
>>> web server to serve static files exactly as-is, without trying to convert
>>> the files into another charset and without trying to detect the charset of
>>> the file (unless the server is configured to do so).
> 
> Tomcat is trying to handle various edge cases. These include:
> 
> - The response encoding is defined as one charset while the static content
> being served uses a different charset (Tomcat used to send the static bytes
> as-is, which could result in a broken response in some cases).
> 
> - Static content in one encoding is included in a response that uses a
> different encoding. Again, depending on circumstances, the included
> content could be broken.
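To make the first edge case concrete, here is a minimal plain-Java sketch (no Tomcat classes; `TranscodeDemo` and `transcode` are hypothetical names). It assumes a static file saved as UTF-8 and a response declared as ISO-8859-1. Sending the file's bytes unchanged makes the client misread them, while decoding with the file's charset and re-encoding with the response charset keeps the text intact. It illustrates the general transcoding step only, not how the DefaultServlet implements it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {

    // Decode the bytes with the charset the file was actually written in,
    // then re-encode the text with the charset declared for the response.
    static byte[] transcode(byte[] fileBytes, Charset fileCharset, Charset responseCharset) {
        String text = new String(fileBytes, fileCharset);
        return text.getBytes(responseCharset);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";                                 // "café"
        byte[] fileBytes = original.getBytes(StandardCharsets.UTF_8);   // static file saved as UTF-8
        Charset responseCharset = StandardCharsets.ISO_8859_1;          // charset declared for the response

        // Bytes sent as-is: the client decodes UTF-8 bytes as ISO-8859-1
        String asIs = new String(fileBytes, responseCharset);           // "cafÃ©"

        // Bytes transcoded first: they now match the declared charset
        byte[] converted = transcode(fileBytes, StandardCharsets.UTF_8, responseCharset);
        String decoded = new String(converted, responseCharset);        // "café"

        System.out.println(asIs.equals("caf\u00c3\u00a9")); // true
        System.out.println(decoded.equals(original));       // true
    }
}
```

Note that `String.getBytes(Charset)` substitutes the target charset's replacement bytes (`?` for ISO-8859-1) for characters it cannot represent, so this kind of conversion is only lossless when the response charset covers the content.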
> 
>> It probably still does too much right now. Mark made a very complex change,
>> but maybe encoding conversion happens in too many cases. I think there
>> should be conversion only when a writer is used by the default servlet, but
>> we should let the user deal with the other cases.
>>
>> Right now, the code converts whenever the resource has a text MIME type
>> and its encoding doesn't match the response encoding (a determination that
>> may or may not be accurate, it seems). That is very broad, and the behavior
>> should be optional (off by default IMO). Besides, performance is going to
>> be much worse all of a sudden.
> 
> I agree that the change is complex. I also agree that the conversion
> appears to be kicking in more often than expected.
> 
> I thought we had resolved most of the issues working through the
> problems reported by George Stanchev and that 8.5.19 was unlikely to
> cause further issues.
> 
> I think the key to fixing this is limiting when the conversion is applied.

<snip/>

>>> Further, as a system administrator, I would expect to be able to update
>>> Tomcat from x.y.z to x.y.(z+n) without static JavaScript files suddenly
>>> breaking (which isn't immediately obvious, as the script itself mostly
>>> still works; only some string characters outside of ASCII are
>>> displayed incorrectly to the user).
>>> Shouldn't such behavior changes be reserved for the next major/minor
>>> version, which is not yet stable (in this case Tomcat 9.0.0)?
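Why the breakage is easy to miss: ASCII bytes are identical in UTF-8 and in single-byte charsets such as ISO-8859-1. The sketch below assumes (the thread does not establish the exact path) that a UTF-8 file's bytes are decoded as ISO-8859-1 and re-encoded as UTF-8 on the way out; `MojibakeDemo` and `misconvert` are hypothetical names. The code and other ASCII parts of a JavaScript line come through unchanged, and only the non-ASCII string content is garbled.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates one possible mis-conversion: UTF-8 file bytes are decoded as
    // ISO-8859-1, and the resulting (wrong) characters are re-encoded as UTF-8.
    static String misconvert(String utf8Source) {
        byte[] fileBytes = utf8Source.getBytes(StandardCharsets.UTF_8);
        String misread = new String(fileBytes, StandardCharsets.ISO_8859_1);
        byte[] sent = misread.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.UTF_8); // what the browser ends up decoding
    }

    public static void main(String[] args) {
        String js = "var msg = \"Gr\u00fc\u00dfe\";"; // var msg = "Grüße";
        String served = misconvert(js);

        // The ASCII prefix (keywords, identifiers, punctuation) is untouched...
        System.out.println(served.startsWith("var msg = \"Gr")); // true
        // ...but the non-ASCII characters in the string literal are not.
        System.out.println(served.equals(js));                    // false
    }
}
```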
> 
> Stuff breaking is unintentional and is a bug. Unfortunately, it appears
> that you have stumbled across a bug that wasn't detected in any of the
> last three attempted releases.
> 
> I think (but I can't be sure without a test case) the problem stems from
> the case where a character set is not explicitly defined for the
> response. If that is the case, it should be a fairly simple fix.
> 
> My preference is to keep the edge case handling I recently added if at
> all possible and prevent the conversion from applying when it is not
> required.

Konstantin,

If you can try one of the following patches and report back whether it
fixes the problem, that would be very helpful.

Tomcat 9.0.x
http://home.apache.org/~markt/patches/2017-07-30-default-servlet-encoding-tc9-v1.patch

Tomcat 8.5.x
http://home.apache.org/~markt/patches/2017-07-30-default-servlet-encoding-tc85-v1.patch

Remy,

The patch above should significantly reduce how often conversion is
applied, limiting it to two cases: when an encoding has been explicitly
defined and the fileEncoding attribute of the DefaultServlet is configured
differently, or when including, since we always need to remove any BOM in
that case.
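A hypothetical restatement of the narrowed rule in plain Java may help when reviewing the patch. It is not the patch's code: `ConversionPolicy`, `shouldConvert` and `stripUtf8Bom` are invented names, a `null` response charset stands in for "no encoding explicitly defined", `fileEncoding` is assumed to be non-null, and only the UTF-8 BOM is handled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ConversionPolicy {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Hypothetical: convert when including (the BOM must always be removed), or when
    // the response charset was set explicitly and differs from the file encoding.
    // explicitResponseCharset is null when no encoding was explicitly defined.
    static boolean shouldConvert(Charset explicitResponseCharset, Charset fileEncoding, boolean including) {
        if (including) {
            return true;
        }
        return explicitResponseCharset != null && !explicitResponseCharset.equals(fileEncoding);
    }

    // Returns the content without a leading UTF-8 BOM, or the same array if there is none.
    static byte[] stripUtf8Bom(byte[] content) {
        if (content.length >= 3
                && content[0] == UTF8_BOM[0] && content[1] == UTF8_BOM[1] && content[2] == UTF8_BOM[2]) {
            return Arrays.copyOfRange(content, 3, content.length);
        }
        return content;
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        System.out.println(shouldConvert(null, utf8, false));                        // false: nothing explicit
        System.out.println(shouldConvert(utf8, utf8, false));                        // false: charsets match
        System.out.println(shouldConvert(StandardCharsets.ISO_8859_1, utf8, false)); // true: mismatch
        System.out.println(shouldConvert(null, utf8, true));                         // true: include

        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(new String(stripUtf8Bom(withBom), utf8));                 // hi
    }
}
```

Keeping the include case unconditional follows the reasoning given for the patch: an included resource's BOM has to be removed whether or not the charsets match.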

Is that sufficient or would you still like to see an attemptConversion
attribute added to the DefaultServlet?

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
