Hi all,

After quite a while I'm reporting back here, because I ran into a problem after
updating to Tomcat 8.5.19: static text files (.txt, .js, etc.) encoded as UTF-8
(without BOM) are now corrupted when they are served to the browser. This
didn't happen with Tomcat 8.5.16.

To reproduce (I'm using Windows 10 Creators Update with Java 1.8.0_141):

1) Download apache-tomcat-8.5.19-windows-x64.zip and extract it
2) Open Notepad++ [1] and paste the text "Aß" (without quotes) into a new text
file. In the Encoding menu, select "UTF-8 without BOM" (if not already
selected) and then save the text file in the Tomcat directory as
"webapps/ROOT/test.txt".
3) Verify with a hex editor that the text file contains the following 3 bytes: 
0x41 0xC3 0x9F
4) Now use a browser or some other download tool to make a request to 
"http://localhost:8080/test.txt" and save the text file.
5) Open the file with a hex editor and notice that the last byte has changed: 
0x41 0xC3 0x3F
This means UTF-8 decoding will fail, because the last byte no longer has its
highest bit set and is therefore not a valid continuation byte for 0xC3.
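
For anyone who prefers a script over a hex editor, here is a minimal Python
sketch of the check in steps 3 and 5. The byte values are the ones listed
above; the names ORIGINAL, SERVED and is_valid_utf8 are only illustrative.

```python
# Byte sequences taken from steps 3 and 5 above.
ORIGINAL = bytes([0x41, 0xC3, 0x9F])  # file on disk: "Aß" in UTF-8
SERVED = bytes([0x41, 0xC3, 0x3F])    # file as downloaded from Tomcat 8.5.19


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for label, data in (("original", ORIGINAL), ("served", SERVED)):
        # ascii() keeps the output printable on consoles without UTF-8 support.
        decoded = ascii(data.decode("utf-8", errors="replace"))
        print(label, is_valid_utf8(data), decoded)
```

The original bytes decode cleanly, while the served bytes are rejected because
0x3F cannot follow the lead byte 0xC3.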

In my case, this problem caused strings from (UTF-8) JavaScript files to be
displayed incorrectly in the browser.

If you do the same with Tomcat 8.5.16, you can see that the text file is served 
correctly.
(Additionally, I found that Tomcat 8.5.19 uses "Transfer-Encoding: chunked" to 
serve the file, instead of sending a "Content-Length: 3" header as Tomcat
8.5.16 does.)
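
To compare both versions without a browser, a small Python sketch like the
following could be used. It assumes Tomcat is listening on localhost:8080 as in
the steps above; the helper names fetch and hex_bytes are made up for this
example.

```python
import urllib.request


def hex_bytes(data: bytes) -> str:
    """Format bytes like the listings above, e.g. '0x41 0xC3 0x9F'."""
    return " ".join(f"0x{b:02X}" for b in data)


def fetch(url: str):
    """Return (selected response headers, body bytes) for a GET request."""
    with urllib.request.urlopen(url) as resp:
        body = resp.read()  # chunked bodies are reassembled by the library
        names = ("Content-Type", "Content-Length", "Transfer-Encoding")
        headers = {name: resp.headers.get(name) for name in names}
    return headers, body


if __name__ == "__main__":
    # URL from step 4; adjust host and port if your setup differs.
    headers, body = fetch("http://localhost:8080/test.txt")
    for name, value in headers.items():
        print(f"{name}: {value}")
    print(hex_bytes(body))
```

Running it once against 8.5.16 and once against 8.5.19 should show both the
header difference and the changed last byte.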

Why would Tomcat want to modify static files, instead of just serving them 
as-is?

Note: Bisecting shows that the problem seems to have been introduced with 
r1800455 [2].
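
One possible explanation, not verified against the Tomcat source: the observed
bytes are exactly what results when the file content is decoded with the
Windows default charset (windows-1252) and then re-encoded as ISO-8859-1. In
windows-1252, 0x9F is "Ÿ" (U+0178), which does not exist in ISO-8859-1 and is
therefore replaced by "?" (0x3F). The sketch below only illustrates that
arithmetic in Python; the charset pair is an assumption, and transcode is a
hypothetical helper, not anything taken from r1800455.

```python
def transcode(data: bytes, read_as: str = "cp1252",
              write_as: str = "iso-8859-1") -> bytes:
    """Decode data with one charset and re-encode it with another.

    Characters that the target charset cannot represent become '?'.
    The default charsets are an assumption about what might be happening.
    """
    text = data.decode(read_as)
    return text.encode(write_as, errors="replace")


if __name__ == "__main__":
    result = transcode(bytes([0x41, 0xC3, 0x9F]))
    print(" ".join(f"0x{b:02X}" for b in result))  # prints: 0x41 0xC3 0x3F
```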

Thanks!


Regards,
Konstantin Preißer

[1] https://notepad-plus-plus.org/
[2] https://svn.apache.org/viewvc?view=revision&revision=r1800455


