Hi all,

after quite a while I'm reporting back here because I ran into a problem after updating to Tomcat 8.5.19: static text files (.txt, .js etc.) encoded as UTF-8 (without BOM) are suddenly corrupted when they are served to the browser. This didn't happen with Tomcat 8.5.16.

To reproduce (I'm using Windows 10 Creators Update with Java 1.8.0_141):

1) Download apache-tomcat-8.5.19-windows-x64.zip and extract it.
2) Open Notepad++ [1] and paste the text "Aß" (without quotes) into a new text file. In the Encoding menu, select "UTF-8 without BOM" (if not already selected), then save the text file in the Tomcat directory as "webapps/ROOT/test.txt".
3) Verify with a hex editor that the text file contains the following 3 bytes:
   0x41 0xC3 0x9F
4) Start Tomcat, then use a browser or some other download tool to make a request to "http://localhost:8080/test.txt" and save the response as a file.
5) Open the downloaded file with a hex editor and notice that the last byte has changed:
   0x41 0xC3 0x3F

This means UTF-8 decoding will fail, because the last byte no longer has its highest bit set and so is not a valid continuation byte. In my case, this problem caused strings from (UTF-8) JavaScript files to be displayed incorrectly in the browser.

If you do the same with Tomcat 8.5.16, you can see that the text file is served correctly.

(Additionally, I found that Tomcat 8.5.19 uses "Transfer-Encoding: chunked" to serve the file, whereas Tomcat 8.5.16 sends a "Content-Length: 3" header.)

Why would Tomcat want to modify static files, instead of just serving them as-is?

Note: Bisecting shows that the problem seems to have been introduced with r1800455 [2].

Thanks!

Regards,
Konstantin Preißer

[1] https://notepad-plus-plus.org/
[2] https://svn.apache.org/viewvc?view=revision&revision=r1800455
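
P.S. The changed byte in step 5 looks like what a charset round trip would produce. The following is only a minimal sketch under assumptions I have not verified against the Tomcat source: it supposes the file bytes are decoded with the Windows platform default charset (windows-1252) and then re-encoded as ISO-8859-1. The class and method names (TranscodeSketch, transcode, hex) are hypothetical and exist only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeSketch {

    // Decode raw bytes with one charset, then re-encode with another.
    // String.getBytes(Charset) replaces unmappable characters with the
    // target charset's replacement byte ('?' = 0x3F for ISO-8859-1).
    static byte[] transcode(byte[] raw, Charset assumedFileCharset, Charset responseCharset) {
        return new String(raw, assumedFileCharset).getBytes(responseCharset);
    }

    // Format bytes the same way as in the report, e.g. "0x41 0xC3 0x9F".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Aß" encoded as UTF-8 without BOM, as in test.txt.
        byte[] onDisk = "A\u00DF".getBytes(StandardCharsets.UTF_8);

        // Assumption: file read as windows-1252, response written as ISO-8859-1.
        byte[] served = transcode(onDisk,
                Charset.forName("windows-1252"),
                StandardCharsets.ISO_8859_1);

        System.out.println("on disk: " + hex(onDisk));
        System.out.println("served:  " + hex(served));
    }
}
```

Under these assumptions, 0x9F is decoded by windows-1252 as U+0178, which ISO-8859-1 cannot represent, so it is written as "?" (0x3F), while 0xC3 maps to U+00C3 and survives unchanged. That matches the bytes I see, but it does not prove this is what Tomcat actually does.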
