-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

André,

On 4/7/2011 12:26 PM, André Warnier wrote:
> What I am saying is that, since you have the same Tomcat version on both
> systems, the code which works differently is unlikely to be in Tomcat
> itself. To my recollection (maybe wrong), Tomcat 6.0 does not include
> any code that can deal with a multi-part POST.
> (I think that Tomcat 7.0 does).

Correct: Tomcat 6 does not include any multipart-parsing code, while
Tomcat 7 does, since it implements the Servlet 3.0 multipart upload
features.

Note that the URIEncoding setting on your <Connector> is not relevant,
since the filename is being read from the request /body/ and not from
the URI.

I would use Fiddler(2?), LiveHTTPHeaders, Firebug, etc. to see if there
is a difference on the /client/ side between these two situations. If
the client is sending a different Content-Type, then things may go wrong.

Here's the problem.... the ARPA Internet Text Messages standard (RFC 822,
from which HTTP et al descend) doesn't say how to encode message headers
that include non-US-ASCII characters. This includes filenames with
non-US-ASCII characters that are embedded in the Content-Disposition
header.

The W3C says this (http://www.w3.org/TR/html401/interact/forms.html) in
section 17.13.4:

"
The user agent should attempt to supply a file name for each submitted
file. The file name may be specified with the "filename" parameter of
the 'Content-Disposition: form-data' header, or, in the case of multiple
files, in a 'Content-Disposition: file' header of the subpart.

If the file name of the client's operating system is not in US-ASCII,
the file name might be approximated or encoded using the method of
[RFC2045]. This is convenient for those cases where, for example, the
uploaded files might contain references to each other (e.g., a TeX file
and its ".sty" auxiliary style description).
"

So, the user agent /might/ do something? Not very encouraging.

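To make the failure mode concrete, here is a minimal Java sketch of what
happens when the browser puts the raw bytes of a non-US-ASCII filename
into the request body in one charset and the server side decodes them
with another. The class and method names (FilenameCharsetDemo,
reinterpret) are made up for illustration, and the UTF-8 versus
ISO-8859-1 pairing is only an assumption about a typical mismatch, not a
diagnosis of the OP's setup.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Tomcat or any upload library.
class FilenameCharsetDemo {

    // Encode the name the way the client is assumed to send it, then
    // decode the same bytes the way the server is assumed to read them.
    static String reinterpret(String name, Charset sentAs, Charset readAs) {
        byte[] wire = name.getBytes(sentAs);
        return new String(wire, readAs);
    }

    public static void main(String[] args) {
        // "resume.txt" with U+00E9 (e acute) in place of both e's.
        String original = "r\u00e9sum\u00e9.txt";

        // Both sides agree on UTF-8: the name survives unchanged.
        String agreed = reinterpret(original,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Client sends UTF-8, server reads ISO-8859-1: each U+00E9
        // turns into the two characters U+00C3 U+00A9.
        String garbled = reinterpret(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(agreed.equals(original));
        System.out.println(garbled.length() - original.length());
    }
}
```

If the two systems differ in which charset ends up being used on either
side of this round trip, the same upload can produce a correct filename
on one and a garbled one on the other, which is why the client-side
capture is worth having.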
RFC 2045 says virtually nothing, but there is an RFC specifically
covering the Content-Disposition header:
http://www.ietf.org/rfc/rfc2183.txt

If you follow everything, you can piece together the following:

- From http://www.ietf.org/rfc/rfc822.txt:

  CHAR          =  <any ASCII character>        ; (  0-177,  0.-127.)
  quoted-string = <"> *(qtext/quoted-pair) <">  ; Regular qtext or
                                                ;   quoted chars.
  qtext         =  <any CHAR excepting <">,     ; => may be folded
                    "\" & CR, and including
                    linear-white-space>
  quoted-pair   =  "\" CHAR                     ; may quote any char

- From http://www.ietf.org/rfc/rfc2045.txt:

  value := token / quoted-string

- From http://www.ietf.org/rfc/rfc2183.txt:

  filename-parm := "filename" "=" value

So, the filename value can be a quoted string made up of any ASCII
characters. Great. What about non-US-ASCII characters? RFC 2183 says
this in section 2:

"
NOTE ON PARAMETER VALUE LENGHTS: A short (length <= 78 characters)
parameter value containing only non-`tspecials' characters SHOULD be
represented as a single `token'. A short parameter value containing
only ASCII characters, but including `tspecials' characters, SHOULD
be represented as `quoted-string'. Parameter values longer than 78
characters, or which contain non-ASCII characters, MUST be encoded as
specified in [RFC 2184].
"

Great: another RFC to read. At least this one deals with the proper way
to communicate the character encoding used for a parameter value.

I think this all comes down to two things:

1. How standards-compliant is your user-agent? (Most aren't very good.)
2. How standards-compliant is your file upload library (or servlet
   container)?

I've forgotten whether or not Tomcat includes RFC2184-style header
decoding logic... I'll have to check. But that won't matter if your
user-agent (=browser) sends the information in a non-standard way.

Can you provide some header captures so we can see what's going on?

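For reference, the RFC 2184 form (since carried over into RFC 2231)
looks like filename*=UTF-8''r%C3%A9sum%C3%A9.txt: a charset, an optional
language, then %-encoded octets, separated by single quotes. Below is a
minimal sketch of decoding such a value in Java. ExtendedParamDecoder is
a hypothetical helper of mine, not Tomcat's or any upload library's
code, and it deliberately ignores parameter continuations (filename*0*,
filename*1*, ...) and the case of an empty charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only.
class ExtendedParamDecoder {

    // Decodes a value of the form charset'language'percent-encoded-octets.
    static String decode(String extValue) {
        int q1 = extValue.indexOf('\'');
        int q2 = (q1 < 0) ? -1 : extValue.indexOf('\'', q1 + 1);
        if (q2 < 0) {
            throw new IllegalArgumentException(
                    "expected charset'language'value: " + extValue);
        }
        // Throws an IllegalArgumentException subclass for unknown or
        // empty charset names; a real parser would need a fallback.
        Charset charset = Charset.forName(extValue.substring(0, q1));
        String encoded = extValue.substring(q2 + 1);

        ByteArrayOutputStream octets = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                if (i + 2 >= encoded.length()) {
                    throw new IllegalArgumentException("truncated %-escape");
                }
                int hi = Character.digit(encoded.charAt(i + 1), 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad %-escape");
                }
                octets.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else {
                // Unescaped characters are assumed to be plain ASCII.
                octets.write(c);
            }
        }
        return new String(octets.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String name = decode("UTF-8''r%C3%A9sum%C3%A9.txt");
        System.out.println(name.equals("r\u00e9sum\u00e9.txt"));
    }
}
```

The %-decoding is done by hand rather than with java.net.URLDecoder
because URLDecoder also turns '+' into a space, which is a form-encoding
rule and not part of this syntax. A header capture would show whether
the browser sends anything like this at all, or just raw bytes inside a
plain filename="..." parameter.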
> Now for another more general comment :
> According to your explanation, you upload a file from a browser, and
> then try to write it to the local filesystem using the name which it had
> on the original workstation.
> In my view, this is always a bad idea, in general.

+1

> One reason is the one you already found.
> The other is that if 2 users upload a file with the same name, the
> second one will overwrite the first.

+1

> The third is that you are leaving yourself open for all kinds of nasty
> things, such as a user uploading a file with spaces in the name (always
> a problem at some point), or with characters in the name that may be
> very dangerous (think of a file named "> /etc/passwd" or "some.file|rm *").

I would hope that the OP was putting these files in some known root, so
that uploading /etc/passwd wouldn't overwrite /etc/passwd, and that file
permissions wouldn't allow this, either. Also, unlike Perl, having a
pipe in a filename isn't a problem in Java :)

The user can still cause other problems, like uploading a file to a
win32 server with a filename of "PRN", "LPT1", "COM1", etc. and causing
weirdness there.

> So, if you have a chance to do that, give each uploaded file a name that
> you create, and keep the original filename in some separate place if you
> need it, for display only.

+1

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2eGw0ACgkQ9CaO5/Lv0PDqfQCfStHjz3X9NNxD6CgDvZbKowZp
oMkAniXQ3yfLvol8jc9xGxN72+ml//sy
=bwBm
-----END PGP SIGNATURE-----