André,

On 4/7/2011 12:26 PM, André Warnier wrote:
> What I am saying is that, since you have the same Tomcat version on both
> systems, the code which works differently is unlikely to be in Tomcat
> itself.  To my recollection (maybe wrong), Tomcat 6.0 does not include
> any code that can deal with a multi-part POST.
> (I think that Tomcat 7.0 does).

Correct: Tomcat 6 does not include any multipart-parsing code, while
Tomcat 7 does, since it implements the Servlet 3.0 Multipart upload
features.

Note that the URIEncoding setting on your <Connector> is not relevant,
since the filename is being read from the request /body/ and not from
the URI.

I would use Fiddler(2?), LiveHTTPHeaders, Firebug, etc. to see if there
is a difference on the /client/ side between these two situations. If
the client is sending a different Content-Type, then things may go wrong.

Here's the problem: the ARPA Internet Text Messages standard (RFC 822,
from which HTTP et al. descend) doesn't say how to encode message headers
that include non-US-ASCII characters. This includes filenames with
non-US-ASCII characters that are embedded in the Content-Disposition
header of a multipart request body.

The W3C says this (http://www.w3.org/TR/html401/interact/forms.html) in
section 17.13.4:

"
The user agent should attempt to supply a file name for each submitted
file. The file name may be specified with the "filename" parameter of
the 'Content-Disposition: form-data' header, or, in the case of multiple
files, in a 'Content-Disposition: file' header of the subpart. If the
file name of the client's operating system is not in US-ASCII, the file
name might be approximated or encoded using the method of [RFC2045].
This is convenient for those cases where, for example, the uploaded
files might contain references to each other (e.g., a TeX file and its
".sty" auxiliary style description).
"

So, the user agent /might/ do something? Not very encouraging. RFC 2045
says virtually nothing, but there is an RFC specifically covering the
Content-Disposition header: http://www.ietf.org/rfc/rfc2183.txt

If you follow everything, you can piece together the following:

- From http://www.ietf.org/rfc/rfc822.txt:
     CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)

     quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
                                                 ;   quoted chars.

     qtext       =  <any CHAR excepting <">,     ; => may be folded
                     "\" & CR, and including
                     linear-white-space>

     quoted-pair =  "\" CHAR                     ; may quote any char

- From http://www.ietf.org/rfc/rfc2045.txt:

     value := token / quoted-string

- From http://www.ietf.org/rfc/rfc2183.txt:

     filename-parm := "filename" "=" value

So, the filename value can be a quoted string made up of any ASCII
characters. Great. What about non-US-ASCII characters?
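
To make the grammar above concrete, here is a minimal sketch in Java of
pulling the filename parameter out of a Content-Disposition value. It is
not Tomcat's code or any upload library's code; the class and method
names are made up for illustration, and it only handles the ASCII
token / quoted-string forms quoted above, nothing else.

```java
import java.util.Optional;

// Hypothetical helper, for illustration only.
final class ContentDispositionSketch {

    /** Returns the filename parameter (unquoted, quoted-pairs resolved), if present. */
    static Optional<String> filenameOf(String headerValue) {
        int n = headerValue.length();
        int i = headerValue.indexOf(';');      // skip the disposition type, e.g. "form-data"
        while (i >= 0 && i < n) {
            i++;                                // step over ';'
            while (i < n && Character.isWhitespace(headerValue.charAt(i))) {
                i++;
            }
            int eq = headerValue.indexOf('=', i);
            if (eq < 0) {
                break;                          // no further name=value pairs
            }
            String name = headerValue.substring(i, eq).trim();
            i = eq + 1;
            String value;
            if (i < n && headerValue.charAt(i) == '"') {
                // quoted-string: read up to the closing quote, resolving "\" CHAR pairs
                StringBuilder quoted = new StringBuilder();
                i++;                            // opening quote
                while (i < n && headerValue.charAt(i) != '"') {
                    char c = headerValue.charAt(i);
                    if (c == '\\' && i + 1 < n) {
                        i++;
                        c = headerValue.charAt(i);
                    }
                    quoted.append(c);
                    i++;
                }
                i++;                            // closing quote
                value = quoted.toString();
            } else {
                // token: runs up to the next ';' or the end of the header
                int end = headerValue.indexOf(';', i);
                if (end < 0) {
                    end = n;
                }
                value = headerValue.substring(i, end).trim();
                i = end;
            }
            if (name.equalsIgnoreCase("filename")) {
                return Optional.of(value);
            }
            i = headerValue.indexOf(';', i);    // move on to the next parameter
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(filenameOf("form-data; name=\"file\"; filename=\"my file.txt\""));
    }
}
```

Nothing in this sketch says which character encoding the bytes of the
filename are in, which is exactly the gap in the grammar.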

RFC 2183 says this in section 2:

"
   NOTE ON PARAMETER VALUE LENGHTS: A short (length <= 78 characters)
   parameter value containing only non-`tspecials' characters SHOULD be
   represented as a single `token'.  A short parameter value containing
   only ASCII characters, but including `tspecials' characters, SHOULD
   be represented as `quoted-string'.  Parameter values longer than 78
   characters, or which contain non-ASCII characters, MUST be encoded as
   specified in [RFC 2184].
"

Great: another RFC to read. At least this one deals with the proper way
to communicate the character encoding used for a parameter value.
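
For reference, RFC 2184 (since replaced by RFC 2231) marks an encoded
parameter with a trailing "*" on its name and writes the value as
charset'language'percent-encoded-bytes. A rough Java sketch of decoding
one such value follows. The class and method names are hypothetical, it
ignores the RFC's continuation syntax (filename*0*, filename*1*, ...),
and it assumes the charset field is present and known to the JVM.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only.
final class ExtendedParameterSketch {

    /** Decodes a value of the form charset'language'%XX-encoded-text. */
    static String decode(String extValue) {
        int first = extValue.indexOf('\'');
        int second = extValue.indexOf('\'', first + 1);
        if (first < 0 || second < 0) {
            throw new IllegalArgumentException("expected charset'language'value");
        }
        // Assumption: the charset name is non-empty and supported by this JVM.
        Charset charset = Charset.forName(extValue.substring(0, first));
        String encoded = extValue.substring(second + 1);

        // Collect raw bytes first, then decode them with the declared charset.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                if (i + 2 >= encoded.length()) {
                    throw new IllegalArgumentException("truncated % escape");
                }
                int hi = Character.digit(encoded.charAt(i + 1), 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad % escape");
                }
                bytes.write((hi << 4) | lo);
                i += 2;
            } else {
                bytes.write(c);                 // unescaped characters are plain ASCII
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // Example value taken from the RFC's own "title*" illustration.
        System.out.println(decode("us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A"));
    }
}
```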

I think this all comes down to two things:

1. How standards-compliant is your user-agent? (Most aren't very good.)

2. How standards-compliant is your file upload library (or servlet
container)?

I've forgotten whether or not Tomcat includes RFC2184-style header
decoding logic... I'll have to check. But that won't help if your
user-agent (=browser) sends the information in a non-standard way.

Can you provide some header captures so we can see what's going on?

> Now for another more general comment :
> According to your explanation, you upload a file from a browser, and
> then try to write it to the local filesystem using the name which it had
> on the original workstation.
> In my view, this is always a bad idea, in general.

+1

> One reason is the one you already found.
> The other is that if 2 users upload a file with the same name, the
> second one will overwrite the first.

+1

> The third is that you are leaving yourself open for all kinds of nasty
> things, such as a user uploading a file with spaces in the name (always
> a problem at some point), or with characters in the name that may be
> very dangerous (think of a file named "> /etc/passwd" or "some.file|rm *").

I would hope that the OP was putting these files in some known root, so
that uploading /etc/passwd wouldn't overwrite /etc/passwd, and that file
permissions wouldn't allow this, either. Also, unlike Perl, having a
pipe in a filename isn't a problem in Java :)

The user can still cause other problems, such as uploading a file to a
win32 server with a filename of "PRN", "LPT1", "COM1", etc. and causing
weirdness there.
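
If you do end up touching client-supplied names on Windows, a check
along these lines is one way to catch the device names. This is only a
sketch: the class name is hypothetical, and the list (CON, PRN, AUX,
NUL, COM1-COM9, LPT1-LPT9, with or without an extension) is my
understanding of the reserved names, not something taken from this
thread.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class ReservedNameSketch {

    // A reserved device name, optionally followed by an extension such as ".txt".
    private static final Pattern DEVICE_NAME = Pattern.compile(
            "(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\\..*)?",
            Pattern.CASE_INSENSITIVE);

    /** True if the whole name is a device name, ignoring case and any extension. */
    static boolean isReservedDeviceName(String fileName) {
        return DEVICE_NAME.matcher(fileName.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(isReservedDeviceName("LPT1"));
        System.out.println(isReservedDeviceName("printer.txt"));
    }
}
```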

> So, if you have a chance to do that, give each uploaded file a name that
> you create, and keep the original filename in some separate place if you
> need it, for display only.

+1
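
A minimal sketch of that approach in Java, assuming a single known
upload root: the stored name is generated on the server and the
client's name is kept only as a display string. The class, field, and
method names and the ".upload" suffix are made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Hypothetical holder, for illustration only.
final class StoredUploadSketch {

    final Path storedPath;      // where the bytes go; never derived from client input
    final String displayName;   // what the client called the file; for display only

    private StoredUploadSketch(Path storedPath, String displayName) {
        this.storedPath = storedPath;
        this.displayName = displayName;
    }

    /** Picks a fresh server-side name under uploadRoot and records the client's name. */
    static StoredUploadSketch allocate(Path uploadRoot, String clientFileName) {
        // A random UUID avoids collisions between users and contains no client characters.
        String generated = UUID.randomUUID() + ".upload";
        Path target = uploadRoot.resolve(generated).normalize();
        return new StoredUploadSketch(target, clientFileName);
    }

    public static void main(String[] args) {
        StoredUploadSketch upload = allocate(Paths.get("uploads"), "> /etc/passwd");
        System.out.println(upload.storedPath + " (shown to users as: " + upload.displayName + ")");
    }
}
```

The display name still needs escaping wherever it is shown (HTML, logs,
etc.), but it never reaches the filesystem.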

-chris