Christopher Schultz wrote:

André,

(Man, I need to get a keyboard mapping for "é". This copy-and-paste
thing is such a drag...)

Well, you can use Andre; I don't mind, and I'm used to all kinds of spellings. Or you can use André, the special form for people who haven't mastered their MIME charsets yet ;-)

Plus, the fact that the same applications often do offer the possibility
to submit very large non-USASCII text fields.

The size of the fields shouldn't be an issue, unless you want to stream
the data yourself.

Well yes, it is, in a number of situations. Think for example of webserver logs, where these things then appear as one very long string, percent-escaped to boot.

[...]


I took an early aversion to application/x-www-form-urlencoded,
...
No, it will work and it's better than GET because it's encoded using the
Content-Type of the request, rather than God-knows-what given the
browser settings.

There is no "Content-Type of the request". Try it : make a GET request (or a POST with application/x-www-form-urlencoded), and look for a request Content-Type with a charset.
For a GET there is no content-type (because there is no request body).
For a POST there is a content-type, but without charset.

The gist of it is: for an "enctype=application/x-www-form-urlencoded" form (whether explicit or by default), the parameters are encoded in whatever charset the browser feels like using. That MAY depend on what the browser thinks the charset of the page containing the <form> is, on the "accept-charset" attribute of the form tag, or on the user's preferences.
But whatever the browser is in the end sending you, it does not say.
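To make the ambiguity concrete, here is a minimal sketch using only the JDK's URLEncoder and URLDecoder. The class name FormCharsetDemo and the sample value are made up for illustration; the point is only that the same percent-escaped bytes decode to different strings depending on the charset the receiver assumes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of any servlet API.
final class FormCharsetDemo {

    // Percent-encode a value roughly the way a browser would for the given charset.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decode a percent-encoded value, assuming (rightly or wrongly) the given charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String name = "Andr\u00E9";                      // "André", written with an escape
        String sent = encode(name, "UTF-8");             // what a UTF-8 browser would send
        System.out.println(sent);
        System.out.println(decode(sent, "UTF-8"));       // receiver guesses right
        System.out.println(decode(sent, "ISO-8859-1"));  // receiver guesses wrong
    }
}
```

Nothing in the encoded value itself tells the receiver which of the two decodings is the intended one.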

The only differences I see between multipart/form-data and
application/x-www-form-urlencoded encoding types are the W3C's choice for the
default and the servlet spec's requirement (both x-www) and the W3C's
statement about <input type="file" />.

http://www.w3.org/TR/html401/interact/forms.html#adef-enctype
says, quote :
The value "multipart/form-data" should be used in combination with the INPUT element, type="file".
unquote
Note that it does /not/ say that it should /not/ be used with anything else. What it says is that if you upload a file, you SHOULD use the multipart/form-data content encoding, because of course it does not make any sense to send the whole file as a "&file=...(10MB)...." application/x-www-form-urlencoded string, percent-escaped to boot.

It is also a big disappointment to see (you are right, I checked) that
the Servlet Spec does not foresee a simple method to get the parameter
values if they are posted via the multipart/form-data encoding method.

This is because the implication of using multipart/form-data is that the
app code will read its own stream. If you upload a 100MB file, do you
want that whole thing in memory as a (useless) String value?

Let me introduce you to the hidden beauties of Perl, and of the CGI.pm module. Read this :
http://cpan.uwinnipeg.ca/htdocs/CGI.pm/CGI.html#CREATING_A_FILE_UPLOAD_FIELD
You can skip the first part, which is about creating a file upload field when composing a form. The second part, starting at this shaded box :
   $filename = param('uploaded_file');
explains what happens at the server side when reading such a request parameter. Essentially you do :
$filename = param('name'); (Java : f = req.getParameter("name");)
In Perl, $filename is now a string containing the uploaded /filename/, as explained. But $filename is also ("magically") a /filehandle/, as soon as you treat it like one and read from it. That filehandle is connected to a temporary file in which the module has already read and saved the file part as uploaded by the browser.
So, no, it is not a 10 MB string in memory.
If the programmer closes that filehandle, the file is automatically deleted from whatever temporary space it occupied.
Keep reading, and don't miss the
 $type = uploadInfo($filename)->{'Content-Type'};
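For comparison, the "spool to a temporary file" idea can be sketched in plain Java with java.nio.file. This is only an illustration of the technique, not what CGI.pm or any servlet container actually does internally; the class UploadSpool and the method spoolToTempFile are hypothetical names, and the InputStream is assumed to deliver just the body of one uploaded part.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies one uploaded part to disk instead of holding it in memory.
final class UploadSpool {

    // Streams the part body into a fresh temporary file and returns its path.
    // The caller is responsible for deleting the file when done with it.
    static Path spoolToTempFile(InputStream partBody) {
        try {
            Path tmp = Files.createTempFile("upload-", ".part");
            // Files.copy streams the data through a buffer; the upload is never
            // materialised as one big String or byte array.
            Files.copy(partBody, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The application then gets a path (or a stream opened on it) rather than a 10 MB string, which is the same trade-off the Perl module makes.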


So what is Perl's default charset? I find it hard to believe that Perl
just magically works with the same missing charset information.

"magically" is a word full of connotations, in perl.
(Like "any sufficiently advanced technology..")
But you are right, even perl cannot magically determine the charset if the browser does not supply it. In our applications, we are the ones sending the forms to the client, and we know the type of encoding to expect from them. Just to keep people honest, we also always include a hidden parameter containing a UTF-8 string with non-US-ASCII characters, and check the returned length (in bytes and in characters) when the form is submitted. If there is a discrepancy between them, we know that the form parameter's encoding is not what it should be, and reject the post.
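The hidden-parameter check could look roughly like this in Java. It is a sketch under assumptions: the sentinel value, the class name CharsetSentinel, and the idea that the raw (already percent-unescaped) bytes of the hidden field are available are all illustrative, not taken from our actual code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical check for a hidden form field that carries a known UTF-8 sentinel.
final class CharsetSentinel {

    // Assumed sentinel: three non-US-ASCII characters (e-acute, n-tilde, u-umlaut).
    // In UTF-8 each of them takes two bytes, so 3 characters are 6 bytes.
    static final String SENTINEL = "\u00E9\u00F1\u00FC";

    // Returns true when the raw bytes have both the byte length and the
    // character length that a UTF-8 submission of the sentinel would have.
    static boolean looksLikeUtf8(byte[] rawValue) {
        int expectedBytes = SENTINEL.getBytes(StandardCharsets.UTF_8).length;
        String decoded = new String(rawValue, StandardCharsets.UTF_8);
        return rawValue.length == expectedBytes
                && decoded.length() == SENTINEL.length();
    }
}
```

A browser that submitted the form in ISO-8859-1 would send three bytes instead of six, so the lengths disagree and the post can be rejected. Comparing the decoded value to the sentinel directly would be a stricter variant of the same idea.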

Read the body myself and parse it? In 2009?

Yes, read it yourself. You told the servlet container that you wanted to
do it. I'm actually surprised that getParameter() gets you any of your
POST form data when using multipart/form-data. You never did say how it
failed: do you get a bad String (misinterpreted) or do you get null
because getParameter didn't parse the request?

It doesn't because so far I am not processing form posts in Java servlets.
This discussion started because I need to do it now, in relation to the same external application for which I posted some questions about BufferedInputStreamReader's and such a while ago. Back then, the issue was that this application was sending wrong output back to the browser (iso-8859-2 content, but labelled with an iso-8859-1 output charset). That, I had to fix using an output filter module back at the Apache level. Now I have the problem in reverse: the application gets input from an iso-8859-2 form, in iso-8859-2, but is interpreting it as iso-8859-1. I was just wondering whether, by changing the form to use the multipart/form-data encoding type, the servlet would "magically" realise the error of its ways and read the data properly. Apparently, however, browsers and the HTTP and Servlet Specs conspire to make my life difficult.
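For what it's worth, when the code receiving the string can be changed (which is not my case here), this kind of misreading is reversible. A minimal sketch, assuming the JVM provides the ISO-8859-2 charset (it is not one of the charsets Java guarantees, though common JDKs ship it); Latin2Repair is a made-up name:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that undoes an "iso-8859-2 bytes read as iso-8859-1" mistake.
final class Latin2Repair {

    // Assumption: this charset is available in the running JVM.
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    // ISO-8859-1 maps every byte 0x00-0xFF to exactly one character, so
    // re-encoding the misread string as ISO-8859-1 recovers the original bytes,
    // which can then be decoded with the charset the browser really used.
    static String repair(String misread) {
        byte[] originalBytes = misread.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, LATIN2);
    }
}
```

This only works because the wrong decoding was ISO-8859-1, which loses no information; a wrong decoding as UTF-8, for instance, could not be undone the same way.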


Now come on, I am sure that there must exist some standard Java library
usable in a servlet context, and which does that, no ?

You can use commons-upload, which was intended to be used with file
uploads, and will probably read "simple" multipart/form-data fields as well.

That's interesting, in a general sense. I didn't know that one. Where does it live ?
Unfortunately here, since I cannot modify the servlet, I'm stuck.
But the setRequestCharacterEncoding filter will help in this case.


Ok, I found it. It is FileUpload, at http://commons.apache.org/fileupload/
and it looks like Java may be as smart as perl after all ;-)

