Christopher Schultz wrote:
André,
(Man, I need to get a keyboard mapping for "é". This copy-and-paste
thing is such a drag...)
Well, you can use Andre, I don't mind and I'm used to all kinds of
spellings. Or you can use André , the special form for people who
haven't dominated their MIME charsets yet ;-)
Plus, the fact that the same applications often do offer the possibility
to submit very large non-USASCII text fields.
The size of the fields shouldn't be an issue, unless you want to stream
the data yourself.
Well yes it does, in a number of situations. Think for example about
webserver logs, where these things then appear as a very very long
string, percent-escaped to boot.
[...]
I took an early aversion to application/x-www-form-urlencoded,
...
No, it will work and it's better than GET because it's encoded using the
Content-Type of the request, rather than God-knows-what given the
browser settings.
There is no "Content-Type of the request". Try it : make a GET request
(or a POST with application/x-www-form-urlencoded), and look for a
request Content-Type with a charset.
For a GET there is no content-type (because there is no request body).
For a POST there is a content-type, but without charset.
The gist of it is : for an "enctype=application/x-www-form-urlencoded"
(whether explicit or by default), the URL is encoded in whatever charset
the browser feels like encoding it. Which MAY depend on what the browser
thinks the charset of the page is, which contains the <form>; or the
"accept-charset" attribute of the form tag, or the user's preferences.
But whatever the browser is in the end sending you, it does not say.
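To make the point above concrete, here is a minimal sketch showing that the same percent-escaped byte gives two different characters depending on the charset the server guesses. The class name FormCharsetDemo and the sample value "%B3" are made up for illustration; only java.net.URLDecoder is real API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormCharsetDemo {
    // Percent-decode a form value, interpreting the resulting bytes in the given charset.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // One byte, 0xB3, as it could appear in a query string or a urlencoded POST body.
        String raw = "%B3";
        // Print code points rather than glyphs, so the console encoding does not matter.
        System.out.printf("U+%04X%n", (int) decode(raw, "ISO-8859-2").charAt(0)); // U+0142
        System.out.printf("U+%04X%n", (int) decode(raw, "ISO-8859-1").charAt(0)); // U+00B3
    }
}
```

Nothing in the request tells the server which of the two readings the browser meant, which is the whole problem.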
The only differences I see between multipart/form-data and
application/x-www-form-urlencoded encoding types are the W3C's choice for the
default and the servlet spec's requirement (both x-www) and the W3C's
statement about <input type="file" />.
http://www.w3.org/TR/html401/interact/forms.html#adef-enctype
says, quote :
The value "multipart/form-data" should be used in combination with the
INPUT element, type="file".
unquote
Note that it does /not/ say that it should /not/ be used with something
else. What it says is that if you upload a file, you SHOULD use the
multipart/form-data content encoding, because of course it does not make
any sense to send the whole file as a "&file=...(10MB)...."
application/x-www-form-urlencoded encoded string, percent-escaped to boot.
It is also a big disappointment to see (you are right, I checked) that
the Servlet Spec does not foresee a simple method to get the parameter
values if they are posted via the multipart/form-data encoding method.
This is because the implication of using multipart/form-data is that the
app code will read its own stream. If you upload a 100MB file, do you
want that whole thing in memory as a (useless) String value?
Let me introduce you to the hidden beauties of Perl, and of the CGI.pm
module. Read this :
http://cpan.uwinnipeg.ca/htdocs/CGI.pm/CGI.html#CREATING_A_FILE_UPLOAD_FIELD
You can skip the first part, which is about creating a file upload field
when composing a form. The second part, starting at this shaded box :
$filename = param('uploaded_file');
explains what happens at the server side when reading such a request
parameter. Essentially you do :
$filename = param('name'); (Java : f = req.getParameter("name");)
In Perl, $filename is now a string containing the uploaded /filename/,
as explained. But $filename is also ("magically") a /filehandle/, as
soon as you treat it like one and read from it. That filehandle is
connected to a temporary file in which the module has already read and
saved the file part as uploaded by the browser.
So, no, it is not a 10 MB string in memory.
If the programmer closes that filehandle, the file is automatically
deleted from whatever temporary space it occupied.
Keep reading, and don't miss the
$type = uploadInfo($filename)->{'Content-Type'};
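For readers on the Java side, the temporary-file idea described above can be sketched with the standard library alone. This is not what CGI.pm does internally and not any servlet container's API; the class UploadSpool and its method name are hypothetical, and the InputStream stands in for one uploaded part.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class UploadSpool {
    // Copy an uploaded part to a temporary file instead of holding it in memory as a String.
    // The caller is responsible for deleting the file when done with it.
    static Path spoolToTempFile(InputStream part) {
        try {
            Path tmp = Files.createTempFile("upload-", ".tmp");
            Files.copy(part, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the file part of a multipart/form-data request.
        InputStream part = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
        Path tmp = spoolToTempFile(part);
        System.out.println(tmp.toFile().length()); // size on disk, in bytes
        tmp.toFile().delete(); // clean up, as closing the Perl filehandle would
    }
}
```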
So what is Perl's default charset? I find it hard to believe that Perl
just magically works with the same missing charset information.
"magically" is a word full of connotations, in perl.
(Like "any sufficiently advanced technology..")
But you are right, even perl cannot magically determine the charset if
the browser does not supply it.
In our applications, we are the ones sending the forms to the client,
and we know the type of encoding to expect from them.
Just to keep people honest, we also always include a hidden parameter
containing a UTF-8 string with non-US-ASCII characters, and check the
returned length (in bytes and in characters) when the form is submitted.
If there is a discrepancy between them, we know that the form
parameter's encoding is not what it should be, and reject the post.
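A rough sketch of that hidden-parameter check, in Java. The probe value, the class name EncodingProbe and the method name are assumptions for illustration, and the input is assumed to be the raw bytes of the hidden field after percent-decoding but before any charset decoding.

```java
import java.nio.charset.StandardCharsets;

class EncodingProbe {
    // Hypothetical probe: e-acute plus euro sign, 2 characters but 5 bytes in UTF-8.
    static final String PROBE = "\u00E9\u20AC";

    // True only if both the byte count and the character count match what a
    // UTF-8 submission of PROBE would produce.
    static boolean isUtf8Submission(byte[] rawBytes) {
        int expectedBytes = PROBE.getBytes(StandardCharsets.UTF_8).length;
        int expectedChars = PROBE.length();
        String decoded = new String(rawBytes, StandardCharsets.UTF_8);
        return rawBytes.length == expectedBytes && decoded.length() == expectedChars;
    }

    public static void main(String[] args) {
        byte[] good = PROBE.getBytes(StandardCharsets.UTF_8);
        byte[] bad = PROBE.getBytes(StandardCharsets.ISO_8859_1); // wrong charset, 2 bytes
        System.out.println(isUtf8Submission(good)); // true
        System.out.println(isUtf8Submission(bad));  // false
    }
}
```

A stricter variant would simply compare the decoded string with the probe, but the length comparison is closer to what is described above.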
Read the body myself and parse it? In 2009?
Yes, read it yourself. You told the servlet container that you wanted to
do it. I'm actually surprised that getParameter() gets you any of your
POST form data when using multipart/form-data. You never did say how it
failed: do you get a bad String (misinterpreted) or do you get null
because getParameter didn't parse the request?
It doesn't because so far I am not processing form posts in Java servlets.
This discussion started because I need to do it now, in relation with
the same external application for which I posted some questions about
BufferedInputStreamReader's and such a while ago. Then, it was related
to the fact that this application was sending wrong output back to the
browser (iso-8859-2 but with an iso-8859-1 output charset). That, I had
to fix using an output filter module back at the Apache level.
Now I have the problem in reverse : the application gets input from an
iso-8859-2 form, in iso-8859-2, but is interpreting it as iso-8859-1.
I was just wondering if by changing the form to use the
multipart/form-data encoding type, the servlet would "magically" realise
the error of its ways, and read the data properly.
Apparently however, browsers and HTTP and Servlet Specs conspire to make
my life difficult.
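For the record, when the code that reads the parameter can be touched, the iso-8859-2-read-as-iso-8859-1 mismatch described above can be undone after the fact, because ISO-8859-1 maps every byte to exactly one character and back. A minimal sketch, assuming the wrong decoding really was ISO-8859-1 and that the JVM supports ISO-8859-2; the class and method names are made up:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Latin2Repair {
    // Assumes the running JVM provides ISO-8859-2 (not one of the guaranteed charsets).
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    // Recover the original bytes from the misdecoded string, then decode them properly.
    static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, LATIN2);
    }

    public static void main(String[] args) {
        String intended = "\u0142\u00F3d\u017A"; // a Polish word with three non-ASCII letters
        // Simulate the bug: ISO-8859-2 bytes interpreted as ISO-8859-1.
        String misdecoded = new String(intended.getBytes(LATIN2), StandardCharsets.ISO_8859_1);
        System.out.println(repair(misdecoded).equals(intended)); // true
    }
}
```

This does not help when the servlet itself cannot be modified, and it breaks as soon as the container decodes with anything other than ISO-8859-1.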
Now come on, I am sure that there must exist some standard Java library
usable in a servlet context, and which does that, no ?
You can use commons-upload, which was intended to be used with file
uploads, and will probably read "simple" multipart/form-data fields as well.
That's interesting, in a general sense. I didn't know that one. Where
does it live ?
Unfortunately here, since I cannot modify the servlet, I'm stuck.
But a filter that calls setCharacterEncoding() on the request will help in this case.
Ok, I found it. It is FileUpload, at http://commons.apache.org/fileupload/
and it looks like Java may be as smart as perl after all ;-)