Jiří Eichler wrote:
..
I just checked your on-line example.
I used Firefox 3.1, with the "HttpFox" add-on (recommended).
This shows exactly what the browser is sending to the server.
In this case, the form does a POST, in the "multipart/form-data" encoding.
I sent a small test file, which I created on my disk under Windows XP
(German, so basically latin-1, not latin-2 like yours).
I used cut-and-paste from your email, to copy the filename.
The file name is thus the same as your example, but with a .txt extension.
In the Windows Explorer (not IE), the file name on my disk looks like :
složka.txt
(I used cut-and-paste again in the Explorer to copy this into this email).
I used your (nice) example form to send this file to your server, and
traced it with HttpFox.
This is actually what the browser is sending, as part of that multipart
POST :
-----------------------------20037128598723
Content-Disposition: form-data; name="uploadedfile"; filename="složka.txt"
Content-Type: text/plain

test sloka
-----------------------------20037128598723--
Important : the browser is NOT sending this filename as a part of the
URL. It is sending it in the BODY of the POST request.
It is also not sending it percent-encoded as "slo%C5%BEka". It /is/
sending the filename as raw UTF-8 bytes (C5 BE for the "ž").
That means that if there is a translation going on here, it is NOT at
the level of the upload URL.
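To make the difference concrete, here is a small sketch in Python (chosen only because it is convenient for illustration; it is not part of your setup, and the variable names are mine). It puts the raw UTF-8 bytes of the filename next to the percent-encoded form that a URL would carry.

```python
from urllib.parse import quote

filename = "složka.txt"

# Raw UTF-8 bytes, as they appear in the Content-Disposition
# header inside the body of the multipart POST.
raw_utf8 = filename.encode("utf-8")

# Percent-encoded form, as it would look if the name were part of a URL.
url_form = quote(filename)

print(raw_utf8)   # b'slo\xc5\xbeka.txt'
print(url_form)   # slo%C5%BEka.txt
```

Both forms carry the same two bytes (C5 BE) for the "ž"; only the URL form wraps them in %XX escapes.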
Now the question is to know how your PHP script really interprets this
filename. As UTF-8 ? How do you know for sure ?
(I am not a PHP specialist, at all)
I mean, precisely :
the PHP script "uploader.php", somehow "gets" the value of the parameter
"uploadedfile" as a string, representing the filename that the browser
uploaded.
In which encoding (in PHP) /is/ that string ? Does PHP know that this is
Unicode/UTF-8 ?
Or does /PHP/ (which runs under Apache, which runs under Windows, on a
Windows system where the default charset is cp-1250) think that this
string is encoded in cp-1250 ?
And then, when PHP writes this file to the disk, it encodes the filename
/again/ into UTF-8, and thus this time the "ž" (which is originally 2
bytes representing 1 Unicode character) becomes 4 bytes: the UTF-8
encoding of the two characters that the bytes C5 BE stand for in the
assumed charset ("Ĺ" and "ľ" in cp-1250; "Å" and "¾" in latin-1) ....
... and then, PHP generates the index listing. And, in this index page,
it generates the "href" as
<a href="slo%c4%b9%c4%beka.txt">
which looks very much like it could be the Unicode/UTF-8 encoding of
"sloĹľka.txt" (the UTF-8 bytes of the name, misread as cp-1250), but not
like the Unicode/UTF-8 encoding of "složka.txt".
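The suspected double encoding can be reproduced in a few lines. This is only a sketch in Python, under the assumption that something on the server side reads the UTF-8 bytes as cp-1250 and then encodes the result to UTF-8 again; I cannot tell from here whether that something is PHP, Apache or Windows. The latin-1 variant is included for comparison only.

```python
from urllib.parse import quote

original = "složka.txt"

# Step 1: the browser sends the name as UTF-8 bytes (C5 BE for the "ž").
sent_bytes = original.encode("utf-8")

# Step 2 (assumption): the server side misreads those bytes as cp-1250.
misread_cp1250 = sent_bytes.decode("cp1250")

# Step 3 (assumption): the misread string is encoded to UTF-8 again
# and percent-encoded for the href in the index listing.
href_cp1250 = quote(misread_cp1250).lower()

# For comparison: the same misreading with latin-1 instead of cp-1250.
misread_latin1 = sent_bytes.decode("latin-1")
href_latin1 = quote(misread_latin1).lower()

print(misread_cp1250, href_cp1250)   # sloĹľka.txt slo%c4%b9%c4%beka.txt
print(misread_latin1, href_latin1)   # sloÅ¾ka.txt slo%c3%85%c2%beka.txt
```

Only the cp-1250 variant reproduces the href from your index page, which fits a server whose default charset is cp-1250.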