RE: Input from a FORM - encoding problem

Satoshi Okamoto Tue, 19 Feb 2002 00:20:23 -0800

If it's a servlet, try this:

// Replace YOUR_ENCODING with the charset name you serve, e.g. "ISO-8859-2".
response.setContentType("text/html; charset=YOUR_ENCODING");
PrintWriter out = new PrintWriter(
    new OutputStreamWriter(response.getOutputStream(), "YOUR_ENCODING"));
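
To see what the writer's charset actually changes on the wire, here is a
minimal standalone sketch. It assumes nothing about Tomcat: a
ByteArrayOutputStream stands in for response.getOutputStream(), and the class
and method names (ResponseEncodingSketch, render) are made up for
illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

class ResponseEncodingSketch {
    // Writes text through a PrintWriter bound to the given charset and
    // returns the raw bytes that would be sent to the client.
    static byte[] render(String text, String charset) {
        // Stand-in for response.getOutputStream() (assumption for the sketch).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(
            new OutputStreamWriter(sink, Charset.forName(charset)));
        out.print(text);
        out.flush(); // push buffered characters through the encoder
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u0151"; // LATIN SMALL LETTER O WITH DOUBLE ACUTE
        System.out.println("ISO-8859-2 bytes: " + render(text, "ISO-8859-2").length);
        System.out.println("UTF-8 bytes:      " + render(text, "UTF-8").length);
    }
}
```

The same character becomes the single byte 0xF5 in ISO-8859-2 and two bytes
in UTF-8, which is why the charset in the Content-Type header and the charset
of the writer have to agree.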


-----Original Message-----
From: Attila Szegedi [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 19, 2002 5:16 PM
To: Tomcat Users List
Subject: Re: Input from a FORM - encoding problem


OK: he might try. I admit I've not used IE6, only IEs up to 5.5 and NN up to
4.72, but it's a fact that:

- these browsers never appended a charset declaration to the Content-Type
header (i.e. "Content-Type: application/x-www-form-urlencoded" and not
"Content-Type: application/x-www-form-urlencoded; charset=iso-8859-2"), so it
was up to the server side to figure out what the charset was.

- Tomcat 3.2.x blindly decoded form data as ISO-8859-1. In fact, the code in
the javax.servlet.http.HttpUtils#parsePostData() method contains the
following very revealing comment:
<quote>
        // XXX we shouldn't assume that the only kind of POST body
        // is FORM data encoded using ASCII or ISO Latin/1 ... or
        // that the body should always be treated as FORM data.

</quote>
So, even if your browser conforms to the spec, Tomcat 3.2.x certainly does
not. I must stress that I don't know whether Tomcat 3.3.x or 4.x relies on
this (flawed) code. Tomcat 4.x definitely should not, since it is supposed
to implement request.setCharacterEncoding()...
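
In a Servlet 2.3 container the intended route is to call
request.setCharacterEncoding("ISO-8859-2") before the first
getParameter() call. The effect of choosing the right or wrong charset when
decoding a form body can be sketched without a container, using
java.net.URLDecoder. The class name FormDecodingSketch and the sample value
are assumptions made for illustration only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormDecodingSketch {
    // Decodes one URL-encoded form value using the given charset.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // A browser that received the page as ISO-8859-2 sends U+0151 as %F5.
        String raw = "%F5";
        String right = decode(raw, "ISO-8859-2");
        String wrong = decode(raw, "ISO-8859-1");
        // Print code points in hex so the console encoding does not matter.
        System.out.println("as ISO-8859-2: U+" + Integer.toHexString(right.charAt(0)));
        System.out.println("as ISO-8859-1: U+" + Integer.toHexString(wrong.charAt(0)));
    }
}
```

Decoded as ISO-8859-2 the byte yields U+0151; decoded as ISO-8859-1 it
yields U+00F5, a different letter, with no error raised anywhere.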

Cheers,
  Attila.

--
Attila Szegedi
home: http://www.szegedi.org


----- Original Message -----
From: "Arnold Shore" <[EMAIL PROTECTED]>
To: "Tomcat Users List" <[EMAIL PROTECTED]>
Sent: 2002. február 18. 16:58
Subject: RE: Input from a FORM - encoding problem


> Re "Don't bother fiddling with <FORM> attributes. I've done this before to
> no avail":
>
> I'm accepting Arabic, Hebrew, Russian, and Chinese doing exactly that, with
> IE 6 and using Unicode encodings. (Will be trying NN and Opera shortly.) And
> yes, I'm also using that encoding on the page.
>
> It's going into a database, with subsequent retrieval and display.  Works
> correctly for the stuff I've tried.
>
> Arnold Shore
> Annapolis, MD USA
>
> -----Original Message-----
> From: Attila Szegedi [mailto:[EMAIL PROTECTED]]
> Sent: Monday, February 18, 2002 9:39 AM
> To: Tomcat Users List
> Subject: Re: Input from a FORM - encoding problem
>
>
> Don't bother fiddling with <FORM> attributes. I've done this before to no
> avail.
>
> Right now, no matter what you specify as an encoding in an HTML page, most
> browsers (all the popular IE and NN flavors) ignore it altogether and encode
> the form data using the encoding in which the page containing the form was
> sent to them. Worse yet, they *don't* specify the character encoding of the
> form data when sending it back via a POST request, so you must know on the
> server side what the encoding of the page that contained the form was.
> The Servlet 2.3 spec is meant to contain a solution for this, but I don't
> know how it is (or isn't) implemented in Tomcat 4.x.
>
> As if all of the above weren't enough, Tomcat 3.x deals yet another blow to
> internationalization efforts: it will blindly interpret all form data as
> being iso-8859-1 (~Cp1252), so your iso-8859-2 (~Cp1250) characters are
> lost. Again, I don't know how the Tomcat 4.x line handles this.
>
> Being Hungarian, I'm just as interested in entering 8859-2 characters in
> my pages, and not seeing ? marks on the server side, so I'm transcoding all
> form data strings on the fly. The off-the-wall solution looks like this:
>
> param = new String(param.getBytes("8859_1"), "8859_2");
>
> although this tends to be slow (running through Java's char-to-byte, then
> byte-to-char machinery). I have developed a fast 8859-1 to 8859-2
> transcoder that addresses the speed issue; contact me by private mail and I
> can send it to you.
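
The one-line transcoding trick quoted above can be tried in isolation. This
is only a sketch of the getBytes/new String round trip, not the fast
transcoder mentioned in the message; the class and method names
(TranscodeSketch, transcode) are hypothetical.

```java
import java.nio.charset.Charset;

class TranscodeSketch {
    // Recovers the original bytes from a string that was decoded with the
    // wrong charset, then decodes those bytes with the charset actually used.
    static String transcode(String misdecoded, String wrong, String actual) {
        byte[] originalBytes = misdecoded.getBytes(Charset.forName(wrong));
        return new String(originalBytes, Charset.forName(actual));
    }

    public static void main(String[] args) {
        // What the container hands over after decoding byte 0xF5 as ISO-8859-1.
        String asSeen = "\u00F5";
        String repaired = transcode(asSeen, "ISO-8859-1", "ISO-8859-2");
        System.out.println("repaired: U+" + Integer.toHexString(repaired.charAt(0)));
    }
}
```

The round trip is lossless here because ISO-8859-1 maps each of the 256 byte
values to exactly one character, so the original bytes can always be
recovered. That does not hold in general for a wrong charset that has
unmapped bytes.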
>
> Cheers,
>   Attila.
> --
> Attila Szegedi
> home: http://www.szegedi.org
>
> ----- Original Message -----
> From: "Nikola Milutinovic" <[EMAIL PROTECTED]>
> To: "Tomcat Users List" <[EMAIL PROTECTED]>
> Sent: 2002. február 18. 15:17
> Subject: Re: Input from a FORM - encoding problem
>
>
> > > <quote>
> > > FORM attribute
> > >
> > > accept-charset = charset list [CI]
> > >     This attribute specifies the list of character encodings for input
> > > data that is accepted by the server processing this form. The value is a
> > > space- and/or comma-delimited list of charset values. The client must
> > > interpret this list as an exclusive-or list, i.e., the server is able to
> > > accept any single character encoding per entity received.
> >
> > This bit is a "bit unclear" to me. If I specify several encodings, how
> > will the browser know which one was actually used? How will the server
> > know which one was used?
> >
> > Nix.
> >
>
>
> --
> To unsubscribe:   <mailto:[EMAIL PROTECTED]>
> For additional commands: <mailto:[EMAIL PROTECTED]>
> Troubles with the list: <mailto:[EMAIL PROTECTED]>
>
>
>
>

