Re: [slightly OT] FORM based authentication and utf-8 encoding of credentials

André Warnier Tue, 02 Jul 2013 08:26:41 -0700

Shanti Suresh wrote:

Greetings,



On Wed, Jun 26, 2013 at 4:08 PM, Christopher Schultz <[email protected]> wrote:


André,

But, even when sending UTF-8 encoded data according to this
principle, they are *not* indicating that it is UTF-8 data, which
is basically wrong, because the standard HTTP/HTML character set is
iso-8859-1, and they *should* indicate it when that is not what
they are sending.  But that is the reality.

No, as much as it pains me to do so, I agree with the Mozilla folks
on this one: adding a charset attribute to an
application/x-www-form-urlencoded Content-Type violates the spec. There is
no good solution.
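To make the problem concrete: an application/x-www-form-urlencoded body is just percent-encoded bytes, and without a charset parameter the server has to assume one. The same bytes then decode to different text depending on that assumption. A minimal Java sketch (Java 10+ for `URLDecoder.decode(String, Charset)`); the class and helper names are hypothetical and only for illustration:

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormCharsetGuess {
    /** Decodes one percent-encoded form value using whatever charset the server assumes. */
    static String decodeValue(String encoded, Charset assumed) {
        return URLDecoder.decode(encoded, assumed);
    }

    /** Renders a string as U+XXXX code points so control characters stay visible. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // An O-umlaut submitted from a page served as UTF-8: bytes 0xC3 0x96, percent-encoded.
        // The request only says "Content-Type: application/x-www-form-urlencoded",
        // with no charset parameter, so the server must pick a charset itself.
        String encoded = "%C3%96";
        System.out.println("as UTF-8:      " + codePoints(decodeValue(encoded, StandardCharsets.UTF_8)));
        System.out.println("as ISO-8859-1: " + codePoints(decodeValue(encoded, StandardCharsets.ISO_8859_1)));
    }
}
```

Assuming UTF-8 yields the single character U+00D6; assuming ISO-8859-1 yields two characters, U+00C3 U+0096, which is the familiar mojibake.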
...

We really need an RFC for HTTP 2.0, with UTF-8 as the default
charset/encoding.

+1

Maybe they can clear up Tomcat logging configuration while they are at
it :)

Thank you!  This discussion was quite informative.


You are welcome.

Further, and still relatively [OT]: in some other (non-Tomcat, non-Java) applications, we solve the general issue as follows, taking into account the deficiencies of the RFCs, the servers, the browsers, and the users:

- when programmers create the html documents containing the forms, they must make sure that they use a tool which really saves the html document in the charset/encoding that corresponds to their wishes

- the html page should contain a declaration like:
<meta http-equiv="Content-Type" content="text/html; charset=xxxx" />
(where xxxx is the correct charset/encoding, like "UTF-8")

- each page containing a form is sent to the browser by the server with an explicit HTTP header:
Content-Type: text/html; charset=xxxx

(that normally happens automatically, but you should nevertheless check that it matches the charset the page is actually encoded in)

- the <form> tag of the form should contain the "accept-charset" attribute with the expected character set as value, like:

<form accept-charset="UTF-8" ...>

- the form itself contains a hidden parameter like:
<input type="hidden" name="charset-test" value="yyyyy">

(where yyyyy is a character sequence whose length in bytes differs depending on the character set actually used; e.g., for Europe, "ÖöÜüÄä", where each of the six characters takes two bytes in UTF-8 but only one in ISO-8859-1)

- the application which receives the submitted form data must first check whether the string received for the "charset-test" parameter matches its expectations. In other words, if the application expects UTF-8, it should check that the received string has a byte length of 12 and a character length of 6, and matches the Unicode string "ÖöÜüÄä". If it doesn't, it should take appropriate action (abort the action, or try another charset).

(If the form sent by the server contains additional data coming from a back-end database system, one should of course also check that the charset of that data matches the one of the form.)
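The receiving side of this scheme can be sketched in Java. This is only an illustration, assuming the application has the raw application/x-www-form-urlencoded request body as a String (the applications described here are not Java ones); the class and method names `CharsetTestCheck`, `matchesCharset` and `detectCharset`, and the sample `user=bob` field, are hypothetical. It recovers the raw bytes of the "charset-test" value, checks their count and the decoded string against the expected charset, and falls back to the next candidate charset on failure:

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetTestCheck {
    // Parameter name and value mirroring the hidden field in the form (hypothetical choice).
    static final String TEST_PARAM = "charset-test";
    static final String TEST_VALUE = "\u00D6\u00F6\u00DC\u00FC\u00C4\u00E4"; // the six umlauts

    /** Returns the still percent-encoded value of a parameter, or null if absent. */
    static String rawParam(String body, String name) {
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            if (key.equals(name)) {
                return eq < 0 ? "" : pair.substring(eq + 1);
            }
        }
        return null;
    }

    /** True if the charset-test value arrived encoded in the expected charset. */
    static boolean matchesCharset(String body, Charset expected) {
        String raw = rawParam(body, TEST_PARAM);
        if (raw == null) {
            return false;
        }
        byte[] bytes;
        try {
            // ISO-8859-1 maps every byte 0x00-0xFF to exactly one char and back,
            // so this round trip recovers the raw bytes the browser sent.
            bytes = URLDecoder.decode(raw, StandardCharsets.ISO_8859_1)
                              .getBytes(StandardCharsets.ISO_8859_1);
        } catch (IllegalArgumentException e) {
            return false; // malformed percent-escape
        }
        // Byte length check: 12 for UTF-8, 6 for ISO-8859-1.
        if (bytes.length != TEST_VALUE.getBytes(expected).length) {
            return false;
        }
        // Decoded value must be exactly the 6 characters that were put in the form.
        return new String(bytes, expected).equals(TEST_VALUE);
    }

    /** Tries each candidate charset in order; null means "abort the action". */
    static Charset detectCharset(String body, Charset... candidates) {
        for (Charset cs : candidates) {
            if (matchesCharset(body, cs)) {
                return cs;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String utf8Body = "user=bob&charset-test=%C3%96%C3%B6%C3%9C%C3%BC%C3%84%C3%A4";
        String latin1Body = "user=bob&charset-test=%D6%F6%DC%FC%C4%E4";
        System.out.println(detectCharset(utf8Body, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
        System.out.println(detectCharset(latin1Body, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

Run as-is, `main` prints `UTF-8` for the first body and `ISO-8859-1` for the second. ISO-8859-1 will decode any byte sequence without an error, so a decoding exception alone cannot reveal the charset; it is the byte count and the comparison with the known test string that do.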

This may look a bit like overkill, but it is the result of long and painful real-world experience with multi-lingual applications used with multiple browsers and by multiple types of users in multiple countries, doing cut-and-paste of all kinds of stuff into forms.
