Konstantin Kolinko wrote:
2014-02-04 André Warnier <a...@ice-sa.com>:
Konstantin Kolinko wrote:
2014-02-03 André Warnier <a...@ice-sa.com>:
André Warnier wrote:
Chris,

a note :

Christopher Schultz wrote:
...


Unquoted Cookie names and values may be any US-ASCII
character from 0x20 - 0x7e except for any of ("(" | ")" | "<" | ">" |
"@" | "," | ";" | ":" | "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{"
| "}" | SP | HT). None of the characters above are within that range,
so the cookie value must be quoted. (It looks to me like Cookie names
must always be in US-ASCII... I didn't think that was the case but I'm
not motivated to track down every word of the spec looking for
justification).

What is the character encoding of the request? What client are you
using? Who created the cookie in the first place?

I did the tracking down of the (tortuous) specs, and came to this :

1) the ISO-8859-1 character set includes "é" (Catalan and other
languages)
(*)

2) the US-ASCII character set is a subset of ISO-8859-1, and does not
include "é".

3) The default character set for HTTP 1.1 is ISO-8859-1, as stated
explicitly and implicitly in various places in RFC 2616 [1].

However, RFC 2616 defines neither the "Cookie" nor the "Set-Cookie" header,
and it also does not specifically indicate which character set should be
used for HTTP Request/Response header values. For that, it refers to
RFC 822 (the Internet message format), which talks only about US-ASCII.

4) The "Cookie" and "Set-Cookie" headers seem to have been defined later,
and most recently, in RFC 6265 [2].
In section 4.1.1 [3], the syntax of these headers is defined, as :

 cookie-pair       = cookie-name "=" cookie-value
 cookie-name       = token
 cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
 cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                       ; US-ASCII characters excluding CTLs,
                       ; whitespace DQUOTE, comma, semicolon,
                       ; and backslash
 token             = <token, defined in [RFC2616], Section 2.2>

Thus, it seems that you are right, and that a cookie *value* can
(regrettably, still) only consist of US-ASCII characters, which rules
out "é".
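To make the cookie-octet rule above concrete, here is a minimal Java sketch of a checker written directly from the RFC 6265 grammar. The class and method names (CookieOctetCheck, isValidCookieValue) are hypothetical, for illustration only; this is not Tomcat's own parser.

```java
public class CookieOctetCheck {

    // cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E  (RFC 6265, section 4.1.1)
    static boolean isCookieOctet(int c) {
        return c == 0x21
            || (c >= 0x23 && c <= 0x2B)
            || (c >= 0x2D && c <= 0x3A)
            || (c >= 0x3C && c <= 0x5B)
            || (c >= 0x5D && c <= 0x7E);
    }

    // cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
    static boolean isValidCookieValue(String value) {
        String inner = value;
        // Strip one pair of surrounding double quotes, if present
        if (inner.length() >= 2 && inner.startsWith("\"") && inner.endsWith("\"")) {
            inner = inner.substring(1, inner.length() - 1);
        }
        for (int i = 0; i < inner.length(); i++) {
            if (!isCookieOctet(inner.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidCookieValue("abc"));              // true
        System.out.println(isValidCookieValue("\"abc\""));          // true
        System.out.println(isValidCookieValue("abc\u00e9"));        // false: é (0xE9) is above 0x7E
        System.out.println(isValidCookieValue("\"abc\",\"abc\""));  // false: inner DQUOTE and comma
    }
}
```

Note that quoting does not help here: DQUOTE only delimits the value, and everything between the quotes must still be cookie-octets.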

(I cannot find in the specs a way to quote a non-US-ASCII character
either; that's apparently only allowed in parts defined as "comments")

(RFC 6265 also recommends encoding the cookie value, e.g. via Base64,
if it could contain non-US-ASCII characters.)

The cookie name is a "token", and the definition of "token" sends us
back to RFC 2616.
In "2.2 Basic Rules", RFC2616 states :

       token          = 1*<any CHAR except CTLs or separators>
       separators     = "(" | ")" | "<" | ">" | "@"
                      | "," | ";" | ":" | "\" | <">
                      | "/" | "[" | "]" | "?" | "="
                      | "{" | "}" | SP | HT
...
       CHAR           = <any US-ASCII character (octets 0 - 127)>
       CTL            = <any US-ASCII control character
                        (octets 0 - 31) and DEL (127)>

So, this all would tend to show that you are right, and that Cookie
names (as well as values) can only consist of US-ASCII characters, and
that "é" is thus not allowed (without some form of encoding that would
represent it as a sequence of US-ASCII characters).
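The same reasoning applied to cookie names can be sketched as a token checker built straight from the RFC 2616 section 2.2 rules quoted above. TokenCheck and isToken are hypothetical names used only for this illustration.

```java
public class TokenCheck {

    // separators from RFC 2616, section 2.2: ( ) < > @ , ; : \ " / [ ] ? = { } SP HT
    private static final String SEPARATORS = "()<>@,;:\\\"/[]?={} \t";

    // token = 1*<any CHAR except CTLs or separators>
    static boolean isToken(String s) {
        if (s.isEmpty()) {
            return false;                          // "1*" means at least one character
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean isChar = c <= 127;             // CHAR: octets 0 - 127
            boolean isCtl = c <= 31 || c == 127;   // CTL: octets 0 - 31 and DEL
            if (!isChar || isCtl || SEPARATORS.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isToken("GetUser_Properties")); // true
        System.out.println(isToken("abc\u00e9"));          // false: é is not a CHAR
        System.out.println(isToken("a=b"));                // false: "=" is a separator
    }
}
```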

Which, in my personal opinion, is a lasting p-i-t-a and shame. And I
cannot imagine how it can be that, nowadays, nobody has yet gotten
around to proposing an HTTP 2.0 RFC where the default character set
would be Unicode, UTF-8 encoded, for everything except maybe header
names. But that's neither here nor there.

To get back to the OP's question, then, it seems to me that:
- Tomcat 7.x would not be in violation of the specs if it indeed rejects
a Cookie header containing any non-US-ASCII character (whether in the
cookie name or value);
- the error message could be improved ("é" is not a control character,
it is just invalid here);
- but the real fix for the OP may be to Base64-encode the cookie value
before sending it to the browser.
That is also because one day even a browser that respects the specs to
the letter (one never knows) could reject a value like:
"abcé","abc","abc","abc","abc","abc","abc","abc","abc";


[1] http://tools.ietf.org/search/rfc2616
[2] http://tools.ietf.org/search/rfc6265
[3] http://tools.ietf.org/search/rfc6265#section-4.1.1


As an appendix, and triggered by another post to this list, here is
another way of encoding HTTP header values :

Cookie: ACE_COOKIE=R660302447; TD3World=R760446058
SM_TRANSACTIONID: =?UTF-8?B?MGE2NDA2MDEtNDAzMy01MjdjYzlkMy0wMDBhLTJjMWI0NjJi?=
SM_AUTHTYPE: =?UTF-8?B?QXV0bw==?=
SM_SDOMAIN: =?UTF-8?B?LnRveW90YS1ldXJvcGUuY29t?=

In this case, the values of the SM_* headers (not the cookies in the
Cookie header) are encoded using a "MIME extension" scheme, which
indicates (between =? ? ?), ahead of the string's value, the character
set/encoding in which the subsequent string is to be interpreted.
This is not explicitly mentioned in any of the above references, but as
I recall, it is part of another series of RFCs, maybe starting with
this one:
http://tools.ietf.org/html/rfc2184
Now how this works out (also browser-side) with Cookie headers composed
of cookie names and values, I couldn't say.
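For illustration, a minimal Java sketch that decodes a single "B" encoded-word such as the SM_AUTHTYPE value above, using only java.util.Base64 and java.nio.charset. EncodedWordDecoder is a hypothetical name; "Q" encoding and multiple encoded-words in one value are deliberately not handled.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodedWordDecoder {

    // encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
    // Only the "B" (Base64) encoding is matched in this sketch.
    private static final Pattern ENCODED_WORD =
        Pattern.compile("=\\?([^?]+)\\?([Bb])\\?([^?]*)\\?=");

    // Decodes a value that is exactly one "B" encoded-word; anything else is returned unchanged.
    static String decode(String headerValue) {
        Matcher m = ENCODED_WORD.matcher(headerValue.trim());
        if (!m.matches()) {
            return headerValue;
        }
        Charset charset = Charset.forName(m.group(1));          // e.g. UTF-8
        byte[] bytes = Base64.getDecoder().decode(m.group(3));  // the encoded-text
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        System.out.println(decode("=?UTF-8?B?QXV0bw==?="));                 // Auto
        System.out.println(decode("=?UTF-8?B?LnRveW90YS1ldXJvcGUuY29t?=")); // .toyota-europe.com
    }
}
```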

RFC 2616 also says the following on page 16:

   The TEXT rule is only used for descriptive field contents and values
   that are not intended to be interpreted by the message parser. Words
   of *TEXT MAY contain characters from character sets other than ISO-
   8859-1 [22] only when encoded according to the rules of RFC 2047
   [14].

       TEXT           = <any OCTET except CTLs,
                        but including LWS>

RFC 2047 is also referenced in Javadoc for HttpServletResponse.setHeader()

The "B" encoding used in the example above is one of the encodings
allowed by RFC 2047, section 4.1.

http://www.ietf.org/rfc/rfc2047.txt
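Going the other way, this hypothetical helper builds a well-formed "B" encoded-word from a string and a charset (RFC 2047 form "=?charset?B?base64?="). It only shows what such a value looks like on the wire; whether Cookie/Set-Cookie parsers in browsers or Tomcat would honour it is exactly the open question in this thread, and the RFC 6265 cookie grammar does not mention it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodedWordEncoder {

    // Produces "=?" charset "?B?" base64(bytes) "?=" as described in RFC 2047.
    static String encodeB(String text, Charset charset) {
        String encoded = Base64.getEncoder().encodeToString(text.getBytes(charset));
        return "=?" + charset.name() + "?B?" + encoded + "?=";
    }

    public static void main(String[] args) {
        // "äöüéè" written with Unicode escapes to avoid source-encoding issues
        System.out.println(encodeB("\u00e4\u00f6\u00fc\u00e9\u00e8", StandardCharsets.ISO_8859_1));
        // prints =?ISO-8859-1?B?5Pb86eg=?=
    }
}
```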

Yes, but it never says anywhere that a "cookie value" may contain "*TEXT".
Explicitly, it only mentions "*cookie-octet".


I meant the following part (page 32 of RFC 2616), which defines the
syntax of HTTP headers in general.

       message-header = field-name ":" [ field-value ]
       field-name     = token
       field-value    = *( field-content | LWS )
       field-content  = <the OCTETs making up the field-value
                        and consisting of either *TEXT or combinations
                        of token, separators, and quoted-string>

TEXT is as I quoted above.
Tokens are US-ASCII minus some characters.
quoted-string is TEXT inside of double quotes.

Thus there are limits on headers syntax in general,
including "Cookie" and "Set-Cookie" headers.
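The difference between the general TEXT rule and the stricter cookie-octet rule can be shown with a small sketch: an ISO-8859-1 "é" (octet 0xE9) is acceptable as TEXT even though it is not a cookie-octet. TextOctetCheck is a hypothetical name, and as a simplifying assumption only HT is accepted from the CTL range (CRLF line folding is not handled).

```java
import java.nio.charset.StandardCharsets;

public class TextOctetCheck {

    // TEXT = <any OCTET except CTLs, but including LWS>   (RFC 2616, section 2.2)
    // Simplification: HT is the only CTL accepted (as part of LWS); CRLF folding is ignored.
    static boolean isText(byte[] octets) {
        for (byte b : octets) {
            int o = b & 0xFF;                      // treat the byte as an unsigned octet
            boolean isCtl = o <= 31 || o == 127;   // CTL: octets 0 - 31 and DEL (127)
            if (isCtl && o != '\t') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] latin1 = "abc\u00e9".getBytes(StandardCharsets.ISO_8859_1); // é -> 0xE9
        System.out.println(isText(latin1));                   // true: 0xE9 is an OCTET, not a CTL
        System.out.println(isText(new byte[] {'a', 1, 'b'})); // false: 0x01 is a CTL
    }
}
```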

And, what does it all mean browser-side, particularly for Cookies ?


Browsers have to be compliant. Are they?

Supposedly. But in practice, are they ?
If I send from the server a cookie via a Response header like :

Set-Cookie: =?iso-8859-1?B?mycookie=äöüéè

do IE 7+, Firefox, Chrome browsers interpret this correctly, and understand this as a cookie named "mycookie" with a value of "äöüéè" ?
If one of them doesn't, then this is not a practical answer to the OP's problem.

(He can complain to the developers of the non-compliant browser, but how much of a chance does he have to get it fixed soon enough for his problem ?)


Most browsers for example have a "show cookies" function, where they will
display the cookie name, value, and other attributes separately.

That is display of their internal database. It has nothing to do with
what is allowed on the wire.


Agreed, but my point here was to illustrate the "does the browser understand it" question.

To be practical :

In the OP's original question, the server application would like to set a cookie named 'GetUser_Properties' with a value '"abc","abcé","abc","abc","abc","abc","abc","abc","abc"'

Clearly (I think) according to the specs, this is not valid :

Set-Cookie: GetUser_Properties="abc","abcé","abc","abc","abc","abc","abc","abc","abc"

As I understand from the specs, this might be valid (one line):

Set-Cookie: =?iso-8859-1?Q?GetUser_Properties="abc","abcé","abc","abc","abc","abc","abc","abc","abc"

But, does an average browser understand it ?
And if Tomcat 6 / 7 receive a cookie header like (1 line) :

Cookie: =?iso-8859-1?Q?GetUser_Properties="abc","abcé","abc","abc","abc","abc","abc","abc","abc"

do they understand it ?

Or, would you have any other recommendation of how the server should set this cookie ?

What I would do (assuming that the browser side itself doesn't need to use the value of the cookie, just re-send it to the server unchanged) is :

Set-Cookie: GetUser_Properties=BBBB..BB

where "BBBB..BB" is the Base64-encoded value of the iso-8859-1 string '"abc","abcé","abc","abc","abc","abc","abc","abc","abc"'

and the server-side application, when receiving the corresponding cookie, should Base64-decode the cookie value before parsing it.

And I am quite sure that
1) it matches all the specs
2) any browser and any server would support this.
3) it is not "native" to any webserver to Base64-encode/decode cookie values, but it is fairly easy to implement
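As a sketch of how easy this is, the round trip below uses only java.util.Base64 and ISO-8859-1, following the proposal above. Base64CookieValue and its methods are hypothetical names. Every character of the standard Base64 alphabet (A-Z, a-z, 0-9, "+", "/", "=") falls inside the RFC 6265 cookie-octet ranges. One caveat: String.getBytes with ISO-8859-1 silently replaces characters outside Latin-1 with "?", so UTF-8 on both sides may be the safer choice if such characters can occur.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64CookieValue {

    // Before Set-Cookie: application string -> ISO-8859-1 bytes -> Base64 text.
    static String encode(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.ISO_8859_1);
        return Base64.getEncoder().encodeToString(bytes);
    }

    // After reading the Cookie header: Base64 text -> bytes -> original string.
    static String decode(String cookieValue) {
        byte[] bytes = Base64.getDecoder().decode(cookieValue);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\"abc\",\"abc\u00e9\",\"abc\"";
        String wire = encode(original);
        System.out.println("Set-Cookie: GetUser_Properties=" + wire);
        System.out.println(decode(wire).equals(original)); // true
    }
}
```

In a servlet, the encoded string would be what gets passed as the value to new javax.servlet.http.Cookie("GetUser_Properties", ...) before response.addCookie(...), and decode() would be applied to Cookie.getValue() on the way back in, before parsing the comma-separated list.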



