I'm not sure this is the right forum to discuss this issue. I found this "problem" while debugging a UTF-8 email message.

While looking into some email that we had problems with, I saw a Content-Type header like the following:

Content-Type: text/html; charset="UTF-8"

As I remembered it, the MIME specification does not allow quotation marks around the charset parameter value, and it should only accept

Content-Type: text/html; charset=UTF-8

but not charset="UTF-8".

So I checked the MIME specs to figure out whether it is allowed or not. What shocked me is that the original MIME specification, RFC 1521, disallowed it:
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1521.html#sec-7.1.1
and
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1521.html#sec-7.1.2

The formal grammar for the content-type header field for text is as follows:

text-type := "text" "/" text-subtype [";" "charset" "=" charset]

text-subtype := "plain" / extension-token

charset := "us-ascii"/ "iso-8859-1"/ "iso-8859-2"/ "iso-8859-3"
/ "iso-8859-4"/ "iso-8859-5"/ "iso-8859-6"/ "iso-8859-7"
/ "iso-8859-8" / "iso-8859-9" / extension-token
But RFC 2045, which obsoleted RFC 1521, allows a quoted charset name; see
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2045.html#sec-5.1

     parameter := attribute "=" value

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.
  
....
     value := token / quoted-string
  

Note that the value of a quoted string parameter does not include the quotes. That is, the quotation marks in a quoted-string are not a part of the value of the parameter, but are merely used to delimit that parameter value. In addition, comments are allowed in accordance with RFC 822 rules for structured header fields. Thus the following two forms

Content-type: text/plain; charset=us-ascii (Plain text)

Content-type: text/plain; charset="us-ascii"

are completely equivalent.
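
To make the equivalence concrete, here is a rough Python sketch of a reader that accepts both forms. The helper name charset_of is my own invention, and the comment stripping is a simplification: it assumes comments are not nested and that no quoted value contains parentheses.

```python
import re

# Hypothetical helper, not taken from any RFC or library.
_COMMENT = re.compile(r'\([^()]*\)')   # non-nested RFC 822 comments only
_CHARSET = re.compile(
    r';\s*charset\s*=\s*(?:"((?:[^"\\]|\\.)*)"|([^\s;"()]+))',
    re.IGNORECASE,                     # attribute names are case-insensitive
)

def charset_of(content_type):
    """Return the lowercased charset of a Content-Type value, or None."""
    # Simplification: drop comments first (would break on "(" inside quotes).
    without_comments = _COMMENT.sub('', content_type)
    m = _CHARSET.search(without_comments)
    if m is None:
        return None
    quoted, token = m.group(1), m.group(2)
    if quoted is not None:
        # The quotes only delimit the value; also undo backslash escapes.
        value = re.sub(r'\\(.)', r'\1', quoted)
    else:
        value = token
    return value.lower()
```

With this sketch, charset_of('text/plain; charset=us-ascii (Plain text)') and charset_of('text/plain; charset="us-ascii"') give the same answer.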

I was never aware of this difference between RFC 1521 and RFC 2045. I'm not sure whether you folks are aware of it or not.

I also checked HTTP 1.1 (RFC 2068) and HTTP 1.0 (RFC 1945). It looks like both specifications contain conflicting language about this issue within the same document:
http://www.w3.org/Protocols/rfc1945/rfc1945
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2068.html

In one place they say:

     charset = "US-ASCII"
             | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
             | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
             | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
             | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
             | "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
             | token
and
       token          = 1*<any CHAR except CTLs or tspecials>

       tspecials      = "(" | ")" | "<" | ">" | "@"
                      | "," | ";" | ":" | "\" | <">
                      | "/" | "[" | "]" | "?" | "="
                      | "{" | "}" | SP | HT
which rules out the use of quoted-string, because the double quote is itself one of the tspecials.
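
A small Python sketch of the token rule quoted above shows why a value such as "UTF-8" (with the quotes) fails it. The names TSPECIALS and is_token are mine, for illustration only; CHAR is taken to be 7-bit US-ASCII and CTLs to be codes 0 through 31 plus 127.

```python
# tspecials exactly as listed in the grammar quoted above
# (the last two characters are SP and HT).
TSPECIALS = set('()<>@,;:\\"/[]?={} \t')

def is_token(s):
    """True if s is one or more CHARs that are neither CTLs nor tspecials."""
    return len(s) > 0 and all(
        32 < ord(c) < 127 and c not in TSPECIALS for c in s
    )
```

Here is_token('UTF-8') is true, while is_token('"UTF-8"') is false because of the double quotes.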
  
In the other place they say:
3.6  Media Types

   HTTP uses Internet Media Types [13] in the Content-Type header field
   (Section 10.5) in order to provide open and extensible data typing.

       media-type     = type "/" subtype *( ";" parameter )
....
       parameter      = attribute "=" value
....
       value          = token | quoted-string

:( :( :( :(
Therefore we need to make sure that:
1. Every mailer that receives email deals not only with charset=value but also with charset="value". I am not sure whether Mozilla can deal with it or not. How about your email program?

2. The browser can deal with
Content-Type: text/html; charset="value"
in addition to
Content-Type: text/html; charset=value

3. Because we also use the META tag in HTML to mirror the HTTP header, the browser has to deal not only with the following kinds of meta tag

<meta http-equiv="content-type" content="text/html; charset=value">
<meta http-equiv="content-type" content='text/html; charset=value'>
but also
<meta http-equiv="content-type" content='text/html; charset="value"'>
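
For point 3, a rough Python sketch of a reader that tolerates all three meta forms might look like this. It uses the standard html.parser module; the class and function names are hypothetical, and this is not a description of how Mozilla or IE actually does it.

```python
from html.parser import HTMLParser
import re

# Accept charset=value, charset="value" and charset='value'.
_CHARSET = re.compile(r'charset\s*=\s*["\']?([^\s;"\']+)', re.IGNORECASE)

class MetaCharsetFinder(HTMLParser):
    """Remembers the charset of the first content-type meta tag seen."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != 'meta' or self.charset is not None:
            return
        a = dict(attrs)   # HTMLParser already lowercases attribute names
        if (a.get('http-equiv') or '').lower() != 'content-type':
            return
        m = _CHARSET.search(a.get('content') or '')
        if m:
            self.charset = m.group(1).lower()

def meta_charset(html):
    """Return the lowercased charset from a meta tag, or None."""
    finder = MetaCharsetFinder()
    finder.feed(html)
    return finder.charset
```

The HTML attribute quoting (single or double) is handled by the parser; the inner quotes around the charset value are handled by the regular expression.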

:( :( :( :(

I'm not sure whether Mozilla handles 2 or 3. How about IE?

However, for email, since RFC 1521 does NOT allow it, to make sure our messages work with most email programs, when we send out Internet email we should use

Content-Type: text/html; charset=UTF-8

instead of
Content-Type: text/html; charset="UTF-8"
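
On the sending side, one way to follow this advice is to emit the charset bare and refuse anything that is not a plain token. The function below is only a sketch with a made-up name (content_type_header); the character class is my reading of "any CHAR except CTLs or tspecials".

```python
import re

# Printable US-ASCII minus the tspecials: what may appear in a bare token.
_TOKEN = re.compile(r'[!#$%&\'*+\-.^_`|~0-9A-Za-z]+')

def content_type_header(media_type, charset):
    """Build a Content-Type line with an unquoted charset parameter."""
    if not _TOKEN.fullmatch(charset):
        # A non-token value would need quotes, which is what we want to avoid.
        raise ValueError('charset is not a token: %r' % (charset,))
    return 'Content-Type: %s; charset=%s' % (media_type, charset)
```

For example, content_type_header('text/html', 'UTF-8') gives the first form above, and passing '"UTF-8"' raises ValueError.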

Can you check this issue in the product that you are working on?








