[yam] Thoughts on text/* encoding defaults

Julian Reschke Sat, 30 Apr 2011 07:24:17 -0700

Hi there.

In Prague, we had a few hallway conversations with respect to the default encoding of text/* media types.

Below are my notes (references to the relevant spec sections, information about a recent change in HTTPbis, and a rough proposal about how to proceed).


I'm posting this here because Alexey thought the audience might fit...

Best regards, Julian

-- snip --


1) RFC 2046 says that the default is US-ASCII

"Note that the character set used, if anything other than US-ASCII, must always be explicitly specified in the Content-Type field." -- <http://greenbytes.de/tech/webdav/rfc2046.html#rfc.section.4.1.2.p.18>


2) RFC 2616 says it's ISO-8859-1

"The "charset" parameter is used with some media types to define the character set (Section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value. See Section 3.4.1 for compatibility problems." -- <http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.3.7.1.p.4>


3) For text/xml, RFC 3023 says it's US-ASCII, no matter what 2616 says :-)

"Conformant with [RFC2046], if a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MUST use the default charset value of "us-ascii"[ASCII]. In cases where the XML MIME entity is transmitted via HTTP, the default charset value is still "us-ascii". (Note: There is an inconsistency between this specification and HTTP/1.1, which uses ISO-8859-1[ISO8859] as the default for a historical reason. Since XML is a new format, a new default should be chosen for better I18N. US-ASCII was chosen, since it is the intersection of UTF-8 and ISO-8859-1 and since it is already used by MIME.)" -- <http://tools.ietf.org/html/rfc3023#section-3.1>
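
To make the inconsistency between 1) to 3) concrete, here is a minimal Python sketch of the default each spec prescribes when the charset parameter is absent. The function name and its arguments are made up for illustration; it only restates the three quotes above and is not what any real recipient implements.

```python
def spec_default_charset(media_type: str, via_http: bool) -> str:
    """Default charset per the quoted specs when no charset parameter is sent.

    Hypothetical helper: the name and signature are illustrative only.
    """
    media_type = media_type.strip().lower()
    if not media_type.startswith("text/"):
        raise ValueError("only text/* media types are covered here")
    if media_type == "text/xml":
        # RFC 3023, Section 3.1: us-ascii, even when transmitted via HTTP
        return "us-ascii"
    if via_http:
        # RFC 2616, Section 3.7.1: ISO-8859-1 when received via HTTP
        return "iso-8859-1"
    # RFC 2046, Section 4.1.2: US-ASCII unless explicitly specified
    return "us-ascii"
```

So the same text/plain payload gets a different default depending on the transport, while text/xml ignores the transport altogether.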


The problem

Recipients do not implement this; they take the absence of encoding information as an indicator to inspect the payload. This is at least true for text/xml and text/html (see <http://www.w3.org/TR/REC-xml/#sec-guessing> and <http://www.w3.org/TR/2011/WD-html5-20110405/parsing.html#determining-the-character-encoding>).
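
As a rough illustration of that recipient behaviour, the following Python sketch looks into the payload only when no charset parameter was sent. It is heavily simplified (UTF-8 and UTF-16 byte order marks, plus an ASCII-compatible XML declaration) and is not the full algorithm from either linked document; sniff_encoding and its parameters are hypothetical names.

```python
import codecs
import re
from typing import Optional

# Byte order marks checked in this sketch (UTF-32 is deliberately left out).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Encoding pseudo-attribute of an XML declaration at the very start of the payload.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(payload: bytes, charset_param: Optional[str] = None) -> Optional[str]:
    """Return the encoding a sniffing recipient would pick, or None if unknown."""
    if charset_param:
        # An explicit charset parameter wins; no inspection needed.
        return charset_param.lower()
    for bom, name in _BOMS:
        if payload.startswith(bom):
            return name
    match = _XML_DECL.match(payload)
    if match:
        return match.group(1).decode("ascii").lower()
    # Nothing found: the recipient falls back to whatever default it uses.
    return None
```

Note that none of the spec defaults from 1) to 3) appears anywhere in this logic, which is the mismatch described above.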

Current development: HTTPbis P3 has dropped the default and now delegates to the relevant media type definitions (see <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/20>, <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-14.html>).


Left to do:

a) Revise RFC 2046; allow text/* types that carry encoding information inline to do the expected thing (overriding the US-ASCII default); warn against doing so in new registrations (recommend supporting only UTF-8, and require that the charset parameter always be included explicitly, as text/vcard is going to do?)

b) Revise RFC 3023 to delegate text/xml charset defaults to the revision of 2046?


Best regards, Julian


_______________________________________________
yam mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/yam
