On 11 Feb 2009, at 02:18, Mark Nottingham wrote:

[ASCII vs UTF-8]

OTOH we're talking about a SHOULD here. Maybe it just needs more careful guidance; i.e., that you should stick to ASCII unless you're conveying elements for presentation to end users.

Well, one point to consider is how you expect IRIs and IRI references to be represented.

There's one school of thought (more common in the IETF crowd) that says that these should be converted to ASCII early, and therefore shouldn't occur here.

The other school of thought (more common at W3C) says that they're fine in the places where XML and other document formats have always accepted URIs, and therefore should be representable in this spot.

Some properties of the direction the IDNA update effort is taking suggest that the IETF school of thought is less likely to cause interoperability problems.
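To make the "convert to ASCII early" position concrete, here is a rough sketch of what a producer would do before writing an IRI into the file. The function name iri_to_uri and the example host are made up for illustration. Note that Python's built-in "idna" codec implements IDNA 2003 (RFC 3490), not whatever the update effort ends up with, and that userinfo and IPv6 literals are ignored for brevity.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: URI delimiters plus "%" so that
# existing percent-escapes are not encoded a second time.
_SAFE_PATH = "/:@!$&'()*+,;=%"
_SAFE_QUERY = _SAFE_PATH + "?"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to an ASCII-only URI."""
    parts = urlsplit(iri)

    # Host name: IDNA ToASCII (IDNA 2003 in Python's stdlib codec).
    host = parts.hostname or ""
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"

    # Everything else: UTF-8 encode, then percent-encode non-ASCII.
    path = quote(parts.path, safe=_SAFE_PATH)
    query = quote(parts.query, safe=_SAFE_QUERY)
    fragment = quote(parts.fragment, safe=_SAFE_QUERY)

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    print(iri_to_uri("http://bücher.example/straße?q=é"))
```

With this done at the producer, the field only ever carries ASCII, which is the point of the IETF-style position. The W3C-style position would instead store the IRI as-is and leave this step to whoever dereferences it.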

The other question is what the cost of violating this SHOULD is. Assume that some people have a really good reason to violate an ASCII or ISO-8859-1 SHOULD, and actually go for UTF-8. You now get mixed character sets in a single metadata file. I'm not sure that's desirable...
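A small sketch of why mixed character sets in one file hurt, assuming a consumer that has to pick a single decoding for the whole file. The field names and values are invented for illustration.

```python
# Two lines of one hypothetical metadata file, written by producers
# that made different encoding choices.
line_latin1 = "Title: Café".encode("iso-8859-1")  # follows the SHOULD
line_utf8 = "Title: Café".encode("utf-8")         # violates it

# A consumer that assumes ISO-8859-1 never fails, but silently
# garbles the UTF-8 line into mojibake.
utf8_read_as_latin1 = line_utf8.decode("iso-8859-1")

# A consumer that assumes UTF-8 rejects the ISO-8859-1 line outright.
try:
    line_latin1.decode("utf-8")
    latin1_read_as_utf8_ok = True
except UnicodeDecodeError:
    latin1_read_as_utf8_ok = False

if __name__ == "__main__":
    print(utf8_read_as_latin1)
    print(latin1_read_as_utf8_ok)
```

Neither consumer can read both lines correctly without per-line guessing. Pure ASCII content is the only case where the two decodings agree.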

(BTW, are we just going down the rathole of defining yet another tag-value format that's subtly different? Maybe the spec should just say "use HTTP header format, but with UTF-8", or "use RFC 822, but with UTF-8".)

--
Thomas Roessler, W3C  <t...@w3.org>

