2015-06-01 1:33 GMT+02:00 Chris <idou...@gmail.com>:

> Of course, anyone can invent a character set. The difficult bit is having
> a standard way of combining custom character sets. That’s why a standard
> would be useful.
>
> And while stuff like this can, to some extent, be recognised by magic
> numbers, and unique strings in headers, such things are unreliable. Just
> because example.net/mycharset/ appears near the start of a document,
> doesn’t necessarily mean it was meant to define a character set. Maybe it
> was a document discussing character sets.

That's not what I described. I spoke about using a MIME-compatible private charset identifier, and how such a private identifier can be made reasonably unique by binding it to a domain name or URI. If you had read more carefully, you would have seen that I also said it is absolutely not necessary to dereference that URL: many XML schemas bind their namespaces to a URI that is not itself a webpage and does not point to any downloadable DTD, XML schema or XML stylesheet. Google and Microsoft do this in lots of schemas (which are not described or documented at that URL, if they are documented at all).

The URI by itself is just an identifier. It becomes a webpage only when you use it in a web page with an href attribute to create a hyperlink, or to perform some query to a service returning some data. An identifier for a private charset does not need to perform any request to be usable: the identifier is sufficient by itself.

The URI can also be only a base URI for a collection of resources (whose URLs start with this base URI, with conventional extensions appended to get the character properties, or a font). But the best way is to embed this data in your document, in some header or footer, if your document using the private charset is not part of a collection of documents using the same private charset.

In that case, you don't need a new UTF: UTF-8 remains usable and you can map your private charset to standard PUAs (and/or to "hacked" characters) according to the private charset's needs.

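As a minimal sketch of the idea above: the identifier is only a label, and the private characters travel as ordinary PUA code points in UTF-8. The identifier `x-mycharset.example.net`, the glyph names and the PUA assignments are all hypothetical, chosen only for illustration.

```python
# Hypothetical private charset identifier: an "x-" token bound to a domain name.
# It is only a label; nothing is ever fetched from example.net.
CHARSET_ID = "x-mycharset.example.net"

# Assumed private repertoire: glyph names mapped to code points in the
# BMP Private Use Area (U+E000..U+F8FF).
PRIVATE_TO_PUA = {
    "glyph-a": 0xE000,
    "glyph-b": 0xE001,
}

def is_pua(cp):
    """True if cp lies in the BMP Private Use Area."""
    return 0xE000 <= cp <= 0xF8FF

def encode_private(names):
    """Encode a sequence of private glyph names as plain UTF-8 bytes."""
    text = "".join(chr(PRIVATE_TO_PUA[name]) for name in names)
    return text.encode("utf-8")

data = encode_private(["glyph-a", "glyph-b"])
```

The resulting bytes are valid UTF-8, so any existing tool can carry them; only the interpretation of the PUA code points depends on the declared identifier.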
The charset indicated in your document (by some meta header) should be sufficient to avoid collisions with other private conventions. It defines the scope of your private charset as the document itself, which is then interchangeable, and possibly mixable with other documents after some renumbering if assignments collide between two distinct private charsets: in the document header, add to the charset identifier the range of PUAs that is used. Then, given two documents colliding on this range, you can re-encode one automatically by creating a compound charset with subranges of PUAs remapped to other ranges.
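The renumbering described above could look roughly like the following sketch. It assumes each document header carries a private charset identifier plus the PUA range in use; the `Doc` structure, the identifiers and the policy of moving the second document to the range right after the first are illustrative assumptions, not an existing convention.

```python
from dataclasses import dataclass

PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area

@dataclass
class Doc:
    charset: str  # private charset identifier (hypothetical)
    first: int    # first PUA code point declared in the header
    last: int     # last PUA code point declared in the header
    text: str

def overlaps(a, b):
    """True if the declared PUA ranges of two documents collide."""
    return a.first <= b.last and b.first <= a.last

def remap(doc, new_first):
    """Return a copy of doc with its PUA range shifted to start at new_first."""
    size = doc.last - doc.first
    if new_first < PUA_START or new_first + size > PUA_END:
        raise ValueError("target range falls outside the BMP PUA")
    offset = new_first - doc.first
    table = {cp: cp + offset for cp in range(doc.first, doc.last + 1)}
    return Doc(doc.charset, new_first, new_first + size, doc.text.translate(table))

def merge(a, b):
    """Combine two documents; the header lists each charset with its subrange."""
    if overlaps(a, b):
        b = remap(b, a.last + 1)  # assumed policy: place b right after a
    header = [(a.charset, a.first, a.last), (b.charset, b.first, b.last)]
    return header, a.text + b.text

a = Doc("x-a.example.net", 0xE000, 0xE00F, "\ue000\ue001")
b = Doc("x-b.example.org", 0xE000, 0xE003, "\ue000\ue003")
header, text = merge(a, b)
```

Here `header` plays the role of the compound charset: it records which subrange of PUAs belongs to which private charset after the second document has been re-encoded.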