On 09/02/2015 13:55, Alfred Zett wrote:

Additionally, people tend to forget that simply because Unicode is doing emoji out of compatibility (or other) requirements, it does not mean that "now anything goes". I refer folks to TR51[1] (specifically sections 1.3, 8, and Annex C).

[1]: http://www.unicode.org/reports/tr51

You know, the fact that this consortium ever took emoji into consideration immediately justifies including everything everyone ever wanted. There is no such thing as important data including emoji. :)
The inclusion of emoji was the subject of considerable debate here, with people strongly against and strongly for. The trick is that they were already used as digital characters by Japanese telcos and their millions of customers. They were de facto encoded as characters in Japanese text messages. By the time of encoding, the spread of smartphones had made them appear in other places (emails, web forums, etc.).



Jean-Francois Colson:
I need a few tens of characters for a conlang I’m developing. ☺
Except two or three control characters don't make a con language.
Also, if you don't like con languages in Unicode, what's this: http://unicode.org/charts/PDF/U1F700.pdf
I doubt that “not liking con languages” is a faithful description of Jean-François ;-)

On a more serious note, this block is actually a set of “scientific” (for their time) notations used by Isaac Newton. They were encoded in Unicode following an academic project to digitize his manuscripts. So here you have characters used three centuries ago by no less than Isaac Newton, most of them having a much longer history, and useful to historians of science. See http://www.unicode.org/L2/L2009/09037r2-alchemy.pdf for details. This does not compare with a few characters invented for a conlang by an amateur and used by no one but himself. I think that is the point Jean-François wanted to make.

A closer counter-example to Jean-François's “wish” would be Shavian (U+10450..U+1047F), but this alphabet has seen some use, and I guess that its encoding would have been much harder without its association with someone as famous as George Bernard Shaw, or without the existing publication of a full text in Shavian.


The problem is that Unicode only encodes characters which are actually used today or which have been used in the past. It doesn’t encode characters which could perhaps be used in a hypothetical new programming language in the future.
So you want the font encoding scheme to be a limiting factor for new things?

It is more or less the rule, except that it is not a font encoding but a standard encoding. Once something is encoded, it can never be unencoded. And the Unicode standard is built to stay relevant as long as possible (decades or centuries). So you are asking for your character to be encoded in billions of devices for decades. It is more than a mere font encoding. There are a few exceptions, but only when widespread use is really expected, as for currency symbols (this was the case for the Euro).

What you are asking for is a character for an untested idea. You are convinced it is useful, but cannot prove that anyone beyond yourself will use it, hence Jean-François’s parallel with conlangs. In order to have a chance of success, design a language using existing characters (e.g. some APL + → for TAB) and/or private use codepoints. Once your language starts gathering steam, come back and argue that using an arrow or a tab is awkward, and that U+XXXX SHINY TAB FOR PROGRAMMERS would be an improvement for a significant community. I know it is a lot of work, but that is probably what it takes.
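
To make the private-use route concrete, here is a minimal Python sketch. The choice of U+E000 for the prototype character, the names SHINY_TAB and to_interchange, and the arrow fallback are all my own assumptions for illustration, not anything proposed in this thread. It only shows that a private use code point can carry the experimental character locally, and be swapped for an already-encoded one when the text leaves your own tools.

```python
import unicodedata

# Hypothetical assignment: U+E000 is the first code point of the BMP
# Private Use Area; any private-use code point would do.
SHINY_TAB = "\uE000"
# Already-encoded stand-in for interchange: U+2192 RIGHTWARDS ARROW.
FALLBACK = "\u2192"


def is_private_use(ch: str) -> bool:
    """Return True if ch has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def to_interchange(text: str) -> str:
    """Replace the private-use prototype with an encoded character,
    for readers who do not have the custom font or editor."""
    return text.replace(SHINY_TAB, FALLBACK)


source = f"if x:{SHINY_TAB}return y"
print(is_private_use(SHINY_TAB))
print(to_interchange(source))
```

Nothing here needs a new code point; whether the experiment later justifies one is exactly the question of demonstrated use.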


Pierpaolo Bernardi:
How would your proposed character be displayed as plain text?
There is no such thing as plain text.
When you say that, you don’t accept the premise of Unicode encoding. Unicode’s goal is to encode all plain text characters, but only plain text characters.
Even line breaks and tabs are a matter of interpretation. It's just that they usually have typographic semantics, even in programming editors, with all the side effects.

In very simple (and with that I mean shitty or not even remotely programming oriented) editors, it may show like a control character, like ␄.

Browsers and any editor passing the "based on scintilla" complexity mark of course should display something that makes more sense, like an arrow or ⍈ plus surrounding space.

I think everyone here knows what you are saying, and that the notion of plain text is a bit fuzzy. But if you cannot argue that your character has a meaning in plain text, for some value of “plain text”, then you cannot hope for an encoding in Unicode.
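
For what it is worth, the "show it like ␄" fallback mentioned above can already be done with encoded characters. The sketch below is only an illustration (show_controls is a name I made up): it maps the C0 control characters onto the Control Pictures block, where U+2400..U+241F mirror U+0000..U+001F and U+2421 stands for DELETE.

```python
def show_controls(text: str) -> str:
    """Replace C0 controls and DEL with their Control Pictures symbols."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            # U+2400..U+241F are the visible symbols for U+0000..U+001F.
            out.append(chr(0x2400 + cp))
        elif cp == 0x7F:
            # U+2421 SYMBOL FOR DELETE.
            out.append("\u2421")
        else:
            out.append(ch)
    return "".join(out)


# TAB becomes U+2409 and END OF TRANSMISSION becomes U+2404.
print(show_controls("a\tb\x04"))
```

This is a display convention layered on top of the text; the underlying characters keep whatever plain-text meaning they already have.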


_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode
