Re: Unicode block for programming related symbols and codepoints?

Frédéric Grosshans Mon, 09 Feb 2015 06:17:23 -0800

Le 09/02/2015 13:55, Alfred Zett a écrit :

Additionally, people tend to forget that simply because Unicode is doing emoji out of compatibility (or other) requirements, it does not mean that "now anything goes". I refer folks to TR51 [1] (specifically sections 1.3, 8, and Annex C).
[1]: http://www.unicode.org/reports/tr51
You know, the fact that this consortium ever took emoji into consideration immediately justifies to include everything everyone ever wanted. There is no such thing as important data including emoji. :)

The inclusion of emoji was the subject of considerable debate here, with people strongly against and strongly for. The key point is that they were already used as digital characters by Japanese telcos and their millions of customers. They were de facto encoded as characters in Japanese text messages. By the time they were encoded, the spread of smartphones had made them appear in other places (emails, web forums, etc.).

Jean-Francois Colson:
I need a few tens of characters for a conlang I’m developing. ☺
Except two or three control characters don't make a con language.
Also, if you don't like con languages in Unicode, what's this: http://unicode.org/charts/PDF/U1F700.pdf

I doubt that “not liking con languages” is a faithful description of Jean-François ;-)

On a more serious note, this block is actually a set of “scientific” (for their time) notations used by Isaac Newton. They were encoded in Unicode following an academic project to digitize his manuscripts. So here you have characters used three centuries ago by no less than Isaac Newton, most of them having a much longer history, and useful for historians of science. See http://www.unicode.org/L2/L2009/09037r2-alchemy.pdf for details. This does not compare with a few characters invented for a conlang invented by an amateur and used by no one but himself. I think that is the point Jean-François wanted to make.

A closer counter-example to Jean-François's “wish” would be Shavian (10450..1047F), but this alphabet has seen some use, and I guess that its encoding would have been much harder without its association with someone as famous as George Bernard Shaw or without the existing publication of a full text in Shavian.

The problem is that Unicode only encodes characters which are effectively used today or which have been used in the past. It doesn’t encode characters which could perhaps be used in a hypothetical new programming language in the future.
So you want the font encoding scheme to be a limiting factor for new things?

It is more or less the rule, except that it is not a font encoding but a standard encoding. Once something is encoded, it can never be unencoded. And the Unicode standard is built to stay relevant as long as possible (decades or centuries). So you are asking for your character to be encoded in billions of devices for decades. It is more than a mere font encoding. There are a few exceptions, but only when widespread use is really expected, as for currency symbols (this was the case for the Euro).

What you are asking for is a character for an untested idea. You are convinced it is useful, but cannot prove anyone beyond yourself will use it, hence Jean-François’s parallel with conlangs. In order to have a chance of success, design a language using existing characters (e.g. some APL + → for TAB) and/or private use codepoints. Once your language starts gathering steam, come back and argue that using an arrow or a tab is awkward, and that U+XXXX SHINY TAB FOR PROGRAMMERS would be an improvement for a significant community. I know it is a lot of work, but that is probably what it takes.
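
To make the private-use route concrete, here is a minimal Python sketch using only the standard unicodedata module. The choice of U+E000 as the prototype code point and the names PROTOTYPE_TAB, FALLBACK_TAB, is_private_use and describe are my own assumptions for illustration, not anything proposed in this thread. It only shows that a private use code point has general category Co and no standard name, whereas an already-encoded stand-in such as → has both.

```python
import unicodedata

# Hypothetical choices for illustration only:
# U+E000 is the first code point of the BMP Private Use Area,
# U+2192 is an already-encoded arrow that could stand in for the new symbol.
PROTOTYPE_TAB = "\uE000"
FALLBACK_TAB = "\u2192"


def is_private_use(ch: str) -> bool:
    """Return True if ch has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def describe(ch: str) -> str:
    """Format code point, general category and name (if the character has one)."""
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {name}"


if __name__ == "__main__":
    for ch in (PROTOTYPE_TAB, FALLBACK_TAB):
        print(describe(ch), "| private use:", is_private_use(ch))
```

A private use code point carries no agreed meaning outside the community that adopts it, which is exactly why it suits a prototype: nothing has to be standardised until real usage exists.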


Pierpaolo Bernardi:

How would your proposed character be displayed as plain text?

There is no such thing as plain text.

When you say that, you don’t accept the premise of Unicode encoding. Unicode’s goal is to encode all plain text characters, but only plain text characters.

Even line breaks and tabs are a matter of interpretation. It's just that they usually have typographic semantics, even in programming editors, with all the side effects.
In very simple (and with that I mean shitty or not even remotely programming oriented) editors, it may show like a control character, like ␄.
Browsers and any editor passing the "based on scintilla" complexity mark of course should display something that makes more sense, like an arrow or ⍈ plus surrounding space.

I think everyone here knows what you are saying, and that the notion of plain text is a bit fuzzy. But if you cannot argue that your character has a meaning in plain text, for some value of “plain text”, then you cannot hope for an encoding in Unicode.



