I was asking why the glyphs for the right arrow ➡ are inconsistent across many sources, 
through a couple of iterations of Unicode. One reason, I might observe, is that 
there is no technical link between the code and the glyph. I can't realistically 
write a display engine that goes to unicode.org <http://unicode.org/>, or 
wherever, and dynamically finds the right standard glyph for unknown codes. The 
same problem shows up when I see empty squares □ for characters my platform 
doesn't know about. This isn't the case with XML: I can send someone an 
arbitrary XML document, and there is a standard way to go out on the internet 
and check whether that XML is conformant. Why shouldn't there be a standard way 
to go out on the net and find the canonical glyph for a code? If there were, 
non-standard glyphs would fall out of that technology naturally.
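To make the contrast concrete, here is a minimal sketch (all names and URLs invented for illustration) of what I mean: an XML document carries a machine-readable pointer to its own definition, while a Unicode code point is just a number with no pointer to any authority:

```python
# Sketch: an XML document can name the authority that defines it;
# a character code cannot. The schema URL below is hypothetical.
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://example.org/note http://example.org/note.xsd">
  <body>Hello</body>
</note>"""

root = ET.fromstring(doc)
XSI = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"
schema_location = root.attrib[XSI]
# The document itself tells a validator where its rules live:
print(schema_location)

# By contrast, the right arrow is only a number; nothing in it says
# where to fetch the canonical glyph:
print(hex(ord("➡")))  # 0x27a1
```

Nothing in the code point tells a renderer where `0x27a1`'s reference glyph is published; the document format, by design, does tell a validator where its schema is.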

So people point to all the technologies that are out there (HTML5, cmaps, 
fonts, and so forth), but there is no standard way to construct a list of 
"characters", some of which might be non-standard, embed that list ANYWHERE one 
might reasonably expect characters, have it processed in the normal way as 
characters, and have it sent anywhere and understood.

As you point out, "The UCS will not encode characters without a demonstrated 
usage." But there are use cases for characters that don't meet UCS's criteria 
for a worldwide standard, yet are necessary in more specific contexts: 
specialised regional, business, or domain-specific situations.

My question is: given that Unicode can't realistically (and doesn't aim to) 
encode every possible symbol in the world, why shouldn't there be an EXTENSIBLE 
encoding method, so that people don't have to rearchitect their entire 
computing universe because they want ONE non-standard character in their 
documents?

Right now, what happens if you have a domain or locale requirement for a 
special character? Most likely you suffer without it, because even though you 
could get it to render in some situations (such as hand-coding some IMGs into 
your web site), you know you won't realistically be able to input it into 
emails, Word documents, spreadsheets, and whatever other random applications 
you use on a daily basis.

What I'm saying is: is it really beyond the Unicode Consortium's scope, and/or 
would it really be a redundant technology, to define, for example, a UTF-64 
coding format, where 32 bits allow 4 billion businesses and individuals to 
define their own character sets (each of up to 4 billion characters), and then 
have standard places on the internet (similar to DNS lookup servers) that can 
provide anyone with glyphs and fonts for those codes?
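A minimal sketch of that hypothetical "UTF-64" layout, purely to fix ideas (the name, split, and example IDs are all my invention, not any existing standard):

```python
# Hypothetical "UTF-64": high 32 bits name a registrant (business or
# individual), low 32 bits name one of that registrant's characters.

def utf64_pack(registry_id: int, char_id: int) -> int:
    """Combine a registrant ID and a character ID into one 64-bit code."""
    assert 0 <= registry_id < 2**32 and 0 <= char_id < 2**32
    return (registry_id << 32) | char_id

def utf64_unpack(code: int) -> tuple[int, int]:
    """Split a 64-bit code back into (registrant ID, character ID)."""
    return code >> 32, code & 0xFFFFFFFF

code = utf64_pack(0xBEEF, 0x42)   # registrant 48879, character 66
print(hex(code))                  # 0xbeef00000042
print(utf64_unpack(code))         # (48879, 66)
```

The point of the split is that the high half is resolvable, like a domain name: any renderer that sees an unknown code at least knows which registrant to ask.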

Right now, yes, there are cmaps, but there is no standard way to combine 
characters from different encodings, and no standard way to find the cmap for 
an unknown encoding. There is HTML5, but that doesn't produce something 
recognisable as a list of characters that can be processed as such. (If there 
is an IMG in text, is it a "character" or an illustration in the text? How can 
you refer to a particular set of characters without running your own web 
server? How do you render that text bigger, with the standard reference glyph, 
without manually searching the internet for where to find it? There is a host 
of problems here.)
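The missing resolution step could look something like this sketch, where an unknown code is mapped to a well-known URL at which its registrant publishes glyphs. The authority hostname and URL scheme here are invented for illustration; nothing like this exists today, which is exactly my complaint:

```python
# Sketch of a DNS-like lookup: derive, from a 64-bit code alone, the
# place where its reference glyph could be fetched. Hostname and path
# scheme are hypothetical.

def glyph_url(code: int, authority: str = "glyphs.example.org") -> str:
    registry_id = code >> 32        # who defined the character
    char_id = code & 0xFFFFFFFF     # which of their characters it is
    return f"https://{authority}/r/{registry_id}/c/{char_id}.svg"

print(glyph_url((0xBEEF << 32) | 0x42))
# https://glyphs.example.org/r/48879/c/66.svg
```

With a scheme like this, "render that text bigger with the standard reference glyph" becomes a fetch any display engine can perform, rather than a manual search.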

All these problems look unsolved to me, and they also look like 
character-encoding technology problems. What other consortium out there is 
working on character encoding problems?


> On 2 Jun 2015, at 7:40 pm, Philippe Verdy <[email protected]> wrote:
> 
> Once again no ! Unicode is a standard for encoding characters, not for 
> encoding some syntaxic element of a glyph definition !
> 
> Your project is out of scope. You still want to reinvent the wheel.
> 
> For creating syntax, define it within a language (which does not need new 
> characters (you're not creating an APL grammar using specific symbols for 
> some operators more or less based on Greek letters and geometric shapes: they 
> are just like mathematic symbols). Programming languages and data languages 
> (Javascript, XML, JSON, HTML...) and their syntax are encoded themselves in 
> plain text documents using standard characters) and don't need new 
> characters, APL being an exception only because computers or keyboards were 
> produced to facilitate the input (those that don't have such keyboards used 
> specific editors or the APL runtime environment that offer an input method 
> for entering programs in this APL input mode).
> 
> And again you want the chicken before the egg: have you only ever read the 
> encoding policy ? The UCS will not encode characters without a demonstrated 
> usage. Nothing in what you propose is really used except being proposed only 
> by you, and used only by you for your private use (or with a few of your 
> unknown friends, but this is invisible and unverifiable). Nothing has been 
> published.
> 
> Even for currency symbols (which are an exception to the demonstrated use, 
> only because once they are created they are extremely rapidly needed by lot 
> of people, in fact most people of a region as large as a country, and many 
> other countries that will reference or use it it). But even in this case, 
> what is encoded is the character itself, not the glyph or new characters used 
> to defined the glyph !
> 
> Can you stop proposing out of topic subjects like this on this list ? You are 
> not speaking about Unicode or characters. Another list will be more 
> appropriate. You help no one here because all you want is to change radically 
> the goals of TUS.
> 
> 2015-06-02 11:01 GMT+02:00 William_J_G Overington <[email protected] 
> <mailto:[email protected]>>:
> Perhaps the solution to at least some of the various issues that have been 
> discussed in this thread is to define a tag letter z as a code within the 
> local glyph memory requests, as follows.
