On 17/01/2019 12:21, Philippe Verdy via Unicode wrote:

[quoted mail]

But the French "espace fine insécable" was requested long, long before Mongolian 
was discussed for encoding in the UCS. The problem is that the initial rush for French 
came in a period where Unicode and ISO were competing and not in sync, so no 
agreement could be found, until there was a decision to merge the efforts. The early rush 
was in ISO, still not using a character model but a glyph model, with little desire to 
support multiple whitespaces; on the Unicode side, there was initially no desire to 
encode all the languages and scripts, the focus being initially only on trying to unify the 
existing vendor character sets which were already implemented by a limited set of 
proprietary vendor implementations (notably IBM, Microsoft, HP, Digital) plus a few of 
the registered charsets in IANA, including the existing ISO 8859-*, GBK, and some national 
standards or de facto standards (Russia, Thailand, Japan, Korea).
This early rush did not involve typographers (well, there was Adobe at this time, but still 
using another, unrelated technology). Font standards did not yet exist and were 
competing in incompatible ways; everything was a mess at that time, so publishers were still 
required to use proprietary software solutions, with very low interoperability (at that 
time the only "standard" was PostScript, which needed no character encoding at 
all, but only encoded glyphs!).

Thank you for this insight. It is a still untold part of the history of Unicode.

It seems that there was little incentive to involve typographers because they 
had no computer science training, and because they were feared as trying to 
enforce requirements that Unicode was neither able nor willing to meet, such 
as distinct code points for italics, bold, small caps…

Among the grievances, Unicode is blamed for confusing Greek psili and dasia 
with comma shapes, and for misinterpreting Latin letter forms such as the u 
with descender taken for a turned h, and the double u mistaken for a turned m, 
errors that subsequently misled font designers to apply misplaced serifs. 
Things were done in haste and a hurry, under the Damocles sword of a hostile 
ISO meddling and threatening to unleash an unusable standard if Unicode wasn’t 
quicker.

If publishers had been involved, they would have revealed that they all needed 
various whitespaces for correct typography (i.e. layout). Typographers themselves 
did not care about whitespaces because they had no value for them (no glyph to 
sell).

Nevertheless the whole range of traditional space forms was admitted, even though 
they were going to be of limited usability. And they were given properties.
Or can’t the misdefinition of PUNCTUATION SPACE be traced back to that era?

Adobe's publishing software was then completely proprietary (just like Microsoft's and others' like 
Lotus, WordPerfect...). Years ago I was working for the French press, and they absolutely required 
us to manage the [FINE] for use in newspapers, classified ads, articles, guides, phone books, 
dictionaries. It was even mandatory to enter these [FINE] in the composed text, and they trained 
their typists or ad sellers to use it (that character was not "sold" in classified ads; 
it was necessary for correct layout, notably in narrow columns, and not using it confused the readers, 
notably before the ":" colon): it had to be non-breaking, non-expanding by justification, 
narrower than digits and even narrower than the standard non-justified whitespace, and it was consistently 
used as a digit grouping separator.
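
As an aside (my own sketch, not Philippe's): a minimal Python illustration of how 
those requirements map onto U+202F NARROW NO-BREAK SPACE in plain text. The function 
names and example strings are made up for illustration.

    # Minimal sketch: the French "fine" as U+202F NARROW NO-BREAK SPACE.
    # It must not break, must not stretch under justification, and also
    # serves as the digit grouping separator.
    NNBSP = "\u202F"

    def group_digits(n: int) -> str:
        # Group by thousands with the fine instead of a comma.
        return f"{n:,}".replace(",", NNBSP)

    def before_colon(label: str, value: str) -> str:
        # A fine goes before ':' so the colon never wraps to the next line.
        return f"{label}{NNBSP}: {value}"

    print(before_colon("Tirage", group_digits(1234567)))
    # -> 'Tirage : 1 234 567' (with U+202F, not U+0020, before ':' and between groups)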

No doubt they were confident that when a UCS was set up, such an important 
character wouldn’t be skipped.
So confident that they never guessed that they had a key role to play in reviewing, in 
providing feedback, in lobbying.
Too bad that we’re still so few people today, corporate vetters included, 
even though many things are still going wrong.

But at that time the most common OSes did not support it natively because there 
was no vendor charset supporting it (and in fact most OSes were still unable to 
render proportional fonts everywhere and were frequently limited to 8-bit 
encodings: DOS, Windows, Unix(es), and even Linux at its early start).

Was there a lack of foresight?
It turns out that today, as those characters are needed, they aren’t ready. Not 
even the NNBSP.

Perhaps it’s the poetic ‘justice of time’ that since Unicode is on, the 
Vietnamese are the foremost, and the French the hindmost.
[I’m alluding to the early lobbying of Vietnam for a comprehensive set of 
precomposed letters, while French was not even granted the benefit 
of the NNBSP – which according to PRI #308 [1] is today the only known use of 
NNBSP outside Mongolian – and a handful of ordinal indicators (possibly along with 
the rest of the alphabet, except q).]

[1] “The only other widely noted use for U+202F NNBSP is for representation of the 
thin non-breaking space (/espace fine insécable/) regularly seen next to certain 
punctuation marks in French style typography.” 
<http://www.unicode.org/review/pri308/pri308-background.html>

So an intermediate solution was needed. The US chose not to use the non-breaking thin 
space at all, because in English it was not needed for basic Latin, but also because of the huge 
prevalence of 7-bit ASCII for everything (including its own national symbol for the 
"$", competing with other ISO 646 variants). There were tons of legacy 
applications developed over decades that did not support anything else, and 
interoperability in the US was available only with ASCII; everything else was unreliable.

Probably because it wouldn’t have made much sense as long as people are 
unwilling to key in anything more, due to the requirement of maintaining a 
duplicate Alt key.

If you remember the early years when the Internet started to develop outside the US, you 
remember the nightmare of non-interoperable 8-bit charsets and the famous 
"mojibake" we saw everywhere.

We can still get mojibake in the Windows terminal, at least on Windows 7, when 
Latin-1 text is encoded in UTF-8 but rendered while CP1252 is the default.
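
For the record, a minimal Python sketch (my own; the example string is made up) of 
that exact failure mode, UTF-8 bytes reinterpreted as CP1252:

    # Minimal sketch: UTF-8 bytes of French text read back as CP1252.
    text = "déjà vu"
    utf8_bytes = text.encode("utf-8")        # what the file or stream contains
    garbled = utf8_bytes.decode("cp1252")    # what a CP1252 terminal displays
    print(garbled)                           # -> 'dÃ©jÃ  vu'  (classic mojibake)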
Then the competition between ISO and Unicode lasted too long. But it was considered 
"too late" for French to change anything (and Windows, used in so many places by 
so many users, promoted the use of the Windows-1252 charset, which had a few updates 
before it was frozen definitively: there was no place for NNBSP in it).
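
Indeed, a one-line Python check (my own) confirms that Windows-1252 has no slot for 
U+202F:

    # Minimal check: Windows-1252 has no byte for U+202F NARROW NO-BREAK SPACE.
    try:
        "\u202F".encode("cp1252")
    except UnicodeEncodeError as err:
        print(err)   # character maps to <undefined> in cp1252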
In its wake, it could have been relegated to history. What was the plot behind 
keeping on bothering end-users with an unusable legacy encoding?
Typographers and publishers were upset: to use the NNBSP they still needed to 
use proprietary *document* encodings.
They still needed? Why didn’t they just refuse to buy it? That would have 
changed the vendors’ minds, I guess.
The W3C did not help much either (it took long to finally adopt the UCS as a 
mandatory component for HTML; before that it depended only on the old IANA 
charset database, promoting only the work of vendors and a few ISO standards).
The W3C hasn’t even defined a named entity for &nnbsp;, like they have done for 
&zwnj;. Who instructed them to obstruct?
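
For what it’s worth, a minimal Python check (my own sketch) shows that HTML5 does 
define a named reference for &zwnj; but none for the narrow no-break space, so only 
the numeric character reference works:

    # Minimal sketch: no &nnbsp; named reference exists in HTML5,
    # only the numeric character reference for U+202F.
    from html import unescape
    from html.entities import html5

    assert "zwnj;" in html5 and html5["zwnj;"] == "\u200C"  # &zwnj; exists
    assert "nnbsp;" not in html5                            # &nnbsp; does not
    print(unescape("100&#x202F;000&#8239;km"))              # numeric forms work
    # -> '100 000 km' with U+202F between the groups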

France itself wanted to keep its own national variant of ISO 646 (inherited 
from telegraphic systems), but it was finally abandoned: everybody was already 
using Windows-1252 or ISO 8859-1 (even early Unix adopters, which used a 
preliminary version made by Digital/DEC, then promoted by X11), or otherwise 
used Adobe proprietary encodings. Unix itself had no standard (so many 
different variants, including with other OSes for industrial or accounting 
systems, made notably by IBM, which created so many variants, almost one for 
each submarket, multiple ones in the same country, each time split into 
multiple variants between those based on ASCII and those based on EBCDIC...).
Was that the era when the industry wasn’t ready for 16-bit computing? What a 
nightmare, indeed…

But today the problem is that even though that’s all over and passé, part of the 
industry seems to keep bullying the NNBSP, as if they didn’t want French and 
other languages to use it right now.

The truth is that publishers were forgotten, because their commercial market 
was much narrower: each publisher then used its own internal conventions. Even 
libraries used their own classifications. There was no attempt to unify the 
needs of publishers (working at document level) and data processors (including 
OSes). This effort started only very late, when the W3C finally started to work 
seriously on fixing HTML and making it more or less interoperable with SGML 
(promoted by publishers).
Forgetting the publishers is really bad. Now the point is that NNBSP is not 
only relevant to publishers, but to every single end-user trying to write in 
French.

But at the national level, there were still lots of other competing standards (let's 
remember teletext, including the Minitel terminal and Antiope for TV). People 
at home did not have access to any system capable of rendering proportional 
fonts. All early computers for personal use were based on fixed-width 8-bit 
fonts (including in Japan). China and Korea were still not as technologically advanced 
as they are today (there were some efforts, but they were costly and there was 
little return at that time).
Proportional fonts at home started likely with the Macintosh, IIRC.

The adoption of the UCS was extremely long, and it is still not completely 
finished, even if its support is now mandatory in all new computing standards 
and their revisions. The last segment where it still resists is the mobile 
phone industry (how can the SMS be so restricted, so non-interoperable, 
and so inefficient?)
I thought that was a limitation specific to the type of cellphone I’m using.

So French has a long tradition for its "fine"; its support has long been demanded but 
constantly ignored by the vendors making "the" standard.
So here we have it. The need for NNBSP was ignored by UTC…

I’m already fearing that UTC instructed CLDR TC to roll back the NNBSP instead 
of completing its implementation.

Not every company has a principled house policy about doing no evil. All my 
suspicions about lawless lobbying and malicious marketing are hereby confirmed.

That’s driving me mad. I need to stop posting to this list, and mind my 
business.

Publishers themselves resisted the adoption of the web as a publishing platform: they 
preferred their legacy solutions as well, and did not care much about interoperability, so they did 
not put enough pressure on the standard makers to adopt the "fine". The same happened in the US. 
There was no "commercial" incentive to adopt it and little money coming from that sector 
(which has since suffered a lot from the loss of advertising revenue, the competition of online 
publishers, the explosion of paper cost, but as well from the huge level of piracy on the 
Internet that reduced their sales and then their effective measured audience; the same is happening 
now on the TV and radio market; and on the Internet the advertising market has been concentrated a 
lot and its revenues are less and less balanced; photographers and reporters now have difficulties 
living from their work).

And there's little incentive now for creating quality products: so many products are developed and 
distributed very fast, and not enough people care about quality, or won't pay for it. The good old practices 
of typographers and publishers are most often ignored; they look "exotic" or 
"old-fashioned", and so many people now say these are "not needed" (just like they'll say 
that supporting multiple languages is not necessary).
If the users you’re referring to don’t deserve the right to type in their 
language’s interoperable representation, there’s no hope.

You’re talking about a fringe that is generating part of the information feed 
on social media. The overwhelming majority of end-users are full of good will, 
and are very learned people. Just as education is set up against illiteracy, 
fighting poor typography is a matter of training. There’s a mass of fine blogs 
out there. What may remain to do is only adding to it.


Many thanks to Philippe Verdy for this valuable feedback.

Best regards,

Marcel
