On 1/18/2019 2:05 PM, Marcel Schneider via Unicode wrote:
==> At the time Unicode was first created (and definitely
before that, during the time of non-universal character sets) many
applications existed that used a "typewriter model" and worked by
space fill rather than decimal-point tabulation. From today's perspective that older model is inflexible and not the best approach, but it is impossible to say how long this legacy approach hung on in some places and how much data might exist that relied on certain long-standing behaviors of these space characters.

For a good solution, you always need to understand
(1) the requirement of your "index" case (French, in this case)
(2) how it relates to similar requirements in (all!) other languages / scripts
(3) how it relates to actual legacy practice
(3a) what will suddenly no longer work if you change the properties on some character
(3b) what older data will no longer work if the effective behavior of newer applications changes

It does not, provided that all numbers have thousands separators, even if filling with spaces. It looks nicer because it’s more legible.

==> Right, but remember, we started off encoding a set of
spaces that existed before Unicode (in some other character sets)
and implicitly made the assumption that those were the correct set
(just like we took punctuation from ASCII and similar sources and
only added to it later, when we understood that they were missing
things --- generally always added, generally did not redefine
behavior or shape of existing code points).
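
For reference, the properties currently assigned to the spaces discussed in this thread can be inspected with a short script. This is only a minimal sketch: the helper name `describe` and the choice of code points are illustrative assumptions, and it uses the `<noBreak>` compatibility decomposition tag as a proxy for non-breaking behavior, since Python's standard `unicodedata` module does not, as far as I know, expose the Line_Break property itself.

```python
import unicodedata

# Spaces mentioned in this thread, plus NO-BREAK SPACE for comparison.
# The selection is illustrative, not an exhaustive list of Unicode spaces.
SPACES = [0x00A0, 0x2007, 0x2008, 0x2009, 0x202F]


def describe(cp):
    """Return name, general category and a no-break hint for a code point."""
    ch = chr(cp)
    decomp = unicodedata.decomposition(ch)
    return {
        "code": f"U+{cp:04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        # Proxy only: the decomposition tag, not the Line_Break property.
        "no_break": decomp.startswith("<noBreak>"),
    }


if __name__ == "__main__":
    for cp in SPACES:
        print(describe(cp))
```

All of these share the general category Zs; what differs is the decomposition tag, which is one visible trace of FIGURE SPACE and NNBSP being treated as non-breaking while PUNCTUATION SPACE is not.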
==> probably not in the early days. Y
==> much book printing was also done by photomechanically reproducing typescript at that time. Not everybody wanted to pay typesetters and digital typesetting wasn't as advanced. I actually did use a digital phototypesetter of the period a few years before I joined Unicode, so I know. It was more powerful than a typewriter, but not as powerful as TeX or later the Adobe products. For one, you didn't typeset a page, only a column of text, and it required manual paste-up etc.

That is how CLDR works. CLDR data is by definition per-language. Except for inheritance, languages are independent.

There are no "French" characters. When you encode characters, at
best, some code points may be script-specific. For punctuation and
spaces not even that may be the case. Therefore, as long as you
try to solve this as if it only was a French problem, you
are not doing proper character encoding.
==> for your proposal to be effective, you need to reach out.
of what PUNCTUATION SPACE should have been since the beginning.

==> I mentioned before that if something is universally
"broken" it can sometimes be resurrected, because even if you
change its behavior retroactively, it will not change something
that ever worked correctly. (But you need to be sure that nobody
repurposed the NNBSP for something useful that is different from
what you intend to use it for, otherwise you can't change anything
about it). If, however, you are merely adding a use for some existing
character that does not affect its properties, that is usually not
as much of a problem - as long as we can have some confidence that
both usages will continue to be possible.

Still it is as simple as not skipping PUNCTUATION SPACE when FIGURE SPACE was made non-breakable. Now we ended up with a mutated Mongolian Space that does not work properly for Mongolian, but does for French and other Latin script using languages. It would even more if TUS was blunter, urging all foundries to update their whole catalogue soon.

==> You realize that I'm giving you general advice here, not something utterly specific to NNBSP - I don't have the inputs and background to know whether your approach is feasible or perhaps the best possible?

As for PUNCTUATION SPACE - some of the spaces have acquired usage in math (as part of the added math support in Unicode 3.2). We need to be sure that the assumptions about these that may have been made in math typesetting are not invalidated. Not sure offhand whether UTR#25 captures all of that, but if you ever feel like proposing a property change you MUST research that first (with the current maintainers of that UTR or other experts).

This is the way Unicode is different from CLDR.

A./

