From: "Pim Blokland" <[EMAIL PROTECTED]>
> However, a couple of paragraphs up, the definition for No-Break
> Space says:
>
> > U+00A0 [No-Break Space] behaves like the following coded
> > character sequence: U+FEFF [Zero Width No-Break Space] +
> > U+0020 [Space] + U+FEFF [Zero Width No-Break Space].
>
> Is this something that has slipped by the editors? Or am I missing
> something?

The key words in that sentence are "behaves like". That is different from
saying that the two are equivalent: the statement does not say that NBSP is
decomposable. It only illustrates that NBSP prevents line breaks on both of
its sides and is otherwise to be rendered as if it were a normal space.
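
To make the "behaves like" versus "is equivalent to" distinction concrete,
here is a small sketch using Python's standard unicodedata module. The
results depend on the Unicode data shipped with the interpreter, so treat it
as an illustration rather than a statement of what the standard requires:

```python
import unicodedata

NBSP = "\u00a0"  # NO-BREAK SPACE

# The only formal mapping recorded for NBSP is a compatibility
# decomposition to a plain SPACE, tagged <noBreak>. It does not
# involve U+FEFF or U+2060 at all.
decomposition = unicodedata.decomposition(NBSP)

# Canonical normalization leaves NBSP untouched ...
nfc = unicodedata.normalize("NFC", NBSP)
# ... while compatibility normalization folds it to SPACE and
# thereby loses the non-breaking behavior.
nfkc = unicodedata.normalize("NFKC", NBSP)

print(decomposition)
print(f"NFC:  U+{ord(nfc):04X}")
print(f"NFKC: U+{ord(nfkc):04X}")
```

So the three-character sequence in the quoted sentence is a description of
line-breaking behavior, not something a normalizer would ever produce.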

But it's true that NBSP is used to join words, and so a better analogy would
be to say:

> U+00A0 [No-Break Space] behaves like the following coded
> character sequence: U+2060 [Word Joiner] +
> U+0020 [Space] + U+2060 [Word Joiner].

I think that the wording of this sentence was not updated as it should have
been. But this does not constitute a breach in the standard, as the
sentence is mostly informative.

Of course, encoding a text with <ZWNBSP,SP,ZWNBSP> instead of just <NBSP>
could collide with the current use of U+FEFF as a BOM. But it is not invalid
to use the 3-character sequence in the middle of a text. For UTF encoding
schemes that forbid the use of a BOM, ZWNBSP (U+FEFF) must still be
interpreted exactly like the newer WORD JOINER.
There is no problem with BOM interpretation if a text instead uses
<WJ,SP,WJ>, even at the beginning of the text, which is equally valid (even
if a WJ in the first position of a text looks strange).
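
The BOM collision can be seen with a BOM-sniffing decoder. The sketch below
uses Python's "utf-8-sig" codec as a stand-in for any reader that treats a
leading U+FEFF as a signature; the helper name decode_sniffing_bom and the
sample strings are only illustrative:

```python
ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, also used as the BOM
WJ = "\u2060"      # WORD JOINER

old_style = ZWNBSP + " " + ZWNBSP + "word"  # <ZWNBSP,SP,ZWNBSP> at text start
new_style = WJ + " " + WJ + "word"          # <WJ,SP,WJ> at text start


def decode_sniffing_bom(text: str) -> str:
    """Encode without adding a signature, then decode with a codec that
    strips one leading U+FEFF if it finds it."""
    raw = text.encode("utf-8")      # plain UTF-8, nothing prepended
    return raw.decode("utf-8-sig")  # drops a leading EF BB BF


old_decoded = decode_sniffing_bom(old_style)
new_decoded = decode_sniffing_bom(new_style)

# The first ZWNBSP is taken for a BOM and disappears ...
print(old_decoded == old_style)
# ... whereas the WJ-based sequence survives unchanged.
print(new_decoded == new_style)
```

Only the U+FEFF in the very first position is affected; the same sequence
in the middle of a text passes through such a decoder intact.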

But there is now an opportunity to use indenting spaces at the beginning of
lines, which may be rendered in paragraphs with the spacing kept, if the
first WJ is removed from the sequence and successive WJs are merged into a
single one:
<SP,WJ,SP,WJ,SP,WJ> would then encode _roughly_ (not equivalently...)
the same rendered text as:
<ZWNJ,NBSP,NBSP,NBSP>
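
As a rough sketch of that transformation (the function names expand_nbsp
and collapse_joiners are made up for this example, and the expansion is the
analogy discussed above, not an equivalence defined by the standard):

```python
import re

WJ = "\u2060"    # WORD JOINER
NBSP = "\u00a0"  # NO-BREAK SPACE
SP = "\u0020"    # SPACE


def expand_nbsp(text: str) -> str:
    """Replace each NBSP with <WJ,SP,WJ>."""
    return text.replace(NBSP, WJ + SP + WJ)


def collapse_joiners(text: str) -> str:
    """Merge each run of WJ into a single WJ, then drop a WJ that
    ends up in the very first position."""
    merged = re.sub(WJ + "+", WJ, text)
    return merged[1:] if merged.startswith(WJ) else merged


indent = collapse_joiners(expand_nbsp(NBSP * 3))
print(",".join(f"U+{ord(c):04X}" for c in indent))
```

For three NBSPs this yields the <SP,WJ,SP,WJ,SP,WJ> sequence given above.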

