Re: WORD JOINER vs ZWNBSP

Marcel Schneider Wed, 01 Jul 2015 01:51:24 -0700

On Tue, Jun 30, 2015, Richard Wordingham  wrote:

> On Tue, 30 Jun 2015 11:25:43 +0200 (CEST)
> Marcel Schneider  wrote:
> 
> > On Mon, Jun 30, 2015, Richard Wordingham  wrote:
> 
> > I tested on Microsoft Word 2010 Starter running on Windows 7 Starter,
> > on a netbook. This software being based on the full versions, the
> > interpretation of U+FEFF must be the standard behavior. I tested in
> > Latin script. You may wish to redo the tests, so please open a new
> > document, input two words, replace the blank with whatever character
> > the word boundaries behavior is to be checked of, and search for one
> > of the two words with the 'whole word' option enabled. If the result
> > is none, the test character indicates the absence of word boundaries;
> > if there is a result, the test character indicates the presence of
> > word boundaries.

Already yesterday (Tue, Jun 30, 2015) I wondered how my text had come to be 
altered, with line breaks needlessly removed and added.
Now I wish everybody to take note that, at least on this Public List, I 
*never* quoted anybody this way:

> At some time in June 2015, Richard Wordingham wrote:

This is why, to get started with this reply, I replaced that line with the 
accurate one, which can be checked at 
http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0279.html (except the 
e-mail address, which is suppressed by the list engine at archiving, and will 
be so here again):

On Tue, Jun 30, 2015, Richard Wordingham  wrote:
_______

> I did my own tests in Word 2010 with Windows 7. Although U+FEFF and
> U+2060 displayed differently when I enabled the display of
> 'non-printing' characters (spaces, inactive soft hyphens, non-breaking
> hyphens, paragraph ends etc.), they behaved the same when embedded in
> French l'eau and Thai กก - they changed each word to two words, as
> detected by ctrl/rt-arrow. However, this is wrong. 

At the same time, Doug Ewell (to whom I'll reply soon, as well as to Khaled 
Hosny) was describing exactly what I see on screen: a .notdef box. Personally, 
I have enabled for regular display: paragraph marks, manual line breaks, tab 
characters, and text boundaries. (Unfortunately I cannot separately enable the 
display of style separators too. To see them, I must enable everything, as 
Richard did for his test.)

Ctrl + RIGHT skips over APOSTROPHEs and in-word single closing quotes, and 
therefore cannot be used to detect word boundaries. 
Perhaps you might consider running the test as I did. It goes as follows:

1 Open a new document.
2 Input two words with a blank between them.
3 Replace the blank with the character whose word-boundary behavior is to be 
checked.
4 Search for one of the two words with the 'whole word' option enabled.
→ If the result is 'No instance found', the test character indicates the 
absence of word boundaries.
→ If the result is 'One instance found', the test character indicates the 
presence of word boundaries.
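
The four steps above can be approximated outside Word. The following is only a 
minimal sketch: it uses the `\b` word-boundary assertion of Python's `re` 
module, which is neither Microsoft Word's 'whole word' logic nor the UAX #29 
algorithm, so its answers need not agree with Word for every character. The 
function name `has_word_boundary` and the sample words are hypothetical.

```python
import re

def has_word_boundary(test_char, first="one", second="two"):
    """Steps 2-4 of the test: join two words with test_char instead of a
    blank, then do a whole-word search for the second word.

    Assumption: Python's regex \\b stands in for Word's 'whole word' option.
    """
    text = first + test_char + second
    pattern = r"\b" + re.escape(second) + r"\b"
    return re.search(pattern, text) is not None

if __name__ == "__main__":
    candidates = [
        ("SPACE", " "),
        ("U+0027", "\u0027"),
        ("U+2019", "\u2019"),
        ("U+02BC", "\u02BC"),
        ("U+FEFF", "\uFEFF"),
        ("U+2060", "\u2060"),
    ]
    for label, ch in candidates:
        verdict = "boundary" if has_word_boundary(ch) else "no boundary"
        print(label, verdict)
```

In this sketch a result of True corresponds to 'One instance found', and False 
to 'No instance found'.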

This way, Microsoft Word will tell you that the word 'eau' is found in 'l'eau' 
when the apostrophe is U+0027. The result is the same with U+2019. Only when 
you use U+02BC is U+006C U+02BC U+0065 U+0061 U+0075 considered a single word. 
With U+006C U+02BC U+FEFF U+0065 U+0061 U+0075, you will find the word 'eau' 
again. This is not wrong, given that a word joiner is expected to join words, 
so that neither NBSP nor any other no-break white space is needed to prevent 
a line break between them. However, the words remain words. This is why Ctrl + 
RIGHT stops at U+FEFF, detecting a word boundary. That quick cursor movement 
skips over in-word punctuation is for word-processing convenience only, in 
English as well as in French and other languages. In your example, when 
'l'eau' (the water) is to be replaced with its counterpart 'la terre' (the 
land), placing the cursor at the end and pressing Ctrl + BACKSPACE deletes 
both words, so that you can immediately type the non-elided article and the 
new word. But, as I say, that is not a test for word boundaries.
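
For reference, the characters discussed above can be looked up in the Unicode 
Character Database. The sketch below merely prints the name and 
General_Category of each one with Python's standard `unicodedata` module; the 
helper name `describe` is hypothetical. It is not a claim about how Word 
decides, but it does show that U+02BC is classified as a letter (Lm), whereas 
U+0027 and U+2019 are punctuation and U+FEFF and U+2060 are format characters.

```python
import unicodedata

# The apostrophe-like and joiner characters mentioned in this thread.
CANDIDATES = ["\u0027", "\u2019", "\u02BC", "\uFEFF", "\u2060"]

def describe(ch):
    """Return (code point, character name, General_Category) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

if __name__ == "__main__":
    for ch in CANDIDATES:
        print(*describe(ch), sep="  ")
```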

> >> No, this doesn't work.
> 
> Clarification: It doesn't work in correct software. Correct software
> would have treated the modified words as single words.

As far as the French example is concerned, the elided article and the noun are 
*already* treated as two words in correct software. There are spell-checkers 
which don't recognize a word when it is preceded by an elided article with 
apostrophe, but these are *not* correct software. And they are *not* from 
Microsoft. I have no knowledge of Thai, but I guess that กก is a correct word, 
and that correct software will therefore take note of the U+FEFF or U+2060 you 
add between the two characters and assume that you mean *two* words but just 
don't want any blank between them. Again, this is not wrong, and it is 
consistent with the fact that correct software complies with the Standards, 
that the Standards are designed to be useful, and that correct software is 
useful software. 

Speaking of software, what other use is there in being correct?

Marcel 

> Message of 30/06/15 23:40
> From: "Richard Wordingham" 
> To: "Unicode Mailing List" 
> Cc: 
> Subject: Re: WORD JOINER vs ZWNBSP
> 
> On Tue, 30 Jun 2015 11:25:43 +0200 (CEST)
> Marcel Schneider  wrote:
> 
> > At some time in June 2015, Richard Wordingham wrote:
> 
> > I tested on Microsoft Word 2010 Starter running on Windows 7 Starter,
> > on a netbook. This software being based on the full versions, the
> > interpretation of U+FEFF must be the standard behavior. I tested in
> > Latin script. You may wish to redo the tests, so please open a new
> > document, input two words, replace the blank with whatever character
> > the word boundaries behavior is to be checked of, and search for one
> > of the two words with the 'whole word' option enabled. If the result
> > is none, the test character indicates the absence of word boundaries;
> > if there is a result, the test character indicates the presence of
> > word boundaries.
> 
> I did my own tests in Word 2010 with Windows 7. Although U+FEFF and
> U+2060 displayed differently when I enabled the display of
> 'non-printing' characters (spaces, inactive soft hyphens, non-breaking
> hyphens, paragraph ends etc.), they behaved the same when embedded in
> French l'eau and Thai กก - they changed each word to two words, as
> detected by ctrl/rt-arrow. However, this is wrong. 
> 
> 
> >> No, this doesn't work.
> 
> Clarification: It doesn't work in correct software. Correct software
> would have treated the modified words as single words.
> 
> Richard.
> 
>
