This message originally contained one pasted screenshot and several attached 
screenshots, which prevented it from being forwarded to the List. I have 
removed them all; for the screenshots, readers may refer to the links I added 
in the e-mail I resent today to Khaled Hosny.


 

On Tue, Jun 30, 2015, Doug Ewell  wrote:

> Khaled Hosny wrote:
> 
> >> On my netbook, which is running Windows 7 Starter, U+2060 is not a
> >> part of any of the shipped fonts.
> >
> > It is a control character, it does not need to have a glyph in the
> > font to be properly supported.

Thank you Khaled, I will respond soon after this.

> The problem is the word "supported." Marcel is seeing a visible glyph (a
> .notdef box) for what is supposed to be an invisible, zero-width
> character, and that is leading him to conclude that Windows doesn't
> "support" this character.

The .notdef box is exactly what I see sometimes in Notepad and every time in 
the Word dialogs when I use U+2060. In the document itself, however, I see a 
particular glyph: a tall, full-height empty box with a wide space to its 
right, even though the font is proportional. In the Notepad text I see the 
same box, but without the space. Only when I switch the font to the one you 
indicate below does the word joiner display correctly in my version of 
Microsoft Word. Please see the attached screenshots (I wanted to paste them 
into this e-mail).

> On my Win 7 machine at work, when I enter the string "one⁠two"
> ("one\u2060two") and click on either word, both words are selected. That
> is exactly what I would expect WJ to do. This works on the built-in
> Notepad as well as Notepad++ and BabelPad (but not on GoDaddy's
> Web-based email client).

The selection with double-click corresponds to what Richard did with the quick 
cursor move. These are text-processing features that give little evidence of 
the presence or absence of word boundaries. So I redid your test but used the 
search tool, with the "Whole words only" option enabled. This gives an idea of 
how the application perceives the words as entities, or rather, of what 
developers expect users to expect from search results. Well, that isn't really 
a better expression... What I want to say is that what we see is normally what 
we are expected to expect. Personally, I would not like to have only a part 
selected of a compound that I most probably want to mark up as a whole, and 
neither would you, Doug. This is why a double-click on any spot in the 
sequence selects the sequence as a whole. By contrast, given that we took care 
to insert word joiners where normally we would not need to (because simply 
typing the words one after the other, with nothing between them, is sufficient 
to get them as *one* word), the software engineers assume that we wish to join 
what must remain a sequence of separate words. Consequently, the built-in 
search engine will recognize each word as a word in its own right.
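
To make the difference between the two behaviours concrete, here is a minimal 
Python sketch. It is only an illustration under assumptions: Python's `re` 
module stands in for a "Whole words only" search (I do not know how any of the 
applications discussed here implement theirs), and the helper name 
`whole_word_hit` is my own. As far as I understand UAX #29, format characters 
such as U+2060 are ignored when default word boundaries are determined, which 
would match the double-click selecting the whole sequence; a simple regex `\b`, 
by contrast, sees a boundary on either side of the word joiner.

```python
import re
import unicodedata

WJ = "\u2060"               # WORD JOINER, general category Cf (format)
text = "one" + WJ + "two"   # the test string from Doug's message

def whole_word_hit(word, haystack):
    """Hypothetical stand-in for a 'Whole words only' search.

    Returns True if `word` occurs delimited by regex word boundaries (\\b).
    """
    return re.search(r"\b" + re.escape(word) + r"\b", haystack) is not None

# WJ is a format character, neither a letter nor white space.
print(unicodedata.name(WJ), unicodedata.category(WJ))

# With the word joiner in between, "one" is found as a whole word ...
print(whole_word_hit("one", text))

# ... whereas in plain "onetwo" it is not.
print(whole_word_hit("one", "onetwo"))

# Splitting on white space keeps the sequence in one piece,
# because WJ is not white space.
print(text.split())
```

So the same string can count as one word or as two, depending on which notion 
of "word" a given feature relies on.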

This is where good software shows its benefits. Some software does not 
recognize the ZWNBSP or the NBSP (I don't know which one, or both) as 
indicating the presence of a word boundary, and therefore does not work 
correctly. That also depends on the PDF conversion tool. Please check the 
screenshots (I switched the UIs to English wherever possible, that is, in 
LibreOffice). [This e-mail was blocked because it contained several attached 
screenshots, so I am resending it without attached images.]

> But out of more than 500 fonts on that machine, the only stock Microsoft
> fonts that show WJ with zero-width, instead of a .notdef glyph, are
> Javanese Text, Myanmar Text, and Segoe UI Symbol. So while it's
> inaccurate to extrapolate this to "Microsoft doesn't support WJ," the
> font support is definitely lacking.

I wish to thank you personally, Doug, for this very valuable hint. Indeed, in 
Microsoft Word 2010 Starter on Windows 7 Starter, the WJ is not displayed 
correctly unless the font is switched to Segoe UI Symbol (the only one of the 
three that shipped with my OS). If the Segoe typeface is not appropriate in 
the document, we can ask Word to find and replace all instances of U+2060 with 
the same character formatted in Segoe UI Symbol. This may be what Word users 
are expected to do every time, even if that isn't really what we expect of a 
productivity suite. Perhaps, or most probably, this problem does not occur in 
other high-end software such as Microsoft Publisher (this needs to be 
confirmed). But anybody who buys Microsoft Office Premium or Professional 
should be safe from that malfunction. As should everybody using Microsoft 
software, in fact.

> The bit about characters being converted to other characters, of course,
> has nothing to do with Windows and everything to do with particular
> applications.

Based on this hint, I did more tests and found that, for a proper conversion 
to plain text, any segment including U+00A0, U+FEFF and other format 
characters that is copied from a document in Microsoft Word must first be 
pasted into a LibreOffice document, then copied again, and finally pasted into 
the text editor. I should avoid venting further about that issue, and I had 
better wait for official comments; I simply suppose that there is an algorithm 
(say, as a part of Microsoft Word) that detects where the clipboard item goes 
and possibly destroys the format characters. I leave everybody to guess to 
what end...
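
To check which characters survive such a round trip, one could paste the 
result into a small script rather than judge by eye. The following Python 
sketch lists every format character (general category Cf) and every NO-BREAK 
SPACE in a string, with its position and name. The function name 
`invisible_report` and both sample strings are made up for illustration; the 
"after" string only imitates a lossy paste and says nothing about what Word 
actually does with the clipboard.

```python
import unicodedata

def invisible_report(text):
    """Return (index, 'U+XXXX', name) for each Cf character or NO-BREAK SPACE."""
    hits = []
    for i, ch in enumerate(text):
        if ch == "\u00a0" or unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, "U+%04X" % ord(ch), name))
    return hits

# Made-up sample: NBSP, ZWNBSP and WJ between single letters.
before = "a\u00a0b\ufeffc\u2060d"
# Made-up result of a lossy paste: everything replaced by ordinary spaces.
after = "a b c d"

for label, sample in (("before", before), ("after", after)):
    print(label)
    for hit in invisible_report(sample):
        print("  ", hit)
```

Comparing the two reports shows at a glance which characters were kept, 
replaced or dropped on the way to the text editor.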

Thanks a lot!

Marcel





[originally one pasted screenshot]

  
