On Sat, 4 Jul 2015 17:02:00 +0200 (CEST) Marcel Schneider <charupd...@orange.fr> wrote:
> On Fri, Jul 03, 2015, Richard Wordingham wrote:
>
> > On Fri, 3 Jul 2015 17:19:13 +0200 (CEST)
> > Marcel Schneider wrote:
> I considered not to reply any more in this unfaithful dialogue, where
> after bringing up some historic examples to make me think about them,
> Richard switches back to present and makes people believe I could
> suppose that any country could prefer the use of other means than
> what's world standard.

I cannot work out what you think I am making people believe you might
suppose. I was pointing out that not everyone uses visible word
boundaries. I will also note that people are reluctant to type
invisible characters if they don't have immediate benefits.

> Now let's come to the core: Why on earth
> do we need word boundaries for whole word search in Latin script,
> while Thai, Burmese and Cambodian scripts Richard mentions as
> examples, use implementations that can find whole words without any
> need of "spaces or any other [separating] character"?

The Thai and Cambodian implementations are far from perfect, even when
applied to the Thai and Cambodian languages. Using a dictionary for
the national languages on text of other languages naturally has even
worse performance.

A quick experiment suggests that for whole word search in Thai,
LibreOffice simply ignores any boundaries between Thai word
characters. Double-click and Ctrl+arrow use different rules.

It's quite possible that we are misinterpreting the results of whole
word searches. One way of implementing whole word search is to do a
general search and then check whether the string found is part of a
larger word. To do that, one might simply ask whether the characters
immediately before and after the string found are permitted in words.
One might easily set things up so that, by omission, U+2060 is not
considered part of a word: the code could have been written before
U+2060 was assigned and not updated since.

Richard.
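
P.S. The general-search-then-check approach described above can be
sketched roughly as follows. This is only an illustration, not a claim
about how LibreOffice or any other product does it. The names
is_word_char and whole_word_find are hypothetical, and "permitted in
words" is assumed here to mean letters, digits and combining marks.

```python
import unicodedata


def is_word_char(ch: str) -> bool:
    """Assumed rule: letters, digits and combining marks belong to words.

    U+2060 WORD JOINER has general category Cf, so it falls outside this
    set by omission rather than by any deliberate decision.
    """
    return ch.isalnum() or unicodedata.category(ch).startswith("M")


def whole_word_find(text: str, needle: str) -> list[int]:
    """Return the start offsets where needle occurs as a whole word."""
    if not needle:
        return []
    hits = []
    start = text.find(needle)  # step 1: plain substring search
    while start != -1:
        end = start + len(needle)
        # Step 2: reject the hit if a neighbouring character is a word
        # character, i.e. the hit is part of a larger word.
        before_ok = start == 0 or not is_word_char(text[start - 1])
        after_ok = end == len(text) or not is_word_char(text[end])
        if before_ok and after_ok:
            hits.append(start)
        start = text.find(needle, start + 1)
    return hits
```

Under these assumptions, a search for "foo" in "foobar" finds nothing,
but a search for "foo" in "foo" + U+2060 + "bar" reports a whole-word
match, because the character after the hit is not in the set of word
characters.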