In my opinion we should first try to process the whole linked phrase by inflection (that is, affix rules), and only if that fails (no link target can be found) should the regexps for prefix and linktrails be applied. If applying the prefix or linktrail creates a word that can be inflected, and it links to the same target, then move the strings into the linked phrase. If the link uses the pipe form, then move the strings into the second part of the link, that is, the link text.
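
As a rough illustration of that order of operations, here is a minimal Python sketch. It is only one possible reading of the proposal, not how MediaWiki or VisualEditor works today. The names `TITLES`, `AFFIX_RULES`, `LINKTRAIL`, `inflect_to_title` and `resolve` are all made up for the example, and the affix rules are a toy English set. Prefix handling is left out for brevity.

```python
import re

# Hypothetical toy data: existing page titles and a few English affix rules
# given as (suffix to strip, replacement). Real rules would be per language.
TITLES = {"Dog", "City"}
AFFIX_RULES = [("ies", "y"), ("s", ""), ("ing", ""), ("ed", "")]
# Assumed English-like linktrail: lowercase letters directly after the link.
LINKTRAIL = re.compile(r"^([a-z]+)(.*)$", re.DOTALL)


def inflect_to_title(phrase, titles=TITLES, rules=AFFIX_RULES):
    """Return the title that `phrase` is an inflected form of, or None."""
    if phrase in titles:
        return phrase
    for suffix, replacement in rules:
        if phrase.endswith(suffix) and len(phrase) > len(suffix):
            candidate = phrase[: len(phrase) - len(suffix)] + replacement
            if candidate in titles:
                return candidate
    return None


def resolve(text, after, target=None):
    """Resolve [[text]]after, or [[target|text]]after when `target` is given.

    Returns (target, link_text, remaining_text_after_link).
    """
    if target is None:
        # Step 1: try the whole linked phrase by inflection first.
        target = inflect_to_title(text)
    # A piped target is never inflected; it is the escape route.

    # Step 2: try the linktrail, and absorb it into the link text only if
    # the extended word inflects to the same target (or to a target at all
    # when step 1 found none).
    match = LINKTRAIL.match(after)
    if match:
        trail, rest = match.groups()
        extended = text + trail
        hit = inflect_to_title(extended)
        if hit is not None and (target is None or hit == target):
            return hit, extended, rest
    return target, text, after
```

With these toy rules, `resolve("Dog", "s bark")` moves the "s" into the link text, while `resolve("Dog", "house", target="Dog")` leaves "house" outside the link because "Doghouse" does not inflect to "Dog". Picking the target with the "smallest difference" among several candidates is deliberately not attempted here.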
Links using the pipe form should not have the link target inflected. This is important, as it is the natural escape route if inflection gives the wrong target for whatever reason. Inflected links should go to the target with the smallest difference. This is a non-trivial problem. We often link _phrases_, and those could be processed by several rules, each with some kind of weighting. An edit distance would probably not be sufficient.

Perhaps most important: VisualEditor should not insert <nowiki/>. If users need this escape route, then let them do it themselves in WikitextEditor.

On Fri, Oct 5, 2018 at 6:17 PM Amir E. Aharoni <amir.ahar...@mail.huji.ac.il> wrote:

> On Fri, 5 Oct 2018 at 16:59, Dan Garry <dga...@wikimedia.org> wrote:
>
> > On Thu, 4 Oct 2018 at 23:29, John Erling Blad <jeb...@gmail.com> wrote:
> >
> > > Usually it comes from user errors while using VE. This kind of errors are
> > > quite common, and I asked (several years ago) whether it could be fixed in
> > > VE, but was told "no".
> >
> > I'd really appreciate it if you could give me more information on this.
>
> This is very frequent. I know that in the Hebrew Wikipedia it happens up to
> 20 times a day (I actually counted this for many months), and this is never
> intentional or desirable. Never, ever. 100% of cases. The same must be true
> for many other languages, but probably not for all. In wikis bigger than
> the Hebrew Wikipedia it probably happens much more often than 20 times a
> day.
>
> It is possibly the most frequent reason for automatic insertion of <nowiki>
> tags (although this may be different by language).
>
> How does it happen? Several ways:
> * People add a word ending to an existing link. English has very few word
> endings (-s, -ing, -ed, -able, and not much more), but many other languages
> have more.
> * People highlight only a part of a word when they add a link, even though
> they should have highlighted the whole word.
> * In particular, people highlight the part of the word without an ending.
> For example, "Dogs" is written, and people highlight "Dog".
> * People sometimes actually want to write two separate words and forget to
> write a space. (This may sound silly, but I saw this happening very often.)
> * People write a compound word and link a part of the word. Sometimes it's
> intentional, although as we can see in other emails in this thread not
> everybody agrees about the desirability of this. This works very
> differently in different languages. German has a lot of them, English has
> much less, Hebrew has almost zero.
>
> It's worth running proper user testing
>
> > Here's how the linking feature works right now for adding links to words
> > which presently have no links:
> >
> >    - If you put your cursor inside a word without highlighting anything,
> >    and add a link, the link is added to the entire word.
> >    - If you highlight some text, and add a link, the link is added to the
> >    highlighted text.
>
> I know this, and I like how it works, but the fact is that there are many
> other users who don't know this. Simply searching wikitext for
> "]]<nowiki/>" will show how often does this happen.
>
> > How would you propose this feature be changed?
>
> One possibility is to not add <nowiki/> after a link. I proposed it, but it
> was declined: https://phabricator.wikimedia.org/T141689 . The declining
> comment links to T128060, which you mentioned in your email, and it's still
> not resolved.
>
> Other than fully stopping to do it, I cannot think of many other
> possibilities. Maybe we could show a warning, although I suspect that many
> users will ignore it or find it unnecessarily intrusive. I'm not a real
> designer, and it's possible that a real designer can come with something
> better.
>
> Another thing we could consider is to link the whole word *by default*, and
> to add another function that separates a link from the trail. I'd further
> suggest the separation be done internally not by "<nowiki/>", but by some
> other syntax that looks more semantic, for example "{{#sep}}" (this should
> be a magic word and not a template!). My educated guess is that separating
> the word from the link is much less frequent than wanting to link the whole
> word. Part of my motivation for starting this thread was to understand how
> does this work in different languages.

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l