2009/7/29 Nikola Smolenski <smole...@eunet.yu>:
> On Tuesday 28 July 2009 19:16:22 Brion Vibber wrote:
>> On 7/28/09 10:04 AM, Aryeh Gregor wrote:
>> > On Tue, Jul 28, 2009 at 12:52 PM, Mark Williamson<node...@gmail.com> wrote:
>> >> Case insensitivity shouldn't be a problem for any language, as long as
>> >> you do it properly.
>> >>
>> >> Turkish and other languages using dotless i, for example, will need a
>> >> special rule - Turkish lowercase dotted i capitalizes to a capital
>> >> dotted İ while lowercase undotted ı capitalizes to regular undotted I.
>> >
>> > And so what if a wiki is multilingual and you don't know what language
>> > the page name is in? What if a Turkish wiki contains some English
>> > page names as loan words, for instance?
>>
>> Indeed, good handling of case-insensitive matchings would be a big win
>> for human usability, but it's not easy to get right in all cases.
>>
>> The main problems are:
>>
>> 1) Conflicts when we really do consider something separate, but the case
>> folding rules match them together
>>
>> 2) Language-specific case folding rules in a multilingual environment
>>
>> Turkish I with/without dot and German ß not always matching to SS are
>> the primary examples off the top of my head. Also, some languages tend
>> to drop accent markers in capital form (eg, Spanish). What can or should
>> we do here?
>
> Similar to automatic redirect, we could build an automatic disambiguation
> page. For example, someone on srwiki going to [[Dj]] would get:
>
> Did you mean:
>
> * [[Đ]]
> * [[DJ]]
> * [[D.J.]]
>
>> A nearer-term help would be to go ahead and implement what we talked
>> about a billion years ago but never got around to -- a decent "did you
>> mean X?" message to display when you go to an empty page but there's
>> something similar nearby.
>
> Was thinking a lot about this. The best solution I thought of would be to add
> a column to page table "page_title_canonical". When an article is
> created/moved, this canonical title is built from the real title. When an
> article is looked up, if there is no match in page_title, build the canonical
> title from the URL and see if there is a match in page_title_canonical and if
> yes, display "did you mean X" or even go there automatically as if from a
> redirect (if there is only one match) or "did you mean *X, *X1" if there are
> multiple matches.
>
> This canonical title would be made like this:
> * Remove disambiguator from the title if it exists
> * Remove punctuation and the like
> * Transliterate the title to Latin alphabet
> * Transliterate to pure ASCII
> * Lowercase
> * Order the words alphabetically
>
> What could possibly go wrong?
>
> Note that this would also be very helpful for non-Latin wikis - people often
> want Latin-only URLs since non-Latin URLs are toooo long. I also recall a
> recent discussion about a wiki in a language with nonstandard spelling (nds?)
> where they use bots to create dozens or even hundreds of redirects to an
> article title - this would also make that unneeded.
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

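For concreteness, the canonicalization and lookup quoted above might be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the code of any existing extension: canonical_title, TitleIndex and EXTRA_FOLDS are made-up names, the fold table is a tiny hand-picked sample, the steps are slightly reordered (lowercasing happens early), and the "transliterate to Latin alphabet" step is left out entirely, so characters from other scripts are simply dropped.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical, deliberately tiny table for letters that Unicode
# decomposition does not reduce to ASCII. A real table would be much
# larger and probably language-dependent.
EXTRA_FOLDS = {"đ": "dj", "ß": "ss", "ı": "i"}


def canonical_title(title: str) -> str:
    """Build a canonical form of a page title (sketch, Latin script only)."""
    # Remove a trailing parenthesised disambiguator, e.g. "Mercury (planet)".
    title = re.sub(r"\s*\([^()]*\)\s*$", "", title)
    # Lowercase early so the fold table only needs lowercase keys.
    title = title.lower()
    # Fold letters that have no Unicode decomposition.
    title = "".join(EXTRA_FOLDS.get(ch, ch) for ch in title)
    # Decompose, drop combining marks (accents), then drop anything
    # still outside ASCII. Script transliteration is NOT attempted here.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch.isascii()
    )
    # Remove punctuation; keep letters, digits and whitespace.
    no_punct = re.sub(r"[^a-z0-9\s]", "", ascii_only)
    # Order the words alphabetically.
    return " ".join(sorted(no_punct.split()))


class TitleIndex:
    """In-memory stand-in for page_title plus a page_title_canonical column."""

    def __init__(self):
        self.titles = set()
        self.by_canonical = defaultdict(set)

    def add(self, title: str) -> None:
        # Called whenever a page is created or moved.
        self.titles.add(title)
        self.by_canonical[canonical_title(title)].add(title)

    def lookup(self, requested: str):
        # An exact match on the real title always wins.
        if requested in self.titles:
            return ("exact", [requested])
        matches = sorted(self.by_canonical.get(canonical_title(requested), ()))
        if not matches:
            return ("missing", [])
        if len(matches) == 1:
            return ("redirect", matches)
        return ("did_you_mean", matches)
```

With pages "Đ", "DJ" and "D.J." added to such an index, a request for "Dj" has no exact match but shares the canonical form "dj" with all three, so lookup returns the "did_you_mean" case listing them. With only one of those pages present it would return "redirect" instead. The sketch also makes the weak spot visible: any two titles that collapse to the same canonical string are treated as near-duplicates, whether or not a reader would consider them related.
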
I actually did make this extension a couple of years ago, intended for the
English Wiktionary, where we manually add an {{also}} template to the top
of pages to link to other pages whose titles differ in minor ways such as
capitalization, hyphenation, apostrophes, accents, and periods. I think I
had it working with Hebrew and Arabic and a few other exotic languages
besides.

It was running on Brion's test box for some time but getting little
interest. It's been offline and unmaintained since Brion moved and I did
a couple of overseas trips.

http://www.mediawiki.org/wiki/Extension:DidYouMean
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/DidYouMean/
https://bugzilla.wikimedia.org/show_bug.cgi?id=8648

It hooked all the ways to create, delete, or move a page in order to
maintain a separate table of normalized page titles, which it consulted
when displaying a page. The code for display was designed for
compatibility with the then-current Wiktionary templates and would need
to be implemented in a more general way. A core version would probably
just add a field to the existing table.

Andrew Dunbar (hippietrail)

-- 
http://wiktionarydev.leuksman.com http://linguaphile.sf.net