2009/7/29 Nikola Smolenski <smole...@eunet.yu>:
> On Tuesday 28 July 2009 19:16:22 Brion Vibber wrote:
>> On 7/28/09 10:04 AM, Aryeh Gregor wrote:
>> > On Tue, Jul 28, 2009 at 12:52 PM, Mark Williamson <node...@gmail.com> wrote:
>> >> Case insensitivity shouldn't be a problem for any language, as long as
>> >> you do it properly.
>> >>
>> >> Turkish and other languages using dotless i, for example, will need a
>> >> special rule - Turkish lowercase dotted i capitalizes to a capital
>> >> dotted İ while lowercase undotted ı capitalizes to regular undotted I.
>> >
>> > And so what if a wiki is multilingual and you don't know what language
>> > the page name is in?  What if a Turkish wiki contains some English
>> > page names as loan words, for instance?
>>
>> Indeed, good handling of case-insensitive matchings would be a big win
>> for human usability, but it's not easy to get right in all cases.
>>
>> The main problems are:
>>
>> 1) Conflicts when we really do consider something separate, but the case
>> folding rules match them together
>>
>> 2) Language-specific case folding rules in a multilingual environment
>>
>> Turkish I with/without dot and German ß not always matching to SS are
>> the primary examples off the top of my head. Also, some languages tend
>> to drop accent markers in capital form (eg, Spanish). What can or should
>> we do here?
>
> Similar to automatic redirect, we could build an automatic disambiguation
> page. For example, someone on srwiki going to [[Dj]] would get:
>
> Did you mean:
>
> * [[Đ]]
> * [[DJ]]
> * [[D.J.]]
>
>> A nearer-term help would be to go ahead and implement what we talked
>> about a billion years ago but never got around to -- a decent "did you
>> mean X?" message to display when you go to an empty page but there's
>> something similar nearby.
>
> Was thinking a lot about this. The best solution I thought of would be to add
> a column to page table "page_title_canonical". When an article is
> created/moved, this canonical title is built from the real title. When an
> article is looked up, if there is no match in page_title, build the canonical
> title from the URL and see if there is a match in page_title_canonical and if
> yes, display "did you mean X" or even go there automatically as if from a
> redirect (if there is only one match) or "did you mean *X, *X1" if there are
> multiple matches.
>
> This canonical title would be made like this:
> * Remove disambiguator from the title if it exists
> * Remove punctuation and the like
> * Transliterate the title to Latin alphabet
> * Transliterate to pure ASCII
> * Lowercase
> * Order the words alphabetically
>
> What could possibly go wrong?
>
> Note that this would also be very helpful for non-Latin wikis - people often
> want Latin-only URLs since non-Latin URLs are toooo long. I also recall a
> recent discussion about a wiki in a language with nonstandard spelling (nds?)
> where they use bots to create dozens or even hundreds of redirects to an
> article title - this would also make that unneeded.
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

I actually did make this extension a couple of years ago, intended for the
English Wiktionary, where we manually add an {{also}} template to the
top of pages to link to other pages whose titles differ in minor ways
such as capitalization, hyphenation, apostrophes, accents, periods. I
think I had it working with Hebrew and Arabic and a few other exotic
languages besides.
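
To make the idea concrete, here is a minimal sketch in Python of that
kind of title normalization. It is not the extension's actual code; the
function name normalize_title is made up for illustration, and the rules
(strip combining marks and punctuation, then case-fold) are only an
assumption about what "differ in minor ways" should cover.

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Reduce a page title to a key shared by its minor variants.

    Hypothetical helper: drops accents, punctuation (hyphens,
    apostrophes, periods) and case distinctions.
    """
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    kept = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        if category.startswith("M"):  # combining marks: accents, niqqud, harakat
            continue
        if category.startswith("P"):  # punctuation of every kind
            continue
        kept.append(ch)
    # casefold() is a more aggressive lower() meant for caseless matching.
    return "".join(kept).casefold()


if __name__ == "__main__":
    for t in ["DJ", "D.J.", "Dj", "Café", "cafe"]:
        print(t, "->", normalize_title(t))
```

Note that this does no transliteration, so [[Đ]] would not land on the
same key as [[Dj]], and it applies one language-neutral folding, so the
Turkish dotted/dotless i problem mentioned above is not solved by it.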

It was running on Brion's test box for some time but getting little
interest. It's been offline and unmaintained since Brion moved and I
did a couple of overseas trips.

http://www.mediawiki.org/wiki/Extension:DidYouMean
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/DidYouMean/
https://bugzilla.wikimedia.org/show_bug.cgi?id=8648

It hooked all ways to create, delete, or move a page to maintain a
separate table of normalized page titles which it consulted when
displaying a page.
The code for display was designed for compatibility with the
then-current Wiktionary templates and would need to be implemented in
a more general way.
A core version would probably just add a field to the existing table.
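
The bookkeeping described above can be sketched roughly as follows, in
Python and with an in-memory dict standing in for the database table.
This is an illustration under stated assumptions, not the extension's
code: the class TitleIndex, its method names, and the simple normalize
function are all hypothetical.

```python
import unicodedata
from collections import defaultdict


def normalize(title):
    """Hypothetical normalization: drop marks and punctuation, fold case."""
    decomposed = unicodedata.normalize("NFKD", title)
    return "".join(
        ch for ch in decomposed
        if unicodedata.category(ch)[0] not in ("M", "P")
    ).casefold()


class TitleIndex:
    """Stand-in for a table mapping normalized titles to real titles."""

    def __init__(self):
        self._by_key = defaultdict(set)

    def on_create(self, title):
        # Called from the page-creation hook.
        self._by_key[normalize(title)].add(title)

    def on_delete(self, title):
        # Called from the page-deletion hook; drop empty rows.
        key = normalize(title)
        self._by_key[key].discard(title)
        if not self._by_key[key]:
            del self._by_key[key]

    def on_move(self, old_title, new_title):
        # A move is a delete of the old title plus a create of the new one.
        self.on_delete(old_title)
        self.on_create(new_title)

    def similar_to(self, title):
        # Consulted at display time: other pages sharing the normalized key.
        return sorted(self._by_key.get(normalize(title), set()) - {title})


if __name__ == "__main__":
    index = TitleIndex()
    for t in ["DJ", "D.J.", "Dj"]:
        index.on_create(t)
    print(index.similar_to("dj"))
```

With a single result the viewer could be redirected automatically; with
several, the list is what a "did you mean" message would show.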

Andrew Dunbar (hippietrail)


-- 
http://wiktionarydev.leuksman.com http://linguaphile.sf.net

