Re: [Wikitech-l] WMF XML dump title case problem

2011-06-27 Thread Platonides
Emmanuel Engelhart wrote:

 Hi

 Titles should be stored in the page table with an uppercase first letter.
 http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_restrictions%29#Lower_case_first_letter

 Unfortunately, it seems that we have XML dumps (and consequently
 mwdumper generated SQL) containing titles with a first letter lowercased.

 For example:
 $ wget http://download.wikimedia.org/mywiktionary/20110617/mywiktionary-20110617-pages-articles.xml.bz2
 $ bzip2 -d -c mywiktionary-20110617-pages-articles.xml.bz2 | grep title | grep tationery | more
 <title>stationery</title>
 <title>stationery shop</title>

 Is that a bug?

No. Those titles are fully case sensitive. Look at the top of the file:
  <case>case-sensitive</case>
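For anyone post-processing these dumps, the siteinfo <case> element is what to check before assuming Wikipedia-style capitalization. Below is a minimal Python sketch using only the standard library: it streams a pages-articles dump, records the <case> value, and collects titles whose first character is lowercase. The helper names are hypothetical, and the namespace URI in the sample is an assumption; the code strips whatever namespace prefix the export schema uses.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    # Dump elements carry an export-schema namespace ("{uri}case");
    # drop the "{uri}" prefix so the sketch does not depend on the schema version.
    return tag.rsplit("}", 1)[-1]


def scan_dump(stream):
    """Hypothetical helper: return (case_setting, lowercase_first_titles)."""
    case_setting = None
    lowercase_first = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "case":
            case_setting = elem.text  # typically "first-letter" or "case-sensitive"
        elif name == "title":
            title = elem.text or ""
            # islower() is False for caseless scripts such as Burmese.
            if title[:1].islower():
                lowercase_first.append(title)
        elif name == "page":
            elem.clear()  # free the finished page's children
    return case_setting, lowercase_first


# Tiny in-memory sample; the namespace URI here is an assumption.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <siteinfo><case>case-sensitive</case></siteinfo>
  <page><title>stationery</title></page>
  <page><title>stationery shop</title></page>
  <page><title>Stationery</title></page>
</mediawiki>"""

if __name__ == "__main__":
    print(scan_dump(io.BytesIO(SAMPLE)))
```

On the real file, pass bz2.open("mywiktionary-20110617-pages-articles.xml.bz2", "rb") instead of the in-memory sample; iterparse reads incrementally, so memory use stays modest even for large dumps.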

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] WMF XML dump title case problem

2011-06-26 Thread Emmanuel Engelhart
Sorry, now correctly cross-posted.
Emmanuel

 Original Message 
Subject: WMF XML dump title case problem
Date:   Sun, 26 Jun 2011 17:07:19 +0200
From:   Emmanuel Engelhart emman...@engelhart.org
To: Mailing list for Wikimedia CH wikimediac...@lists.wikimedia.org, 
offlin...@lists.wikimedia.org



Hi

Titles should be stored in the page table with an uppercase first letter.
http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_restrictions%29#Lower_case_first_letter

Unfortunately, it seems that we have XML dumps (and consequently
mwdumper generated SQL) containing titles with a first letter lowercased.

For example:
$ wget http://download.wikimedia.org/mywiktionary/20110617/mywiktionary-20110617-pages-articles.xml.bz2
$ bzip2 -d -c mywiktionary-20110617-pages-articles.xml.bz2 | grep title | grep tationery | more
<title>stationery</title>
<title>stationery shop</title>

Is that a bug?

Regards
Emmanuel




Re: [Wikitech-l] WMF XML dump title case problem

2011-06-26 Thread MZMcBride
Emmanuel Engelhart wrote:
 Titles should be stored in the page table with an uppercase first letter.
 http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_restrictions%29#Lower_case_first_letter
 
 Unfortunately, it seems that we have XML dumps (and consequently
 mwdumper generated SQL) containing titles with a first letter lowercased.
 
 For example:
 $ wget http://download.wikimedia.org/mywiktionary/20110617/mywiktionary-20110617-pages-articles.xml.bz2
 $ bzip2 -d -c mywiktionary-20110617-pages-articles.xml.bz2 | grep title | grep tationery | more
 <title>stationery</title>
 <title>stationery shop</title>
 
 Is that a bug?

No.

You're trying to apply the English Wikipedia's rules to the Burmese
Wiktionary. Wiktionaries have $wgCapitalLinks set to false.[1]
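The effect of $wgCapitalLinks on titles like the ones above can be modelled in a few lines. This is a minimal Python sketch, not MediaWiki's actual Title code: normalize_title is a hypothetical helper that applies only the first-letter rule and ignores namespaces, Unicode special cases, and the other normalization MediaWiki performs.

```python
def normalize_title(title, capital_links):
    """Hypothetical, simplified model of first-letter title normalization.

    capital_links=True mimics $wgCapitalLinks = true (the default, used by
    Wikipedia); False mimics Wiktionary, where titles keep their exact case.
    """
    if capital_links and title:
        return title[0].upper() + title[1:]
    return title


words = ["stationery", "Stationery"]
# With first-letter capitalization both spellings map to one page title...
first_letter = {normalize_title(w, capital_links=True) for w in words}
# ...while a case-sensitive wiki keeps them as two distinct pages.
case_sensitive = {normalize_title(w, capital_links=False) for w in words}

if __name__ == "__main__":
    print(sorted(first_letter))
    print(sorted(case_sensitive))
```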

MZMcBride

[1] http://www.mediawiki.org/wiki/Manual:$wgCapitalLinks





Re: [Wikitech-l] WMF XML dump title case problem

2011-06-26 Thread Steve Summit
Emmanuel Engelhart wrote:
 Titles should be stored in the page table with an uppercase first letter...
 Unfortunately, it seems that we have XML dumps (and consequently
 mwdumper generated SQL) containing titles with a first letter lowercased.
 For example:
 $ wget http://download.wikimedia.org/mywiktionary/20110617/mywiktionary-20110617-pages-articles.xml.bz2

Wiktionary is different.  Its users requested reconfiguration so
that words are stored in the database with their exact capitalization.
The Wikipedia-style first-letter capitalization (which caused
pretty severe problems for a dictionary) is *not* performed there.

See also http://en.wiktionary.org/wiki/Wiktionary:Capitalization .
