Re: [Wikitech-l] WMF XML dump title case problem
Emmanuel Engelhart wrote:
> Hi
>
> Titles should be stored in the table page with a first letter uppercased.
> http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_restrictions%29#Lower_case_first_letter
>
> Unfortunately, it seems that we have XML dumps (and consequently
> mwdumper generated SQL) containing titles with a first letter lowercased.
>
> For example:
> $ wget http://download.wikimedia.org/mywiktionary/20110617/mywiktionary-20110617-pages-articles.xml.bz2
> $ bzip2 -d -c mywiktionary-20110617-pages-articles.xml.bz2 | grep title | grep tationery | more
> <title>stationery</title>
> <title>stationery shop</title>
>
> Is that a bug?

No. Those titles are fully case sensitive. Look at the top of the file:
<case>case-sensitive</case>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
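To check that setting without decompressing the whole dump, something like the following might do. It is only a sketch: the names read_case_setting and read_case_from_bz2 are made up for illustration, and it assumes the <case> element appears inside the <siteinfo> header within the first couple of hundred lines of the file.

```python
import bz2
import io
import re

# Matches the <case> element found in the <siteinfo> header of a dump.
CASE_RE = re.compile(r"<case>([^<]+)</case>")


def read_case_setting(stream, max_lines=200):
    """Return the text of the first <case> element, or None if none is found.

    Reading stops at the end of <siteinfo> or after max_lines lines
    (an arbitrary safety limit), so page content is never scanned.
    """
    for line_number, line in enumerate(stream):
        if line_number >= max_lines:
            break
        match = CASE_RE.search(line)
        if match:
            return match.group(1).strip()
        if "</siteinfo>" in line:
            break
    return None


def read_case_from_bz2(path):
    """Open a .xml.bz2 dump as text and read its <case> setting."""
    with bz2.open(path, "rt", encoding="utf-8") as stream:
        return read_case_setting(stream)


# Small in-memory header standing in for the top of a real dump.
SAMPLE_HEADER = (
    "<mediawiki>\n"
    "  <siteinfo>\n"
    "    <case>case-sensitive</case>\n"
    "  </siteinfo>\n"
    "</mediawiki>\n"
)
sample_result = read_case_setting(io.StringIO(SAMPLE_HEADER))
```

For the dump mentioned above, the call would be read_case_from_bz2("mywiktionary-20110617-pages-articles.xml.bz2").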
[Wikitech-l] WMF XML dump title case problem
Sorry, now correctly cross-posted.

Emmanuel

-------- Original Message --------
Subject: WMF XML dump title case problem
Date: Sun, 26 Jun 2011 17:07:19 +0200
From: Emmanuel Engelhart emman...@engelhart.org
To: Mailing list for Wikimedia CH wikimediac...@lists.wikimedia.org, offlin...@lists.wikimedia.org

Hi

Titles should be stored in the table page with a first letter uppercased.
http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_restrictions%29#Lower_case_first_letter

Unfortunately, it seems that we have XML dumps (and consequently
mwdumper generated SQL) containing titles with a first letter lowercased.

For example:
$ wget http://download.wikimedia.org/mywiktionary/20110617/mywiktionary-20110617-pages-articles.xml.bz2
$ bzip2 -d -c mywiktionary-20110617-pages-articles.xml.bz2 | grep title | grep tationery | more
<title>stationery</title>
<title>stationery shop</title>

Is that a bug?

Regards
Emmanuel
Re: [Wikitech-l] WMF XML dump title case problem
Emmanuel Engelhart wrote:
> Titles should be stored in the table page with a first letter uppercased.
> http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_restrictions%29#Lower_case_first_letter
>
> Unfortunately, it seems that we have XML dumps (and consequently
> mwdumper generated SQL) containing titles with a first letter lowercased.
>
> For example:
> $ wget http://download.wikimedia.org/mywiktionary/20110617/mywiktionary-20110617-pages-articles.xml.bz2
> $ bzip2 -d -c mywiktionary-20110617-pages-articles.xml.bz2 | grep title | grep tationery | more
> <title>stationery</title>
> <title>stationery shop</title>
>
> Is that a bug?

No. You're trying to apply the English Wikipedia's rules to the Burmese
Wiktionary. Wiktionaries have $wgCapitalLinks set to false.[1]

MZMcBride

[1] http://www.mediawiki.org/wiki/Manual:$wgCapitalLinks
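For tools that import dumps, the practical consequence is that a title's first letter should only be uppercased when the wiki is configured that way. A rough sketch follows, assuming the dump header reports "first-letter" for wikis with $wgCapitalLinks enabled and "case-sensitive" for the others. normalize_title is a hypothetical helper, not part of MediaWiki or mwdumper, and Python's str.upper() only approximates MediaWiki's language-aware capitalization.

```python
def normalize_title(title, case_setting):
    """Return the title as a wiki with the given <case> setting would store it.

    case_setting is the text of the dump's <case> element. Only the value
    "first-letter" triggers capitalization; any other value, including
    "case-sensitive", leaves the title untouched.
    """
    if case_setting == "first-letter" and title:
        # Approximation: MediaWiki uses language-specific rules here.
        return title[0].upper() + title[1:]
    return title


# The titles from the Burmese Wiktionary dump keep their lowercase letter.
kept = normalize_title("stationery shop", "case-sensitive")

# The same string on a first-letter wiki would be capitalized.
capitalized = normalize_title("stationery shop", "first-letter")
```

Applying the first-letter rule unconditionally would merge entries such as "stationery" and "Stationery", which a case-sensitive wiki treats as two separate pages.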
Re: [Wikitech-l] WMF XML dump title case problem
Emmanuel Engelhart wrote:
> Titles should be stored in the table page with a first letter uppercased...
>
> Unfortunately, it seems that we have XML dumps (and consequently
> mwdumper generated SQL) containing titles with a first letter lowercased.
>
> For example:
> $ wget http://download.wikimedia.org/mywiktionary/20110617/mywiktionary-20110617-pages-articles.xml.bz2

Wiktionary is different. Its users requested reconfiguration so that words
are stored in the database with their exact capitalization. The
Wikipedia-style first-letter capitalization (which caused pretty severe
problems for a dictionary) is *not* performed there.

See also http://en.wiktionary.org/wiki/Wiktionary:Capitalization .