On 9 March 2011 16:00, Platonides <platoni...@gmail.com> wrote:
>> Dear Members,
>> I am Ramesh, pursuing my PhD at Monash University, Malaysia. My
>> research is on blog classification using Wikipedia categories.
>> For my experiment, I use 12 main categories of Wikipedia.
>> I want to identify which of these 12 main categories each article
>> belongs to.
>> So I wrote a program that collects the categories of each article and
>> classifies it against the 12 main categories offline.
>> I have already downloaded the Wikipedia dump, which contains around
>> 3 million article titles.
>> My program takes these 3 million article titles, queries the live
>> Wikipedia website and fetches their categories.
>
> Why do you need to access the live Wikipedia for this?
> You should be able to get the same data from categorylinks.sql and
> page.sql, probably faster.

I concur. Everything required for this project should be in the dumps.
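
A minimal sketch of the dump-based approach, in Python with an in-memory SQLite database. It assumes the page and categorylinks dumps have already been imported into SQL tables (the import step is not shown); the toy rows and the set of main categories below are hypothetical stand-ins. Column names (page_id, page_namespace, page_title, cl_from, cl_to) follow the MediaWiki schema of this period; later MediaWiki releases have changed categorylinks, so check the schema of the dump you actually use. The breadth-first walk up to a fixed set of main categories, and its max_depth cut-off, are illustrative choices, not something proposed in this thread. The seen set matters because the Wikipedia category graph contains cycles.

```python
import sqlite3
from collections import deque

NS_MAIN, NS_CATEGORY = 0, 14  # MediaWiki namespace numbers: articles, categories


def load_sample(conn):
    """Create toy page/categorylinks tables.

    Hypothetical sample rows; real data would come from page.sql and
    categorylinks.sql. Titles use underscores, and cl_to holds the category
    name without the "Category:" prefix, as in the dumps.
    """
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    """)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
        (1, NS_MAIN, "Monash_University"),
        (2, NS_CATEGORY, "Universities_in_Australia"),
        (3, NS_CATEGORY, "Education_in_Australia"),
        (4, NS_CATEGORY, "Education"),
    ])
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
        (1, "Universities_in_Australia"),  # article -> its category
        (2, "Education_in_Australia"),     # category -> parent category
        (3, "Education"),
    ])


def parent_categories(conn, namespace, title):
    """Categories that the page (namespace, title) is directly listed in."""
    rows = conn.execute(
        """SELECT cl.cl_to FROM categorylinks AS cl
           JOIN page AS p ON p.page_id = cl.cl_from
           WHERE p.page_namespace = ? AND p.page_title = ?""",
        (namespace, title),
    )
    return [row[0] for row in rows]


def main_categories_of(conn, title, main_categories, max_depth=10):
    """Breadth-first walk up the category graph from an article.

    Returns the subset of main_categories reachable within max_depth steps.
    """
    found, seen = set(), set()
    queue = deque((c, 1) for c in parent_categories(conn, NS_MAIN, title))
    while queue:
        cat, depth = queue.popleft()
        if cat in seen:  # guard against cycles in the category graph
            continue
        seen.add(cat)
        if cat in main_categories:
            found.add(cat)
            continue  # no need to climb above a main category
        if depth < max_depth:
            for parent in parent_categories(conn, NS_CATEGORY, cat):
                queue.append((parent, depth + 1))
    return found


conn = sqlite3.connect(":memory:")
load_sample(conn)
print(main_categories_of(conn, "Monash_University", {"Education", "Science"}))
```

For the toy data this prints {'Education'}. Against a full dump, an index on page (page_namespace, page_title) and on categorylinks (cl_from) keeps each lookup cheap.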

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
