Hi, everyone,

I want to run some classification experiments on Wikipedia web
pages. I already have the web page archive, and the experiment
needs the following category information:

1. What is the category (or categories) of a web page (an article)?
For example, the following two pieces of information would be enough:
    a. Web page P1 belongs to category C1;
    b. Category C1 is under two parent categories, CC1 and CC2, each
of which has its own chain of parent categories.
Then I can build a tree whose leaves are the web pages.
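
A minimal sketch of the structure described in a. and b., assuming the
two relations are already available as plain dictionaries. The sample
data, the ROOT category, and the function names are hypothetical and
only for illustration. Because a category can have two parents (as in
b.), the result is really a graph rather than a strict tree, so the
sketch tracks visited categories to handle shared parents and cycles.

```python
from collections import deque

# Hypothetical sample data. P1, C1, CC1, CC2 follow the example above;
# ROOT is an invented top-level category.
page_categories = {"P1": ["C1"]}
category_parents = {"C1": ["CC1", "CC2"], "CC1": ["ROOT"], "CC2": ["ROOT"]}


def ancestors(category, parents):
    """Return every category reachable by following parent links upward."""
    seen = set()
    queue = deque([category])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:  # skip shared parents and break cycles
                seen.add(parent)
                queue.append(parent)
    return seen


def categories_of_page(page, page_cats, parents):
    """Direct categories of a page plus all of their ancestors."""
    result = set()
    for cat in page_cats.get(page, []):
        result.add(cat)
        result |= ancestors(cat, parents)
    return result


print(sorted(categories_of_page("P1", page_categories, category_parents)))
# prints ['C1', 'CC1', 'CC2', 'ROOT']
```

With these two lookups, each page can be labelled with its direct
categories or with any ancestor level, depending on how coarse the
classification labels should be.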

2. How do people at Wikipedia handle categorization across such a huge
number of articles? For example, the categorization method, the
category levels, or inheritance between categories.

Could you give me some advice or URLs where I can find this information?

Thanks & Best wishes,

-- 
Yang Jie(杨杰)
hi.baidu.com/thinkdifferent

Group of CLOUD, Xi'an Jiaotong University
Department of Computer Science and Technology, Xi’an Jiaotong University

PHONE: 86 1346888 3723
TEL: 86 29 82665263 EXT. 608
MSN: xtyangjie2...@yahoo.com.cn

once i didn't know software is not free, but found it days later; now
i realize that it's indeed free.

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
