Hi everyone, I want to run some classification experiments on Wikipedia web pages. I already have the web page archive, and the experiment needs the following category information:
1. What is the category (or categories) of a web page (an article)? For example, the following two pieces of information would be enough:
   a. Web page P1 belongs to category C1;
   b. Category C1 is under two parent categories, CC1 and CC2, each of which has its own chain of parent categories.
   Then I can build a tree whose leaves are the web pages.
2. How do people at Wikipedia manage categorization across such a huge number of articles? For example, the categorization method, the levels, or the inheritance between categories.

Could you give me some advice or URLs where I can find this information?

Thanks & Best wishes,
--
Yang Jie(杨杰)
hi.baidu.com/thinkdifferent
Group of CLOUD, Xi'an Jiaotong University
Department of Computer Science and Technology, Xi'an Jiaotong University
PHONE: 86 1346888 3723
TEL: 86 29 82665263 EXT. 608
MSN: xtyangjie2...@yahoo.com.cn

once i didn't know software is not free, but found it days later; now i realize that it's indeed free.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
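
P.S. To make the structure I am asking about concrete, here is a minimal Python sketch of what I would like to build once I have the two mappings from point 1. The names `page_categories`, `category_parents`, `ancestor_chains` and `pages_by_category` are only illustrative, and the sample data (including the category `Root`) is made up; the real mappings would have to come from Wikipedia. Since a category can have more than one parent, the result is really a graph rather than a strict tree, so the sketch lists every parent chain and includes a defensive guard in case the data contains loops.

```python
from collections import defaultdict

# Hypothetical sample data, mirroring the P1 / C1 / CC1 / CC2 example.
# "Root" is an invented top-level category used only for illustration.
page_categories = {"P1": ["C1"]}
category_parents = {"C1": ["CC1", "CC2"], "CC1": ["Root"], "CC2": ["Root"]}


def ancestor_chains(category, parents, seen=()):
    """Return every chain from `category` up to a category with no parents."""
    if category in seen:
        # Defensive guard: stop the chain if a category repeats (a loop).
        return [[category]]
    ups = parents.get(category, [])
    if not ups:
        return [[category]]
    chains = []
    for parent in ups:
        for chain in ancestor_chains(parent, parents, seen + (category,)):
            chains.append([category] + chain)
    return chains


def pages_by_category(page_to_categories):
    """Invert page -> categories into category -> pages (the leaves)."""
    members = defaultdict(list)
    for page, categories in page_to_categories.items():
        for category in categories:
            members[category].append(page)
    return dict(members)


if __name__ == "__main__":
    for page, categories in page_categories.items():
        for category in categories:
            for chain in ancestor_chains(category, category_parents):
                print(page, "<-", " <- ".join(chain))
```

With real data, each web page would hang under every category it belongs to, and each category under every one of its parent chains.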