On 04.07.2014 07:10, Rohan Badlani wrote:
> I had downloaded the wikidata dump from
> http://dumps.wikimedia.org/wikidatawiki/latest/
> There is a file wikidatawiki-20140420-pages-articles-multistream-index which
> consists of triplets like:
>
> 537:114:Q17

I couldn't find documentation for the multistream-index format at
<https://meta.wikimedia.org/wiki/Data_dumps>, and I can't make sense of it
myself offhand. Perhaps ask on the wikitech-l list. I suppose the authority on
the question would be Ariel Glenn; perhaps you can get hold of him on IRC.

Note that this format is used for all wikis, so it will not contain anything
that is specific to Wikidata. It would be the same for Wikipedia.

If you figure it out, please add the info to
<https://meta.wikimedia.org/wiki/Data_dumps>!

> which I interpreted as following:
> 537 - category of the topic (which I am unable to find. I want the details of
> this item)

It's not a category. Wikidata doesn't use MediaWiki's Category feature for data
items at all. Wikipedia does, but there pages generally have multiple
categories, and these are identified by name, not by a numeric ID.

If you want to build a classification graph of the concepts in Wikidata (I'm
intentionally avoiding the terms "ontology" and "taxonomy" here), you will have
to go by the properties P31 (instance of) and P279 (subclass of), which are used
in roughly half of the data items.

> 114 - page_id of the item Q17.

That seems to be correct.

> Q17 - which is the item. (JSON:
> https://www.wikidata.org/wiki/Special:EntityData/Q17.json)

It's the page title, which, on wikidata.org, is the same as the item ID.

HTH
Daniel

PS: we are close to providing JSON dumps on a regular basis, and to making the
JSON contained in the XML dumps more readable. This will hopefully make
analyzing Wikidata less painful.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
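
PPS: going only by what is established above (the second field is the page_id,
the third is the page title), here is a minimal sketch of how one might split
such an index line. The names IndexEntry and parse_index_line are made up for
illustration, and the first field is deliberately left uninterpreted, since
this thread does not settle its meaning (only that it is not a category). I am
also assuming that the first two fields are always integers, as in the example.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """One line of a *-multistream-index file (hypothetical helper type)."""
    first_field: int  # meaning not established here; it is NOT a category
    page_id: int      # MediaWiki page_id
    title: str        # page title; on wikidata.org this equals the item ID


def parse_index_line(line: str) -> IndexEntry:
    # Split on the first two colons only: the format is shared by all wikis,
    # and page titles elsewhere (e.g. "Talk:Foo") can contain colons.
    first, page_id, title = line.rstrip("\n").split(":", 2)
    return IndexEntry(int(first), int(page_id), title)


entry = parse_index_line("537:114:Q17")
print(entry.page_id, entry.title)
```

Limiting the split to two colons is the only real design decision here; a
plain split(":") would break on titles in namespaces other than the main one.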