> sitelinks / I want to use the data to help rank possible text entity links to Wikidata items
Side note: I am helping the Natural Earth project (https://www.naturalearthdata.com/) by adding Wikidata concordances. It is a public domain geo-database with mountains, rivers, populated places, and so on.

I am using the Wikidata JSON dumps and importing them into a PostGIS database. I rank the candidate matches by:

- distance (lower is better)
- text similarity (I check the "labels" and the "aliases")
- sitelinks

I also lower the rank of the mostly imported sitelinks ("cebwiki", ...). Why? See:
https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2017/08#Nonsense_imported_from_Geonames

A lot of geodata was re-imported, and for those items the "distance" and the "text/labels" are the same. So be careful with the imported Wikipedia pages (sitelinks)!

As far as I can see now, the geodata quality is much better, mostly where the active Wikidata community is cleaning it up. This is just one example of why the simple "sitelinks" count is not enough :-)

On the other hand, the P625 coordinate location is probably also important:
https://www.wikidata.org/wiki/Property:P625

In Germany, "dewiki" ranks higher; in Hungary, "huwiki" is preferred.

Kind regards,
Imre

<fi...@umbc.edu> wrote (on Tue, 22 Mar 2022, 22:25):

> Is there a simple way to get the sitelinks count data for all Wikidata
> items? I want to use the data to help rank possible text entity links to
> Wikidata items
>
> I'm really only interested in counts for items that have at least one
> (e.g., wikibase:sitelinks value that's >0). According to statistics I've
> seen, only about 1/3 of Wikidata items have at least one sitelink.
>
> I'm not sure if wikibase:sitelinks is included in the standard Wikidata
> dump. I could try a SPARQL query with an OFFSET and LIMIT, but I doubt
> that the approach would work to completion.
> _______________________________________________
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
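
To make the ranking idea above more concrete, here is a minimal Python sketch, not the actual Natural Earth matching code. It assumes each candidate is an entity dict shaped like one record of the Wikidata JSON dump (with "labels", "aliases" and "sitelinks" keys) and that the P625 coordinates were already extracted. The function names, the 0.1 factor for "cebwiki", the bonus for the preferred local wiki and the final score formula are all made-up values for illustration only.

```python
import math
from difflib import SequenceMatcher

# Assumed down-weighting for wikis with many bot-imported pages.
# Only "cebwiki" is named in the post; the 0.1 factor is a made-up value.
LOW_TRUST_WIKIS = {"cebwiki": 0.1}


def weighted_sitelinks(entity, preferred_wiki=None, preferred_bonus=2.0):
    """Sum sitelinks, counting low-trust wikis less and the local wiki more."""
    total = 0.0
    for site in entity.get("sitelinks", {}):
        weight = LOW_TRUST_WIKIS.get(site, 1.0)
        if site == preferred_wiki:
            weight *= preferred_bonus  # hypothetical bonus, e.g. "dewiki" in Germany
        total += weight
    return total


def best_text_similarity(name, entity):
    """Best similarity (0..1) between name and any label or alias."""
    candidates = [label["value"] for label in entity.get("labels", {}).values()]
    for alias_list in entity.get("aliases", {}).values():
        candidates.extend(alias["value"] for alias in alias_list)
    return max(
        (SequenceMatcher(None, name.lower(), c.lower()).ratio() for c in candidates),
        default=0.0,
    )


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def match_score(name, lat, lon, entity, entity_lat, entity_lon, preferred_wiki=None):
    """Higher is better. The combination formula is an arbitrary illustration."""
    text = best_text_similarity(name, entity)
    links = math.log1p(weighted_sitelinks(entity, preferred_wiki))
    distance = haversine_km(lat, lon, entity_lat, entity_lon)
    return text + 0.2 * links - 0.01 * distance
```

With this sketch, two candidates at the same distance and with the same label no longer tie: an item that only has a "cebwiki" sitelink scores lower than one with "dewiki" and "enwiki" sitelinks. The weights would need tuning against real data.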