This is super cool, thanks for sharing! Would you mind if I write it up for the Wikidata Query Service docs?
On Mon, Apr 20, 2015 at 3:50 PM, Markus Krötzsch <mar...@semantic-mediawiki.org> wrote:

> On 20.04.2015 23:47, Daniel Kinzler wrote:
>
>> Something seems to be wrong with the order, though. Munich (pop > 1m in all
>> statements) is listed way after Chemnitz (pop < 300k in all statements). Any
>> idea why?
>
> Good catch. My query was too simple (using one "random" population instead
> of the biggest one). Here is a better query, this time even with
> populations given:
>
> PREFIX : <http://www.wikidata.org/entity/>
> SELECT ?city (MAX(?population) AS ?max_population) ?citylabel ?mayorlabel
> WHERE {
>   ?city :P31c/:P279c* :Q515 . # find instances of subclasses of city
>   ?city :P6s ?statement . # with a P6 (head of government) statement
>   ?statement :P6v ?mayor . # ... that has the value ?mayor
>   ?mayor :P21c :Q6581072 . # ... where the ?mayor has P21 (sex or gender) female
>   FILTER NOT EXISTS { ?statement :P582q ?x } # ... but the statement has no P582 (end date) qualifier
>
>   # Now select the population value of the ?city
>   # (the number is reached through a chain of three properties)
>   ?city :P1082s/:P1082v/<http://www.wikidata.org/ontology#numericValue> ?population .
>
>   # Optionally, find English labels for city and mayor:
>   OPTIONAL {
>     ?city rdfs:label ?citylabel .
>     FILTER ( LANG(?citylabel) = "en" )
>   }
>   OPTIONAL {
>     ?mayor rdfs:label ?mayorlabel .
>     FILTER ( LANG(?mayorlabel) = "en" )
>   }
> } GROUP BY ?city ?citylabel ?mayorlabel
> ORDER BY DESC(?max_population) LIMIT 100
>
>> Oh... maybe quantity values are sorted in alphanumeric order, because they are
>> decimal strings? They should be xsd:decimal...
>
> They are.
>
> Markus
>
>> On 20.04.2015 at 22:18, Markus Krötzsch wrote:
>>
>>> Hi all,
>>>
>>> For many years, Denny and I have been giving talks about why we need to
>>> improve the data management in Wikipedia. To explain and motivate this, we
>>> have often asked the simple question: "What are the world's largest cities
>>> with a female mayor?" The information to answer this is clearly in
>>> Wikipedia, but it would be painfully hard to get the result by reading
>>> articles.
>>>
>>> I recently had the occasion of actually phrasing this in SPARQL, so that an
>>> answer can now, finally, be given. The query to run at
>>>
>>> http://milenio.dcc.uchile.cl/sparql
>>>
>>> is as follows (with some explaining comments inline):
>>>
>>> PREFIX : <http://www.wikidata.org/entity/>
>>> SELECT DISTINCT ?city ?citylabel ?mayorlabel
>>> WHERE {
>>>   ?city :P31c/:P279c* :Q515 . # find instances of subclasses of city
>>>   ?city :P6s ?statement . # with a P6 (head of government) statement
>>>   ?statement :P6v ?mayor . # ... that has the value ?mayor
>>>   ?mayor :P21c :Q6581072 . # ... where the ?mayor has P21 (sex or gender) female
>>>   FILTER NOT EXISTS { ?statement :P582q ?x } # ... but the statement has no P582 (end date) qualifier
>>>
>>>   # Now select the population value of the ?city
>>>   # (the number is reached through a chain of three properties)
>>>   ?city :P1082s/:P1082v/<http://www.wikidata.org/ontology#numericValue> ?population .
>>>
>>>   # Optionally, find English labels for city and mayor:
>>>   OPTIONAL {
>>>     ?city rdfs:label ?citylabel .
>>>     FILTER ( LANG(?citylabel) = "en" )
>>>   }
>>>   OPTIONAL {
>>>     ?mayor rdfs:label ?mayorlabel .
>>>     FILTER ( LANG(?mayorlabel) = "en" )
>>>   }
>>> } ORDER BY DESC(?population) LIMIT 100
>>>
>>> To see the results, just paste this into the box at
>>> http://milenio.dcc.uchile.cl/sparql and press "Run query".
>>>
>>> The query does not filter the most recent population but relies on Virtuoso
>>> to pick the biggest value for DESC sorting, and on the world to have
>>> (mostly) cities with increasing population numbers over time. This is also
>>> the reason why the population is not printed (it would give you more than
>>> one match per city then, even with DISTINCT). Picking the current
>>> population will become easier once ranks are used more widely to mark it.
>>>
>>> There might also be some inaccuracies in cases where a past mayor does not
>>> have an "end date" set in Wikidata (Madrid has a suspiciously large number
>>> of current mayors ...), but a query can only ever be as good as its input
>>> data.
>>>
>>> I hope this is inspiring to some of you. One could also look for the
>>> world's youngest or oldest current mayors with similar queries, for
>>> example.
>>>
>>> Cheers,
>>>
>>> Markus
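
For the write-up, a small Python sketch of the ordering problem discussed above might help readers who are new to SPARQL aggregates. It only imitates the two queries on in-memory rows: the city names, mayor names, and population figures are invented for illustration and are not Wikidata data, and "first value seen" is merely an assumed stand-in for whatever single population value the endpoint happened to sort by.

```python
from collections import OrderedDict

# Hypothetical (city, mayor, population) rows, one per population statement.
# All names and numbers are made up; they are NOT taken from Wikidata.
ROWS = [
    ("CityA", "MayorA", 200_000),    # an old population statement
    ("CityA", "MayorA", 1_400_000),  # a newer, larger one
    ("CityB", "MayorB", 250_000),
    ("CityB", "MayorB", 240_000),
]


def order_by_arbitrary_population(rows):
    """Imitate the first query: one arbitrary population per city decides the order.

    Assumption: "arbitrary" is modelled as the first value seen per (city, mayor).
    """
    first_seen = OrderedDict()
    for city, mayor, population in rows:
        first_seen.setdefault((city, mayor), population)
    ranked = sorted(first_seen.items(), key=lambda item: item[1], reverse=True)
    return [key for key, _ in ranked]


def order_by_max_population(rows):
    """Imitate GROUP BY ?city ... with MAX(?population) and ORDER BY DESC."""
    best = {}
    for city, mayor, population in rows:
        key = (city, mayor)
        best[key] = max(best.get(key, population), population)
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(city, mayor, population) for (city, mayor), population in ranked]


def distinct_rows_with_population(rows):
    """Imitate SELECT DISTINCT with ?population projected: duplicates per city remain."""
    return sorted(set(rows))


for row in order_by_max_population(ROWS):
    print(row)
```

With these made-up rows, the first function ranks CityB ahead of CityA because it happens to look at CityA's old, small figure, which is the same kind of misordering as Chemnitz appearing before Munich. Grouping and taking the maximum puts CityA first, and the third function shows why printing the population without grouping yields more than one row per city.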
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l