This is super cool, thanks for sharing!  Would you mind if I write it up
for the Wikidata Query Service docs?

On Mon, Apr 20, 2015 at 3:50 PM, Markus Krötzsch <mar...@semantic-mediawiki.org> wrote:

> On 20.04.2015 23:47, Daniel Kinzler wrote:
>
>> Something seems to be wrong with the order, though. Munich (pop > 1m in
>> all statements) is listed way after Chemnitz (pop < 300k in all
>> statements). Any idea why?
>>
>
> Good catch. My query was too simple (using one "random" population instead
> of the biggest one). Here is a better query, this time even with
> populations given:
>
> PREFIX : <http://www.wikidata.org/entity/>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> SELECT ?city (MAX(?population) AS ?max_population) ?citylabel ?mayorlabel
> WHERE {
>  ?city :P31c/:P279c* :Q515 .  # find instances of city or of its subclasses
>  ?city :P6s ?statement .      # with a P6 (head of government) statement
>  ?statement :P6v ?mayor .     # ... that has the value ?mayor
>  ?mayor :P21c :Q6581072 .     # ... where ?mayor has P21 (sex or gender) female
>  # ... but the statement has no P582 (end date) qualifier:
>  FILTER NOT EXISTS { ?statement :P582q ?x }
>
>  # Now select the population value of the ?city
>  # (the number is reached through a chain of three properties)
>  ?city :P1082s/:P1082v/<http://www.wikidata.org/ontology#numericValue>
>        ?population .
>
>  # Optionally, find English labels for city and mayor:
>  OPTIONAL {
>    ?city rdfs:label ?citylabel .
>    FILTER ( LANG(?citylabel) = "en" )
>  }
>  OPTIONAL {
>    ?mayor rdfs:label ?mayorlabel .
>    FILTER ( LANG(?mayorlabel) = "en" )
>  }
> } GROUP BY ?city ?citylabel ?mayorlabel
> ORDER BY DESC(?max_population) LIMIT 100
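The `:P31c/:P279c*` path matches an item whose P31 (instance of) class reaches :Q515 through zero or more P279 (subclass of) steps. A minimal Python sketch of that matching, assuming a small hypothetical class hierarchy held in dictionaries (all class and item names besides Q515 are made up for illustration):

```python
from collections import deque

# Hypothetical data for illustration only (not real Wikidata content).
# P279 (subclass of): class -> list of direct superclasses
SUBCLASS_OF = {
    "big_city": ["Q515"],
    "capital": ["big_city"],
    "town": ["settlement"],
}
# P31 (instance of): item -> list of direct classes
INSTANCE_OF = {
    "item_x": ["capital"],
    "item_y": ["town"],
    "item_z": ["Q515"],
}


def reachable_superclasses(cls):
    """Classes reachable from cls by zero or more P279 steps (the '*' in the path)."""
    seen = {cls}  # zero steps: the class itself
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in SUBCLASS_OF.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def matches_path(item, target):
    """Emulate the triple pattern ?item :P31c/:P279c* target."""
    return any(target in reachable_superclasses(c) for c in INSTANCE_OF.get(item, []))


cities = sorted(item for item in INSTANCE_OF if matches_path(item, "Q515"))
print(cities)  # ['item_x', 'item_z']
```

The zero-step case is why item_z, a direct instance of Q515, matches alongside item_x, whose class reaches Q515 only through two subclass steps.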
>
>
>> Oh... maybe quantity values are sorted in alphanumeric order, because
>> they are decimal strings? They should be xsd:decimal...
>>
>
> They are xsd:decimal.
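The ordering Daniel suspected can be illustrated with a minimal Python sketch: plain strings compare character by character, so a population starting with "2" outranks one starting with "1" regardless of length, while typed decimals compare numerically (modelled here with Python's Decimal). The figures are hypothetical, chosen only to match his description (Munich above 1m, Chemnitz below 300k).

```python
from decimal import Decimal

# Hypothetical figures matching the description above (not actual Wikidata values).
POPULATIONS = {"Munich": "1400000", "Chemnitz": "243000"}


def order_desc(populations, key):
    """Return city names sorted by population, largest first, using the given key."""
    return [city for city, _ in sorted(populations.items(), key=key, reverse=True)]


# Plain strings compare character by character: "2..." > "1..."
by_string = order_desc(POPULATIONS, key=lambda item: item[1])
# xsd:decimal-style values compare numerically
by_decimal = order_desc(POPULATIONS, key=lambda item: Decimal(item[1]))

print(by_string)   # ['Chemnitz', 'Munich']
print(by_decimal)  # ['Munich', 'Chemnitz']
```

Since the values are typed as xsd:decimal, the endpoint should order them the second way.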
>
> Markus
>
>
>
>> On 20.04.2015 at 22:18, Markus Krötzsch wrote:
>>
>>> Hi all,
>>>
>>> For many years, Denny and I have been giving talks about why we need to
>>> improve the data management in Wikipedia. To explain and motivate this,
>>> we have often asked the simple question: "What are the world's largest
>>> cities with a female mayor?" The information to answer this is clearly in
>>> Wikipedia, but it would be painfully hard to get the result by reading
>>> articles.
>>>
>>> I recently had occasion to actually phrase this in SPARQL, so that an
>>> answer can now, finally, be given. The query to run at
>>>
>>> http://milenio.dcc.uchile.cl/sparql
>>>
>>> is as follows (with some explaining comments inline):
>>>
>>> PREFIX : <http://www.wikidata.org/entity/>
>>> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>>> SELECT DISTINCT ?city ?citylabel ?mayorlabel WHERE {
>>>   ?city :P31c/:P279c* :Q515 .  # find instances of city or of its subclasses
>>>   ?city :P6s ?statement .      # with a P6 (head of government) statement
>>>   ?statement :P6v ?mayor .     # ... that has the value ?mayor
>>>   ?mayor :P21c :Q6581072 .     # ... where ?mayor has P21 (sex or gender) female
>>>   # ... but the statement has no P582 (end date) qualifier:
>>>   FILTER NOT EXISTS { ?statement :P582q ?x }
>>>
>>>   # Now select the population value of the ?city
>>>   # (the number is reached through a chain of three properties)
>>>   ?city :P1082s/:P1082v/<http://www.wikidata.org/ontology#numericValue>
>>>         ?population .
>>>
>>>   # Optionally, find English labels for city and mayor:
>>>   OPTIONAL {
>>>     ?city rdfs:label ?citylabel .
>>>     FILTER ( LANG(?citylabel) = "en" )
>>>   }
>>>   OPTIONAL {
>>>     ?mayor rdfs:label ?mayorlabel .
>>>     FILTER ( LANG(?mayorlabel) = "en" )
>>>   }
>>> } ORDER BY DESC(?population) LIMIT 100
>>>
>>> To see the results, just paste this into the box at
>>> http://milenio.dcc.uchile.cl/sparql and press "Run query".
>>>
>>> The query does not filter for the most recent population, but relies on
>>> Virtuoso to pick the biggest value for DESC sorting, and on the world to
>>> have (mostly) cities with increasing population numbers over time. This is
>>> also the reason why the population is not printed (it would give you more
>>> than one match per city then, even with DISTINCT). Picking the current
>>> population will become easier once ranks are used more widely to mark it.
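Why projecting ?population defeats DISTINCT, and what the GROUP BY/MAX query in the follow-up message does instead, can be sketched in Python over hypothetical solution rows (made-up cities and values, one row per population statement). The real follow-up query also groups by the label variables; they are omitted here for brevity.

```python
from decimal import Decimal

# Hypothetical query solutions (made-up values): one (city, population) row per
# P1082 statement, so a city with several population statements yields several rows.
SOLUTIONS = [
    ("city_a", Decimal("900000")),
    ("city_a", Decimal("950000")),
    ("city_b", Decimal("1200000")),
    ("city_c", Decimal("300000")),
]


def select_distinct(rows):
    """Emulate SELECT DISTINCT ?city ?population: drop exact duplicates only."""
    return list(dict.fromkeys(rows))


def group_by_max(rows):
    """Emulate SELECT ?city (MAX(?population) AS ?max_population) ... GROUP BY ?city
    ORDER BY DESC(?max_population)."""
    best = {}
    for city, population in rows:
        if city not in best or population > best[city]:
            best[city] = population
    return sorted(best.items(), key=lambda row: row[1], reverse=True)


distinct_rows = select_distinct(SOLUTIONS)
grouped_rows = group_by_max(SOLUTIONS)
print(len(distinct_rows))  # 4: city_a still appears twice
print(grouped_rows)        # one row per city, largest population first
```

DISTINCT only removes rows that are identical in every projected variable, so two population values for city_a survive as two rows; the aggregate collapses them into one row per group and sorts on the maximum.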
>>>
>>> There might also be some inaccuracies in cases where a past mayor does not
>>> have an "end date" set in Wikidata (Madrid has a suspiciously large number
>>> of current mayors ...), but a query can only ever be as good as its input
>>> data.
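The "current mayor" test is the FILTER NOT EXISTS on a P582 (end date) qualifier, so any past-mayor statement that lacks an end date still counts as current. A minimal Python sketch with hypothetical statements (made-up mayors and years; P580 is the "start time" property, included only to make the records readable):

```python
# Hypothetical P6 (head of government) statements for one city; values are made up.
# Each statement maps qualifier property IDs to values.
STATEMENTS = [
    {"mayor": "mayor_1", "qualifiers": {"P580": "2003", "P582": "2011"}},
    {"mayor": "mayor_2", "qualifiers": {"P580": "2011"}},  # end date missing in the data
    {"mayor": "mayor_3", "qualifiers": {"P580": "2015"}},
]


def current_mayors(statements):
    """Emulate FILTER NOT EXISTS { ?statement :P582q ?x }: keep statements with no end date."""
    return [s["mayor"] for s in statements if "P582" not in s["qualifiers"]]


print(current_mayors(STATEMENTS))  # ['mayor_2', 'mayor_3']
```

Both mayor_2 and mayor_3 come out as current, which is the kind of duplication described for Madrid; only adding the missing end date to the data fixes it.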
>>>
>>> I hope this is inspiring to some of you. One could also look for the
>>> world's youngest or oldest current mayors with similar queries, for
>>> example.
>>>
>>> Cheers,
>>>
>>> Markus
>>>
>>>
>>> _______________________________________________
>>> Wikidata-l mailing list
>>> Wikidata-l@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>>>
>>
>>
>>
>