Hm, good point. It is, in theory, possible, I think – this query
<https://query.wikidata.org/#SELECT%20%3Fitem%20%3FtitleEn%0AWITH%20%7B%0A%20%20SELECT%20%3Fitem%20WHERE%20%7B%0A%20%20%20%20%3Fitem%20wdt%3AP31%20wd%3AQ5%3B%0A%20%20%20%20%20%20%20%20%20%20wdt%3AP106%20wd%3AQ36180%3B%0A%20%20%20%20%20%20%20%20%20%20wdt%3AP21%20wd%3AQ6581097%3B%0A%20%20%20%20%20%20%20%20%20%20wikibase%3Asitelinks%20%3Fsitelinks.%0A%20%20%7D%0A%20%20%23%20ORDER%20BY%20DESC%28%3Fsitelinks%29%0A%20%20LIMIT%2050%0A%7D%20AS%20%25maleAuthors%0AWHERE%20%7B%0A%20%20INCLUDE%20%25maleAuthors.%0A%20%20hint%3ASubQuery%20hint%3Aoptimizer%20%22None%22.%0A%20%20%3Farticle%20schema%3Aabout%20%3Fitem%3B%0A%20%20%20%20%20%20%20%20%20%20%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fen.wikipedia.org%2F%3E%3B%0A%20%20%20%20%20%20%20%20%20%20%20schema%3Aname%20%3FtitleEn.%0A%20%20BIND%28STR%28%3FtitleEn%29%20AS%20%3Ftitle%29%0A%20%20SERVICE%20wikibase%3Amwapi%20%7B%0A%20%20%20%20bd%3AserviceParam%20wikibase%3Aapi%20%22Generator%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wikibase%3Aendpoint%20%22en.wikipedia.org%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agenerator%20%22allpages%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agapfrom%20%3Ftitle%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agapminsize%20%2210000%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agaplimit%20%221%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wikibase%3Alimit%201%20.%0A%20%20%20%20%3Fitem_%20wikibase%3AapiOutputItem%20mwapi%3Aitem.%0A%20%20%7D%0A%20%20FILTER%28%3Fitem%20%3D%20%3Fitem_%29%0A%7D%0ALIMIT%2050>
abuses the allpages generator as a generator for exactly one page:

SELECT ?item ?titleEn
WITH {
  SELECT ?item WHERE {
    ?item wdt:P31 wd:Q5;
          wdt:P106 wd:Q36180;
          wdt:P21 wd:Q6581097;
          wikibase:sitelinks ?sitelinks.
  }
  # ORDER BY DESC(?sitelinks)
  LIMIT 50
} AS %maleAuthors
WHERE {
  INCLUDE %maleAuthors.
  hint:SubQuery hint:optimizer "None".
  ?article schema:about ?item;
           schema:isPartOf <https://en.wikipedia.org/>;
           schema:name ?titleEn.
  BIND(STR(?titleEn) AS ?title)
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Generator";
                    wikibase:endpoint "en.wikipedia.org";
                    mwapi:generator "allpages";
                    mwapi:gapfrom ?title;
                    mwapi:gapminsize "10000";
                    mwapi:gaplimit "1";
                    wikibase:limit 1 .
    ?item_ wikibase:apiOutputItem mwapi:item.
  }
  FILTER(?item = ?item_)
}
LIMIT 50

Conveniently, it has a minimum size parameter built in, so we don’t even
need to get the size as a revision property and filter on it afterwards.

However, this requires one API call per item, so it doesn’t scale at all
– this query with just 50 arbitrary author items already takes about
half a minute. (The commented-out ORDER BY DESC(?sitelinks) is intended
as a heuristic to find larger articles first, but all the top 50 authors
by sitelinks have articles longer than 10000 bytes on enwiki, so in that
case you might as well just skip the MWAPI part altogether.)

I don’t think this can work very well. Even if MWAPI were expanded so
that we could directly feed 50 or even 500 titles to the query API (as
the titles parameter, skipping generators altogether), that would
probably still be too much of a bottleneck for this kind of query.
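
Just to illustrate what that bulk lookup would look like outside of
SPARQL, against the plain query API (along the lines of Ettore’s link
below): here’s a rough Python sketch. The requests dependency, the
revision_sizes helper name, and the example titles are mine, purely for
illustration – this is not something MWAPI itself offers.

import requests

def revision_sizes(titles, endpoint="https://en.wikipedia.org/w/api.php"):
    """Map each page title to the byte size of its latest revision."""
    response = requests.get(endpoint, params={
        "action": "query",
        "format": "json",
        "titles": "|".join(titles),  # up to 50 titles per request (500 for bots)
        "prop": "revisions",
        "rvprop": "size",
    })
    response.raise_for_status()
    pages = response.json()["query"]["pages"]
    return {
        page["title"]: page["revisions"][0]["size"]
        for page in pages.values()
        if "revisions" in page  # pages that do not exist have no revisions
    }

# hypothetical usage: keep only articles of at least 10000 bytes
sizes = revision_sizes(["Douglas Adams", "Ada Lovelace"])
print({title: size for title, size in sizes.items() if size >= 10000})

Even batched like that, though, tens of thousands of candidate items
would still mean dozens or hundreds of round trips per query.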

On 12.01.19 15:00, Ettore RIZZA wrote:
> Hi,
>
> Since the MediaWiki API allows one to get the size in bytes of the last
> revision
> <https://en.wikipedia.org/w/api.php?action=query&format=json&titles=barack%20obama&prop=revisions&rvprop=size>
> of a Wikipedia page, is it not possible to retrieve this information
> with a generator? (it's a real question, I'm not at all comfortable
> with this API). 
>
> Ettore Rizza
>
>
> On Sat, 12 Jan 2019 at 14:41, Reem Al-Kashif <reemalkas...@gmail.com
> <mailto:reemalkas...@gmail.com>> wrote:
>
>     Right, I see what you mean. Thanks a lot!
>
>     On Sat, 12 Jan 2019 at 15:35, Lucas Werkmeister
>     <m...@lucaswerkmeister.de <mailto:m...@lucaswerkmeister.de>> wrote:
>
>         Well, if you take just the MWAPI part of the query
>         
> <https://query.wikidata.org/#SELECT%20%3Ftitle%20WHERE%20%7B%0A%20%20SERVICE%20wikibase%3Amwapi%20%7B%0A%20%20%20%20bd%3AserviceParam%20wikibase%3Aendpoint%20%22en.wikipedia.org%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wikibase%3Aapi%20%22Generator%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agenerator%20%22querypage%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agqppage%20%22Longpages%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agqplimit%20%22max%22.%0A%20%20%20%20%3Ftitle%20wikibase%3AapiOutput%20mwapi%3Atitle.%0A%20%20%7D%0A%7D>,
>         you’ll get exactly 10000 results, but most of them aren’t male
>         authors (a lot of them seem to be lists of various kinds). And
>         I think those 10000 results are all we can get from the API,
>         so if we limit those to male authors afterwards, we only get a
>         few results (about 100), and there’s no way to increase that
>         number as far as I’m aware, because apparently we can’t get
>         more than 10000 total pages from MWAPI.
>
>         Cheers,
>         Lucas
>
>         On 12.01.19 13:57, Reem Al-Kashif wrote:
>>         Thank you so much, Nicolas & Lucas! 
>>
>>         @Lucas this helps a lot! At least I will get an idea about
>>         what I need until PetScan is sorted out. Could you elaborate
>>         a bit more on what you mean by "most of its results are
>>         linked to items we don’t care about"?
>>
>>         Best,
>>         Reem
>>
>>         On Sat, 12 Jan 2019 at 14:18, Lucas Werkmeister
>>         <m...@lucaswerkmeister.de <mailto:m...@lucaswerkmeister.de>>
>>         wrote:
>>
>>             You can’t directly query for the size as far as I know,
>>             but you can use the longpages query page generator to get
>>             a list of the longest enwiki pages, then filter the
>>             associated items for male authors. But this will only get
>>             you about a hundred results until the longpages list is
>>             exhausted (most of its results are linked to items we
>>             don’t care about), and it won’t get you the actual size
>>             (and therefore the order of results isn’t necessarily
>>             meaningful either, you just know they’re among the
>>             longest pages).
>>
>>             SELECT ?item ?titleEn WHERE {
>>               hint:Query hint:optimizer "None".
>>               SERVICE wikibase:mwapi {
>>                 bd:serviceParam wikibase:endpoint "en.wikipedia.org";
>>                                 wikibase:api "Generator";
>>                                 mwapi:generator "querypage";
>>                                 mwapi:gqppage "Longpages";
>>                                 mwapi:gqplimit "max".
>>                 ?title wikibase:apiOutput mwapi:title.
>>               }
>>               BIND(STRLANG(?title, "en") AS ?titleEn)
>>               ?sitelink schema:name ?titleEn;
>>                         schema:isPartOf <https://en.wikipedia.org/>;
>>                         schema:about ?item.
>>               ?item wdt:P31 wd:Q5;
>>                     wdt:P106 wd:Q36180;
>>                     wdt:P21 wd:Q6581097.
>>             }
>>
>>             Try it!
>>
>>             Cheers, Lucas
>>
>>             On 12.01.19 12:56, Nicolas VIGNERON wrote:
>>>             Hi Reem,
>>>
>>>             If this page
>>>             https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI
>>>             is up to date, it does not seem possible to get the
>>>             article size of a Wikipedia article (but I must admit I
>>>             don't use or know wikibase:mwapi much; maybe it has
>>>             changed or will change).
>>>
>>>             Cheers,
>>>             Nicolas
>>>
>>>             On Sat, 12 Jan 2019 at 12:16, Reem Al-Kashif
>>>             <reemalkas...@gmail.com <mailto:reemalkas...@gmail.com>>
>>>             wrote:
>>>
>>>                 Hello!
>>>
>>>                 Hope this finds you well. I put together a query
>>>                 
>>> <https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3FsitelinkEn%0A%0AWHERE%20%7B%0A%20%3Fitem%20wdt%3AP31%20wd%3AQ5.%0A%20%3Fitem%20wdt%3AP106%20wd%3AQ36180.%0A%20%3Fitem%20wdt%3AP21%20wd%3AQ6581097.%0A%20%3FsitelinkEn%20schema%3Aabout%20%3Fitem%3B%0A%20%20%09%09%09%20%20%20%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fen.wikipedia.org%2F%3E.%0A%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%20%20%7D>
>>>                 to create a list of English Wikipedia articles about
>>>                 male writers. Is it possible to filter the results
>>>                 by size? For example, articles that are larger than
>>>                 or equal to 10k bytes?
>>>
>>>                 I understand that this is better done by PetScan,
>>>                 but my PetScan query
>>>                 
>>> <https://petscan.wmflabs.org/?language=en&project=wikipedia&depth=50&categories=Male%20writers&ns%5B0%5D=1&larger=10000&search_max_results=500&interface_language=en&&doit=>
>>>                 refuses to cooperate for a reason I don't know yet... :/
>>>
>>>                 Thanks in advance.
>>>
>>>                 Best,
>>>                 Reem
>>>
>>>                 -- 
>>>                 Kind regards,
>>>                 Reem Al-Kashif
>>>
>>
>>
>>
>>         -- 
>>         Kind regards,
>>         Reem Al-Kashif
>>
>
>
>
>     -- 
>     Kind regards,
>     Reem Al-Kashif
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
