Excellent. I ran some tests, and after a few cycles I had already identified
and classified several articles.
I will have a look at your script in the next few days, but I already have a
question: the number of iterations is based on the total number of
articles, so how do you know that total?
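
For illustration, the batching arithmetic behind that question can be sketched in Python. This is only a sketch, not the linked Groovy script: the helper names `batch_offsets` and `build_query` and the example total are hypothetical, and it assumes the total number of articles is already known from somewhere else. The query text mirrors the one quoted below, and Q48655 is the item used for microgravity in Matthias's query.

```python
from math import ceil
from string import Template

# Query template mirroring the batched query quoted in this thread;
# $batchSize, $offset, $conceptQ and $concept are filled in per batch.
QUERY = Template("""\
SELECT ?art ?artLabel
WITH {
  SELECT ?art WHERE {
    ?art wdt:P31 wd:Q13442814
  } LIMIT $batchSize OFFSET $offset
} AS %RESULTS {
  INCLUDE %RESULTS
  ?art wdt:P1476 ?artLabel .
  MINUS { ?art wdt:P921 wd:$conceptQ }
  FILTER (contains(lcase(str(?artLabel)), "$concept"))
}""")


def batch_offsets(total, batch_size):
    """Return the OFFSET of every batch needed to cover `total` articles."""
    iterations = ceil(total / batch_size)  # the last batch may be partial
    return [i * batch_size for i in range(iterations)]


def build_query(concept, concept_q, batch_size, offset):
    """Fill the template for one batch; the search word is lower-cased
    because the query compares it against lcase(title)."""
    return QUERY.substitute(
        batchSize=batch_size,
        offset=offset,
        conceptQ=concept_q,
        concept=concept.lower(),
    )


# Hypothetical numbers: 1,000,000 articles in batches of 250,000.
offsets = batch_offsets(total=1_000_000, batch_size=250_000)
print(len(offsets), "iterations")
print(build_query("microgravity", "Q48655", 250_000, offsets[1]))
```

If the total is not known, one alternative would be to keep increasing the offset until the inner subquery returns no rows; whether the linked script does this is not stated in this thread.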

---
Fabrizio

On Sat, 15 Dec 2018 at 10:18, Egon Willighagen <
egon.willigha...@gmail.com> wrote:

>
> The approach I use is the following, see this (Bioclipse/Groovy) script:
> https://gist.github.com/egonw/ca4c348b9a2d1116efcdb55fa85dd158
>
> It combines a Blazegraph SPARQL trick (a named subquery) with breaking
> things up into batches of a certain size:
>
> SELECT ?art ?artLabel
> WITH {
> SELECT ?art WHERE {
> ?art wdt:P31 wd:Q13442814
> } LIMIT $batchSize OFFSET $offset
> } AS %RESULTS {
> INCLUDE %RESULTS
> ?art wdt:P1476 ?artLabel .
> MINUS { ?art wdt:P921 wd:$conceptQ }
> FILTER (contains(lcase(str(?artLabel)), "$concept"))
> }
> where "$concept" is my search word in the title, and $batchSize and
> $offset take care of the batching by the script. This script creates
> QuickStatements.
>
> Mind you, I manually check the created statements, because in my domain
> (biochem) a simple search results in false positives, hence the "blacklist"
> in the script :)
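
The step from query results to QuickStatements, including such a blacklist, could look roughly like this in Python. It is a sketch under stated assumptions, not the linked Groovy script: the function name, the sample rows, the placeholder item IDs and the blacklist entry are all made up for illustration, and it assumes the tab-separated QuickStatements V1 command format (item, property, value).

```python
def to_quickstatements(rows, concept_q, blacklist=()):
    """Turn (item ID, title) pairs into 'main subject' (P921) commands.

    Titles containing any blacklisted phrase (case-insensitive) are
    skipped, so known false positives never reach QuickStatements.
    """
    blocked = [phrase.lower() for phrase in blacklist]
    commands = []
    for qid, title in rows:
        lowered = title.lower()
        if any(phrase in lowered for phrase in blocked):
            continue  # known false positive, left out of the output
        commands.append(f"{qid}\tP921\t{concept_q}")
    return commands


# Hypothetical sample data: the item IDs are placeholders, not real items.
rows = [
    ("Q_EXAMPLE_1", "Plant growth in microgravity"),
    ("Q_EXAMPLE_2", "A microgravity bibliography"),
]
for line in to_quickstatements(rows, "Q48655", blacklist=["bibliography"]):
    print(line)
```

The output still needs the manual check described above before it is submitted; the blacklist only removes patterns that are already known to be wrong.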
>
> Egon
>
> On Sat, Dec 15, 2018 at 10:13 AM Fabrizio Carrai <
> fabrizio.car...@gmail.com> wrote:
>
>> Thanks Matthias,
>> that's a pity. Your suggestion relies on the items being well
>> characterized, which, at the time of writing, is rarely the case in my
>> area of interest.
>> Could it be an idea to download all the "scholarly articles", select
>> locally for the keyword of interest (e.g. "microgravity") and set the
>> property P921 for all of them? QuickStatements may be helpful for the last
>> step; any suggestions for other tools?
>>
>> Thanks
>> Fabrizio
>>
>> On Fri, 14 Dec 2018 at 22:16, Matthias Erfurth <erfu...@gmx.de>
>> wrote:
>>
>>> Hi Fabrizio,
>>> unfortunately you can't full-text search all the scholarly articles
>>> <https://www.wikidata.org/wiki/Q13442814>; it is better to work with
>>> indexed properties, so
>>> you can query for other articles with microgravity as main subject ...
>>> With the ajax based wikidata search
>>>
>>> SELECT ?item
>>> WHERE {
>>>     ?item wdt:P31 wd:Q13442814;
>>>           wdt:P921 wd:Q48655.
>>> }
>>>
>>> Best regards,
>>>
>>> ciao matthias
>>>
>>>
>>> *Sent:* Friday, 14 December 2018 at 18:55
>>> *From:* "Fabrizio Carrai" <fabrizio.car...@gmail.com>
>>> *To:* "Discussion list for the Wikidata project" <
>>> wikidata@lists.wikimedia.org>
>>> *Subject:* Re: [Wikidata] Query on scholarly article fails
>>> Thanks again to Ettore, but I immediately ran into another timeout
>>> as soon as I added a FILTER to find all the articles with the word
>>> "biokis" in the title:
>>>
>>> SELECT ?istanza_di ?instanza_diLabel WHERE {
>>>   ?istanza_di wdt:P31 wd:Q13442814.
>>>   ?istanza_di rdfs:label ?instanza_diLabel.
>>>   FILTER((LANG(?instanza_diLabel)) = "en").
>>>   FILTER(CONTAINS(LCASE(?instanza_diLabel), "biokis"))
>>> }
>>> LIMIT 100
>>>
>>> At least one article should be returned:
>>> https://www.wikidata.org/wiki/Q57202937
>>> but I got a timeout.
>>>
>>> Thanks to anybody that can help
>>>
>>> Fabrizio
>>>
>>>
>>> On Fri, 14 Dec 2018 at 10:12, Ettore RIZZA <
>>> ettoreri...@gmail.com> wrote:
>>>
>>>> Hello Fabrizio,
>>>>
>>>> It seems that the problem comes from SERVICE wikibase:label. As mentioned
>>>> in another discussion, the query executes in less than one second if you
>>>> rewrite it in this way
>>>> <https://query.wikidata.org/#SELECT%20%3Fistanza_di%20%3Finstanza_diLabel%20WHERE%20%7B%0A%20%20%3Fistanza_di%20wdt%3AP31%20wd%3AQ13442814.%0A%20%20%3Fistanza_di%20rdfs%3Alabel%20%3Finstanza_diLabel.%0A%20%20FILTER%28%28LANG%28%3Finstanza_diLabel%29%29%20%3D%20%22en%22%29%0A%7D%0ALIMIT%2010>
>>>> .
>>>>
>>>> Cheers,
>>>>
>>>> Ettore Rizza
>>>>
>>>> On Fri, 14 Dec 2018 at 09:59, Fabrizio Carrai <
>>>> fabrizio.car...@gmail.com> wrote:
>>>>
>>>>> Hello all,
>>>>> the following query ends with a timeout:
>>>>>
>>>>> SELECT ?istanza_di ?istanza_diLabel WHERE {
>>>>>   SERVICE wikibase:label { bd:serviceParam wikibase:language
>>>>> "[AUTO_LANGUAGE],en". }
>>>>>   ?istanza_di wdt:P31 wd:Q13442814.
>>>>> }
>>>>> LIMIT 10
>>>>>
>>>>> Can anybody explain why?
>>>>> Thanks in advance
>>>>>
>>>>> --
>>>>> *Fabrizio*
>>>>> _______________________________________________
>>>>> Wikidata mailing list
>>>>> Wikidata@lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>
>>>
>>>
>>>
>>> --
>>> *Fabrizio*
>>>
>>
>>
>> --
>> *Fabrizio*
>>
>
>
> --
> Hi, do you like citation networks? Already 51% of all citations are
> available <https://i4oc.org/> for innovative new uses
> <https://twitter.com/hashtag/acs2ioc>. Join me in asking the American
> Chemical Society to join the Initiative for Open Citations too
> <https://www.change.org/p/asking-the-american-chemical-society-to-join-the-initiative-for-open-citations>.
> SpringerNature,
> the RSC and many others already did <https://i4oc.org/#publishers>.
>
> -----
> E.L. Willighagen
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Homepage: http://egonw.github.com/
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: https://www.zotero.org/egonw
> ORCID: 0000-0001-7542-0286 <http://orcid.org/0000-0001-7542-0286>
> ImpactStory: https://impactstory.org/u/egonwillighagen
>


-- 
*Fabrizio*
