Excellent, I ran some tests and after a few iterations I had already identified and classified several articles. I will have a look at your script in the next few days, but I already have a question: the number of iterations is based on the total number of articles; how do you know that number?
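For context, the arithmetic behind the question is a ceiling division. With N articles and a batch size B, a LIMIT/OFFSET loop needs ceil(N / B) rounds, so N has to be known up front unless the loop instead stops when a batch comes back empty. Below is a minimal Python sketch of both options. The function names `batch_offsets` and `iterate_until_empty` and the `fetch` callable are hypothetical and are not taken from Egon's Groovy script.

```python
from typing import Callable, Iterator, List, Sequence


def batch_offsets(total: int, batch_size: int) -> List[int]:
    """OFFSET values that cover `total` items in pages of `batch_size`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n_batches = (total + batch_size - 1) // batch_size  # integer ceil(total / batch_size)
    return [i * batch_size for i in range(n_batches)]


def iterate_until_empty(fetch: Callable[[int, int], Sequence[str]],
                        batch_size: int) -> Iterator[Sequence[str]]:
    """Page through results without knowing the total in advance.

    `fetch(offset, limit)` is a hypothetical callable that must return the
    *unfiltered* inner batch (the LIMIT/OFFSET subquery). Stopping on an empty
    *filtered* result would be wrong: a batch may simply contain no matching titles.
    """
    offset = 0
    while True:
        batch = fetch(offset, batch_size)
        if not batch:
            return
        yield batch
        if len(batch) < batch_size:  # a short batch means the end was reached
            return
        offset += batch_size
```

For example, `batch_offsets(25, 10)` returns `[0, 10, 20]`, i.e. three iterations. The second variant trades one extra (empty) request in the worst case for not having to count the articles first.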
---
Fabrizio

On Sat, 15 Dec 2018 at 10:18, Egon Willighagen <egon.willigha...@gmail.com> wrote:
>
> The approach I use is the following, see this (Bioclipse/Groovy) script:
> https://gist.github.com/egonw/ca4c348b9a2d1116efcdb55fa85dd158
>
> It takes advantage of a combination of a Blazegraph SPARQL trick and breaking
> things up into batches of a certain size:
>
> SELECT ?art ?artLabel
> WITH {
>   SELECT ?art WHERE {
>     ?art wdt:P31 wd:Q13442814
>   } LIMIT $batchSize OFFSET $offset
> } AS %RESULTS {
>   INCLUDE %RESULTS
>   ?art wdt:P1476 ?artLabel .
>   MINUS { ?art wdt:P921 wd:$conceptQ }
>   FILTER (contains(lcase(str(?artLabel)), "$concept"))
> }
>
> where "$concept" is my search word in the title, and $batchSize and
> $offset take care of the batching by the script. This script creates
> QuickStatements.
>
> Mind you, I manually check the created statements, because in my domain
> (biochem) a simple search results in false positives, hence the "blacklist"
> in the script :)
>
> Egon
>
> On Sat, Dec 15, 2018 at 10:13 AM Fabrizio Carrai <fabrizio.car...@gmail.com> wrote:
>
>> Thanks Matthias,
>> that's a pity. Your suggestion relies on the items being properly
>> characterized, which, at the time of writing, is pretty poor for my topic
>> of interest.
>> Could it be an idea to download all the "scholarly articles", locally
>> select those containing the keyword of interest (e.g. "microgravity") and
>> set the property P921 for all of them? QuickStatements may be helpful for
>> the last step; any suggestions for other tools?
>>
>> Thanks
>> Fabrizio
>>
>> On Fri, 14 Dec 2018 at 22:16, Matthias Erfurth <erfu...@gmx.de> wrote:
>>
>>> Hi Fabrizio,
>>> unfortunately you can't full-text search all the scholarly articles
>>> <https://www.wikidata.org/wiki/Q13442814>; you should rather work with
>>> indexed properties, so
>>> you can query for other articles with microgravity as main subject ...
>>> With the AJAX-based Wikidata search:
>>>
>>> SELECT ?item
>>> WHERE {
>>>   ?item wdt:P31 wd:Q13442814;
>>>         wdt:P921 wd:Q48655.
>>> }
>>>
>>> Best regards,
>>>
>>> ciao matthias
>>>
>>> *Sent:* Friday, 14 December 2018 at 18:55
>>> *From:* "Fabrizio Carrai" <fabrizio.car...@gmail.com>
>>> *To:* "Discussion list for the Wikidata project" <wikidata@lists.wikimedia.org>
>>> *Subject:* Re: [Wikidata] Query on scholarly article fails
>>> Thanks again to Ettore, but I immediately ran into another timeout problem
>>> when I just added a FILTER to find all the articles with the word "biokis"
>>> in the title:
>>>
>>> SELECT ?istanza_di ?instanza_diLabel WHERE {
>>>   ?istanza_di wdt:P31 wd:Q13442814.
>>>   ?istanza_di rdfs:label ?instanza_diLabel.
>>>   FILTER((LANG(?instanza_diLabel)) = "en").
>>>   FILTER(CONTAINS(LCASE(?instanza_diLabel), "biokis"))
>>> }
>>> LIMIT 100
>>>
>>> At least one article should be returned:
>>> https://www.wikidata.org/wiki/Q57202937
>>> but I got a timeout.
>>>
>>> Thanks to anybody who can help
>>>
>>> Fabrizio
>>>
>>> On Fri, 14 Dec 2018 at 10:12, Ettore RIZZA <ettoreri...@gmail.com> wrote:
>>>
>>>> Hello Fabrizio,
>>>>
>>>> It seems that the problem comes from SERVICE wikibase:label. As said in
>>>> another discussion, the query executes in less than one second if you
>>>> rewrite it in this way
>>>> <https://query.wikidata.org/#SELECT%20%3Fistanza_di%20%3Finstanza_diLabel%20WHERE%20%7B%0A%20%20%3Fistanza_di%20wdt%3AP31%20wd%3AQ13442814.%0A%20%20%3Fistanza_di%20rdfs%3Alabel%20%3Finstanza_diLabel.%0A%20%20FILTER%28%28LANG%28%3Finstanza_diLabel%29%29%20%3D%20%22en%22%29%0A%7D%0ALIMIT%2010>
>>>> .
>>>>
>>>> Cheers,
>>>>
>>>> Ettore Rizza
>>>>
>>>> On Fri, 14 Dec 2018 at 09:59, Fabrizio Carrai <fabrizio.car...@gmail.com> wrote:
>>>>
>>>>> Hello all,
>>>>> the following query ends with a timeout:
>>>>>
>>>>> SELECT ?istanza_di ?istanza_diLabel WHERE {
>>>>>   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
>>>>>   ?istanza_di wdt:P31 wd:Q13442814.
>>>>> }
>>>>> LIMIT 10
>>>>>
>>>>> Can anybody explain why?
>>>>> Thanks in advance
>>>>>
>>>>> --
>>>>> *Fabrizio*
>>>>> _______________________________________________
>>>>> Wikidata mailing list
>>>>> Wikidata@lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>> --
>>> *Fabrizio*
>>
>> --
>> *Fabrizio*
>
> --
> Hi, do you like citation networks? Already 51% of all citations are
> available <https://i4oc.org/> for innovative new uses
> <https://twitter.com/hashtag/acs2ioc>. Join me in asking the American
> Chemical Society to join the Initiative for Open Citations too
> <https://www.change.org/p/asking-the-american-chemical-society-to-join-the-initiative-for-open-citations>.
> SpringerNature, the RSC and many others already did <https://i4oc.org/#publishers>.
>
> -----
> E.L. Willighagen
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Homepage: http://egonw.github.com/
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: https://www.zotero.org/egonw
> ORCID: 0000-0001-7542-0286 <http://orcid.org/0000-0001-7542-0286>
> ImpactStory: https://impactstory.org/u/egonwillighagen
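The batched query and the QuickStatements step that Egon describes in the thread can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not his Groovy script: the `$` placeholders are filled with Python's `string.Template`, each result row is assumed to be a dict holding the entity URI under "art", and the blacklist is modelled as a hypothetical set of QIDs rejected after manual review. Each output line uses the tab-separated item / property / value form of QuickStatements, with P921 (main subject) pointing at the concept item. The helpers `build_query` and `to_quickstatements` are hypothetical names.

```python
from string import Template
from typing import Dict, Iterable, List, Set

# Egon's batched query, keeping the $-placeholders his script fills in.
QUERY = Template("""SELECT ?art ?artLabel
WITH {
  SELECT ?art WHERE {
    ?art wdt:P31 wd:Q13442814
  } LIMIT $batchSize OFFSET $offset
} AS %RESULTS {
  INCLUDE %RESULTS
  ?art wdt:P1476 ?artLabel .
  MINUS { ?art wdt:P921 wd:$conceptQ }
  FILTER (contains(lcase(str(?artLabel)), "$concept"))
}""")


def build_query(concept: str, concept_q: str, batch_size: int, offset: int) -> str:
    """Fill in one batch of the query (hypothetical helper)."""
    # lcase() lower-cases the title, so the search word must be lower case too;
    # backslashes and double quotes are escaped for the SPARQL string literal.
    word = concept.lower().replace("\\", "\\\\").replace('"', '\\"')
    return QUERY.substitute(batchSize=batch_size, offset=offset,
                            conceptQ=concept_q, concept=word)


def to_quickstatements(rows: Iterable[Dict[str, str]], concept_q: str,
                       blacklist: Set[str]) -> List[str]:
    """One 'item<TAB>P921<TAB>concept' line per hit, skipping blacklisted items.

    `rows` is assumed to hold the entity URI under "art"; `blacklist` is a
    hypothetical set of QIDs judged to be false positives after manual review.
    """
    lines = []
    for row in rows:
        qid = row["art"].rsplit("/", 1)[-1]  # ".../entity/Q57202937" -> "Q57202937"
        if qid not in blacklist:
            lines.append(f"{qid}\tP921\t{concept_q}")
    return lines
```

For example, `build_query("Microgravity", "Q48655", 5000, 0)` produces the first batch for the microgravity item used in Matthias's query; the generated lines would still need the manual check Egon recommends before being pasted into QuickStatements.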