I round up from DOI/PubMed ID counts on https://tools.wmflabs.org/scholia/
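
To make the link between that rounded-up total and the number of iterations concrete, here is a minimal Python sketch. It is not the Bioclipse/Groovy script from the gist; the function names batch_offsets and build_queries are hypothetical, and the numbers in the example are placeholders rather than real Wikidata counts. It only fills in the query template quoted further down in this thread and does not contact the query service.

```python
from string import Template

# SPARQL template quoted in this thread; the $-placeholders are filled per batch.
QUERY = Template("""\
SELECT ?art ?artLabel
WITH {
  SELECT ?art WHERE {
    ?art wdt:P31 wd:Q13442814
  } LIMIT $batchSize OFFSET $offset
} AS %RESULTS {
  INCLUDE %RESULTS
  ?art wdt:P1476 ?artLabel .
  MINUS { ?art wdt:P921 wd:$conceptQ }
  FILTER (contains(lcase(str(?artLabel)), "$concept"))
}""")


def batch_offsets(total_estimate: int, batch_size: int) -> list[int]:
    """Return one OFFSET per batch, enough to cover total_estimate articles."""
    if total_estimate < 0 or batch_size <= 0:
        raise ValueError("total_estimate must be >= 0 and batch_size > 0")
    # Ceiling division: a partial last batch still needs its own iteration.
    n_batches = (total_estimate + batch_size - 1) // batch_size
    return [i * batch_size for i in range(n_batches)]


def build_queries(total_estimate, batch_size, concept, concept_q):
    """Fill the template once per batch; nothing is sent to the endpoint here."""
    return [
        QUERY.substitute(batchSize=batch_size, offset=offset,
                         concept=concept.lower(), conceptQ=concept_q)
        for offset in batch_offsets(total_estimate, batch_size)
    ]


if __name__ == "__main__":
    # 1_000_000 and 250_000 are placeholder numbers, not real Wikidata counts.
    queries = build_queries(1_000_000, 250_000, "microgravity", "Q48655")
    print(len(queries), "queries")
    print(queries[-1])
```

Rounding the total up only costs a few empty batches at the end, whereas an estimate that is too low would silently skip the last articles.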

Egon

On Sat, Dec 15, 2018 at 3:03 PM Fabrizio Carrai <fabrizio.car...@gmail.com> wrote:

> Excellent, I did some tests and with some cycles I already identified and
> classified several articles.
> I will have a look at your script in the next days but I already have a
> question: the number of iterations is based on the total number of
> articles, how do you know that?
>
> ---
> Fabrizio
>
> On Sat, 15 Dec 2018 at 10:18, Egon Willighagen <
> egon.willigha...@gmail.com> wrote:
>
>> The approach I use is the following, see this (Bioclipse/Groovy) script:
>> https://gist.github.com/egonw/ca4c348b9a2d1116efcdb55fa85dd158
>>
>> It takes advantage of a combination of a Blazegraph SPARQL trick and
>> breaking things up into batches of a certain size:
>>
>> SELECT ?art ?artLabel
>> WITH {
>>   SELECT ?art WHERE {
>>     ?art wdt:P31 wd:Q13442814
>>   } LIMIT $batchSize OFFSET $offset
>> } AS %RESULTS {
>>   INCLUDE %RESULTS
>>   ?art wdt:P1476 ?artLabel .
>>   MINUS { ?art wdt:P921 wd:$conceptQ }
>>   FILTER (contains(lcase(str(?artLabel)), "$concept"))
>> }
>>
>> where "$concept" is my search word in the title, and $batchSize and
>> $offset take care of the batching by the script. This script creates
>> QuickStatements.
>>
>> Mind you, I manually check the created statements, because in my domain
>> (biochem) a simple search results in false positives, hence the "blacklist"
>> in the script :)
>>
>> Egon
>>
>> On Sat, Dec 15, 2018 at 10:13 AM Fabrizio Carrai <
>> fabrizio.car...@gmail.com> wrote:
>>
>>> Thanks Matthias,
>>> that's a pity. Your suggestion relies on the effective characterization
>>> of the item, which at the time of writing is pretty poor for my interest.
>>> Could it be an idea to download all the "scholarly articles", locally
>>> select for the keyword of interest (e.g. "microgravity") and set the
>>> property P921 for all of them? QuickStatements may be helpful for the last
>>> step, any suggestions for other tools?
>>>
>>> Thanks
>>> Fabrizio
>>>
>>> On Fri, 14 Dec 2018 at 22:16, Matthias Erfurth <
>>> erfu...@gmx.de> wrote:
>>>
>>>> Hi Fabrizio,
>>>> unfortunately you can't fulltext search all the scholarly articles
>>>> <https://www.wikidata.org/wiki/Q13442814>; it is better to work
>>>> with indexed properties, so
>>>> you can query for other articles with microgravity as main subject ...
>>>> With the ajax based wikidata search
>>>>
>>>> SELECT ?item
>>>> WHERE {
>>>>   ?item wdt:P31 wd:Q13442814;
>>>>         wdt:P921 wd:Q48655.
>>>> }
>>>>
>>>> Best regards,
>>>>
>>>> ciao matthias
>>>>
>>>> *Sent:* Friday, 14 December 2018 at 18:55
>>>> *From:* "Fabrizio Carrai" <fabrizio.car...@gmail.com>
>>>> *To:* "Discussion list for the Wikidata project" <
>>>> wikidata@lists.wikimedia.org>
>>>> *Subject:* Re: [Wikidata] Query on scholarly article fails
>>>> Thanks again to Ettore, but I immediately found another timeout problem
>>>> when I just added a FILTER to find all the articles with the word "biokis"
>>>> in the title
>>>>
>>>> SELECT ?istanza_di ?instanza_diLabel WHERE {
>>>>   ?istanza_di wdt:P31 wd:Q13442814.
>>>>   ?istanza_di rdfs:label ?instanza_diLabel.
>>>>   FILTER((LANG(?instanza_diLabel)) = "en").
>>>>   FILTER(CONTAINS(LCASE(?instanza_diLabel), "biokis"))
>>>> }
>>>> LIMIT 100
>>>>
>>>> At least one article should be returned:
>>>> https://www.wikidata.org/wiki/Q57202937
>>>> but I got a timeout.
>>>>
>>>> Thanks to anybody that can help
>>>>
>>>> Fabrizio
>>>>
>>>> On Fri, 14 Dec 2018 at 10:12, Ettore RIZZA <
>>>> ettoreri...@gmail.com> wrote:
>>>>
>>>>> Hello Fabrizio,
>>>>>
>>>>> It seems that the problem comes from SERVICE wikibase:label.
>>>>> As said in another discussion, the query executes in less than one
>>>>> second if you rewrite it in this way
>>>>> <https://query.wikidata.org/#SELECT%20%3Fistanza_di%20%3Finstanza_diLabel%20WHERE%20%7B%0A%20%20%3Fistanza_di%20wdt%3AP31%20wd%3AQ13442814.%0A%20%20%3Fistanza_di%20rdfs%3Alabel%20%3Finstanza_diLabel.%0A%20%20FILTER%28%28LANG%28%3Finstanza_diLabel%29%29%20%3D%20%22en%22%29%0A%7D%0ALIMIT%2010>.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Ettore Rizza
>>>>>
>>>>> On Fri, 14 Dec 2018 at 09:59, Fabrizio Carrai <
>>>>> fabrizio.car...@gmail.com> wrote:
>>>>>
>>>>>> Hello all,
>>>>>> the following query ends with a timeout:
>>>>>>
>>>>>> SELECT ?istanza_di ?istanza_diLabel WHERE {
>>>>>>   SERVICE wikibase:label { bd:serviceParam wikibase:language
>>>>>>     "[AUTO_LANGUAGE],en". }
>>>>>>   ?istanza_di wdt:P31 wd:Q13442814.
>>>>>> }
>>>>>> LIMIT 10
>>>>>>
>>>>>> Can anybody explain why?
>>>>>> Thanks in advance
>>>>>>
>>>>>> --
>>>>>> *Fabrizio*
>>>>>> _______________________________________________
>>>>>> Wikidata mailing list
>>>>>> Wikidata@lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>> --
>> Hi, do you like citation networks?
>> Already 51% of all citations are
>> available <https://i4oc.org/> for innovative new uses
>> <https://twitter.com/hashtag/acs2ioc>. Join me in asking the American
>> Chemical Society to join the Initiative for Open Citations too
>> <https://www.change.org/p/asking-the-american-chemical-society-to-join-the-initiative-for-open-citations>.
>> SpringerNature,
>> the RSC and many others already did <https://i4oc.org/#publishers>.
>>
>> -----
>> E.L. Willighagen
>> Department of Bioinformatics - BiGCaT
>> Maastricht University (http://www.bigcat.unimaas.nl/)
>> Homepage: http://egonw.github.com/
>> Blog: http://chem-bla-ics.blogspot.com/
>> PubList: https://www.zotero.org/egonw
>> ORCID: 0000-0001-7542-0286 <http://orcid.org/0000-0001-7542-0286>
>> ImpactStory: https://impactstory.org/u/egonwillighagen

_______________________________________________ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata