On Wed, Oct 1, 2014 at 10:56 AM, Andra Waagmeester <an...@micelio.be> wrote:
> There are about 47,000 genes. In the first step the bot checks whether an
> entry already exists and, if not, a new entry is made; subsequently three
> claims are added (Entrez Gene ID (P351)
> <https://www.wikidata.org/wiki/Property:P351#top>, found in taxon (P703)
> <https://www.wikidata.org/wiki/Property:P703#top>, and subclass of (P279)),
> as well as synonyms. Currently this process takes a week to complete. In a
> second phase, identifiers for each gene are obtained and added as
> respective claims. Typically this ranges from 1 claim per property up to
> 20 claims per property (a rough estimate). This bot has been running for
> 2 weeks and is currently at 74% of all genes covered.
>
> Currently each entity creation and each subsequent claim is a separate
> API call, so using wbeditentity will probably result in an improvement.
> Thanks for the suggestion.

When running the bot, please keep in mind the change dispatch lag, and
ensure it doesn't get too high:

https://www.wikidata.org/wiki/Special:DispatchStats

which can be accessed via the API:

https://www.wikidata.org/w/api.php?action=query&meta=siteinfo&siprop=statistics&format=json

Cheers,
Katie

> Cheers,
> Andra
>
> On Tue, Sep 30, 2014 at 9:05 PM, Daniel Kinzler
> <daniel.kinz...@wikimedia.de> wrote:
>
>> What makes it so slow?
>>
>> Note that you can use wbeditentity to perform complex edits with a
>> single API call. It's not as straightforward to use as, say, wbaddclaim,
>> but much more powerful and efficient.
>>
>> -- daniel
>>
>> Am 30.09.2014 19:00, schrieb Andra Waagmeester:
>> > Hi All,
>> >
>> > I have joined the development team of the ProteinBoxBot
>> > (https://www.wikidata.org/wiki/User:ProteinBoxBot). Our goal is to
>> > make Wikidata the canonical resource for referencing and translating
>> > identifiers for genes and proteins from different species.
>> > Currently, adding all genes from the human genome and their related
>> > identifiers to Wikidata takes more than a month to complete. With the
>> > objective of adding other species, as well as having frequent updates
>> > for each of the genomes, it would be convenient if we could increase
>> > this throughput.
>> >
>> > Would it be acceptable to increase the throughput by running multiple
>> > instances of ProteinBoxBot in parallel? If so, what would be an
>> > acceptable number of parallel instances of a bot to run? We can run
>> > multiple instances from different geographical locations if necessary.
>> >
>> > Kind regards,
>> >
>> > Andra
>>
>> --
>> Daniel Kinzler
>> Senior Software Developer
>>
>> Wikimedia Deutschland
>> Gesellschaft zur Förderung Freien Wissens e.V.

--
Katie Filbert
Wikidata Developer

Wikimedia Germany e.V. | Tempelhofer Ufer 23-24, 10963 Berlin
Phone (030) 219 158 26-0
http://wikimedia.de

Wikimedia Germany - Society for the Promotion of Free Knowledge e.V.
Entered in the register of societies at Amtsgericht Berlin-Charlottenburg
under number 23 855, recognized as charitable by the tax office for
corporations I Berlin, tax number 27/681/51985.
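[Editorial sketch] Katie's dispatch-lag advice can be turned into a simple
throttling check on the statistics endpoint she links. This is a minimal
sketch operating on an already-fetched JSON response; the
`dispatch`/`stalest`/`lag` key path and the 120-second threshold are
illustrative assumptions, not a documented API contract:

```python
# Hypothetical throttling helper for a bot like ProteinBoxBot. It
# inspects the object returned by
#   https://www.wikidata.org/w/api.php?action=query&meta=siteinfo&siprop=statistics&format=json
# NOTE: the "dispatch"/"stalest"/"lag" key path and the 120 s threshold
# are assumptions for illustration only.

def should_pause(api_response, max_lag_seconds=120):
    """Return True if the reported dispatch lag exceeds the threshold."""
    stats = api_response.get("query", {}).get("statistics", {})
    lag = stats.get("dispatch", {}).get("stalest", {}).get("lag", 0)
    return lag > max_lag_seconds

# Example with a mocked-up response:
sample = {"query": {"statistics": {"dispatch": {"stalest": {"lag": 300}}}}}
print(should_pause(sample))  # True: a 300 s lag exceeds the 120 s threshold
```

A bot would call this between batches of edits and sleep whenever it
returns True, letting the dispatch queue drain before continuing.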
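[Editorial sketch] Daniel's wbeditentity suggestion amounts to creating an
item and attaching several claims in one request instead of one API call
per claim. The sketch below is an illustration only: the gene label, the
Entrez ID value "1017", the taxon item, and the `build_edit_params` helper
are assumptions, and a real bot would first obtain a CSRF token via
`action=query&meta=tokens` on a logged-in session:

```python
import json

# Sketch: create a new item and add several claims in ONE wbeditentity
# call, instead of one API call per claim. The values below (label "CDK2",
# Entrez Gene ID "1017", taxon item Q15978631) are illustrative only.

entity_data = {
    "labels": {"en": {"language": "en", "value": "CDK2"}},
    "claims": [
        {   # Entrez Gene ID (P351)
            "mainsnak": {
                "snaktype": "value",
                "property": "P351",
                "datavalue": {"value": "1017", "type": "string"},
            },
            "type": "statement",
            "rank": "normal",
        },
        {   # found in taxon (P703): Homo sapiens
            "mainsnak": {
                "snaktype": "value",
                "property": "P703",
                "datavalue": {
                    "value": {"entity-type": "item", "numeric-id": 15978631},
                    "type": "wikibase-entityid",
                },
            },
            "type": "statement",
            "rank": "normal",
        },
    ],
}

def build_edit_params(data, csrf_token):
    """Assemble POST parameters for one wbeditentity request (hypothetical helper)."""
    return {
        "action": "wbeditentity",
        "new": "item",              # create the item in the same request
        "data": json.dumps(data),   # labels and all claims travel together
        "token": csrf_token,
        "format": "json",
    }

# A logged-in session would then POST this to
# https://www.wikidata.org/w/api.php, e.g.:
#   session.post(API_URL, data=build_edit_params(entity_data, csrf_token))
```

Batching the label and the initial claims this way replaces four or more
round trips per gene with a single edit, which is where the throughput
gain comes from.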
_______________________________________________ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l