On Wed, Oct 1, 2014 at 10:56 AM, Andra Waagmeester <an...@micelio.be> wrote:
> There are about 47,000 genes. In the first step the bot checks whether an
> entry already exists and, if not, a new entry is made; subsequently three
> claims are added (Entrez Gene ID (P351)
> <https://www.wikidata.org/wiki/Property:P351#top>, found in taxon (P703)
> <https://www.wikidata.org/wiki/Property:P703#top>, and subclass of (P279)),
> as well as synonyms. Currently this process takes a week to complete. In a
> second phase, identifiers for each gene are obtained and added as
> respective claims. Typically this ranges from 1 claim per property up to
> 20 claims per property (a rough estimate). This bot has been running for
> 2 weeks and is currently at 74% of all genes covered.
>
> Currently each entity creation and each subsequent claim is a separate
> API call, so using wbeditentity will probably result in an improvement.
> Thanks for the suggestion.

When running the bot, please keep in mind the change dispatch lag, and
ensure it doesn't get too high:

https://www.wikidata.org/wiki/Special:DispatchStats

which can be accessed via the API:

https://www.wikidata.org/w/api.php?action=query&meta=siteinfo&siprop=statistics&format=json

Cheers,
Katie

> Cheers,
> Andra
>
> On Tue, Sep 30, 2014 at 9:05 PM, Daniel Kinzler
> <daniel.kinz...@wikimedia.de> wrote:
>
>> What makes it so slow?
>>
>> Note that you can use wbeditentity to perform complex edits with a
>> single API call. It's not as straightforward to use as, say, wbaddclaim,
>> but much more powerful and efficient.
>>
>> -- daniel
>>
>> Am 30.09.2014 19:00, schrieb Andra Waagmeester:
>> > Hi All,
>> >
>> > I have joined the development team of the ProteinBoxBot
>> > (https://www.wikidata.org/wiki/User:ProteinBoxBot). Our goal is to
>> > make Wikidata the canonical resource for referencing and translating
>> > identifiers for genes and proteins from different species.
>> > Currently, adding all genes from the human genome and their related
>> > identifiers to Wikidata takes more than a month to complete. With the
>> > objective of adding other species, as well as having frequent updates
>> > for each of the genomes, it would be convenient if we could increase
>> > this throughput.
>> >
>> > Would it be acceptable to increase the throughput by running multiple
>> > instances of ProteinBoxBot in parallel? If so, what would be an
>> > acceptable number of parallel instances of a bot to run? We can run
>> > multiple instances from different geographical locations if necessary.
>> >
>> > Kind regards,
>> >
>> > Andra
>>
>> --
>> Daniel Kinzler
>> Senior Software Developer
>>
>> Wikimedia Deutschland
>> Gesellschaft zur Förderung Freien Wissens e.V.

--
Katie Filbert
Wikidata Developer

Wikimedia Germany e.V. | Tempelhofer Ufer 23-24, 10963 Berlin
Phone (030) 219 158 26-0
http://wikimedia.de

Wikimedia Germany - Society for the Promotion of Free Knowledge e.V.
Entered in the register of societies at Amtsgericht Berlin-Charlottenburg
under number 23 855, recognized as charitable by the tax office for
corporations I Berlin, tax number 27/681/51985.
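[Editorial sketch] Katie's dispatch-lag advice can be turned into a simple
throttling check on the statistics endpoint she links. This is a minimal
sketch operating on an already-fetched JSON response; the
`dispatch`/`stalest`/`lag` key path and the 120-second threshold are
illustrative assumptions, not a documented API contract:

```python
# Hypothetical throttling helper for a bot like ProteinBoxBot. It
# inspects the object returned by
#   https://www.wikidata.org/w/api.php?action=query&meta=siteinfo&siprop=statistics&format=json
# NOTE: the "dispatch"/"stalest"/"lag" key path and the 120 s threshold
# are assumptions for illustration only.

def should_pause(api_response, max_lag_seconds=120):
    """Return True if the reported dispatch lag exceeds the threshold."""
    stats = api_response.get("query", {}).get("statistics", {})
    lag = stats.get("dispatch", {}).get("stalest", {}).get("lag", 0)
    return lag > max_lag_seconds

# Example with a mocked-up response:
sample = {"query": {"statistics": {"dispatch": {"stalest": {"lag": 300}}}}}
print(should_pause(sample))  # True: a 300 s lag exceeds the 120 s threshold
```

A bot would call this between batches of edits and sleep whenever it
returns True, letting the dispatch queue drain before continuing.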
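[Editorial sketch] Daniel's wbeditentity suggestion amounts to creating an
item and attaching several claims in one request instead of one API call
per claim. The sketch below is an illustration only: the gene label, the
Entrez ID value "1017", the taxon item, and the `build_edit_params` helper
are assumptions, and a real bot would first obtain a CSRF token via
`action=query&meta=tokens` on a logged-in session:

```python
import json

# Sketch: create a new item and add several claims in ONE wbeditentity
# call, instead of one API call per claim. The values below (label "CDK2",
# Entrez Gene ID "1017", taxon item Q15978631) are illustrative only.

entity_data = {
    "labels": {"en": {"language": "en", "value": "CDK2"}},
    "claims": [
        {   # Entrez Gene ID (P351)
            "mainsnak": {
                "snaktype": "value",
                "property": "P351",
                "datavalue": {"value": "1017", "type": "string"},
            },
            "type": "statement",
            "rank": "normal",
        },
        {   # found in taxon (P703): Homo sapiens
            "mainsnak": {
                "snaktype": "value",
                "property": "P703",
                "datavalue": {
                    "value": {"entity-type": "item", "numeric-id": 15978631},
                    "type": "wikibase-entityid",
                },
            },
            "type": "statement",
            "rank": "normal",
        },
    ],
}

def build_edit_params(data, csrf_token):
    """Assemble POST parameters for one wbeditentity request (hypothetical helper)."""
    return {
        "action": "wbeditentity",
        "new": "item",              # create the item in the same request
        "data": json.dumps(data),   # labels and all claims travel together
        "token": csrf_token,
        "format": "json",
    }

# A logged-in session would then POST this to
# https://www.wikidata.org/w/api.php, e.g.:
#   session.post(API_URL, data=build_edit_params(entity_data, csrf_token))
```

Batching the label and the initial claims this way replaces four or more
round trips per gene with a single edit, which is where the throughput
gain comes from.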
_______________________________________________ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l