Kiril,

I wrote something a while back in Java that gets the number of contributions per user for a given language edition of Wikipedia. It could be adapted for your purposes if the data structure of the namespaces is the same or similar.
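As a rough illustration of that kind of adaptation, here is a minimal sketch. It is not the code from the repository linked in this message. It counts edits per user and namespace, keeping only edits inside a date range. The class ContributionCounter, the Edit class and its fields, and the method countByUserAndNamespace are hypothetical names chosen for the example, and the sample edits are made up.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ContributionCounter {

    /** Hypothetical minimal edit record: who edited, in which namespace, and when. */
    static final class Edit {
        final String user;
        final int namespace; // MediaWiki namespace number, e.g. 0 = articles, 1 = Talk
        final Instant timestamp;

        Edit(String user, int namespace, Instant timestamp) {
            this.user = user;
            this.namespace = namespace;
            this.timestamp = timestamp;
        }
    }

    /**
     * Counts edits per user and per namespace, keeping only edits whose
     * timestamp satisfies from <= timestamp < until.
     */
    static Map<String, Map<Integer, Integer>> countByUserAndNamespace(
            List<Edit> edits, Instant from, Instant until) {
        Map<String, Map<Integer, Integer>> counts = new TreeMap<>();
        for (Edit e : edits) {
            if (e.timestamp.isBefore(from) || !e.timestamp.isBefore(until)) {
                continue; // outside the requested date range
            }
            counts.computeIfAbsent(e.user, u -> new TreeMap<>())
                  .merge(e.namespace, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Edit> edits = Arrays.asList(
            new Edit("Alice", 0, Instant.parse("2019-01-15T10:00:00Z")),
            new Edit("Alice", 0, Instant.parse("2019-03-02T08:30:00Z")),
            new Edit("Alice", 1, Instant.parse("2019-05-20T12:00:00Z")),
            new Edit("Bob", 0, Instant.parse("2019-06-08T09:00:00Z")) // after the cutoff
        );
        Instant from = Instant.parse("2019-01-01T00:00:00Z");
        Instant until = Instant.parse("2019-06-07T00:00:00Z");
        System.out.println(countByUserAndNamespace(edits, from, until));
    }
}
```

With the sample data, Bob's only edit falls after the cutoff, so only Alice appears in the result. Passing a very early "from" gives counts "until a fixed point of time". How the Edit objects get filled (from a dump, a database replica, or an API) is left open here.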
The repository is https://github.com/hachacha/wikiParticipants, particularly this file:
https://github.com/hachacha/wikiParticipants/blob/master/src/wikipediansbynumberofedits_en/WikipediansByNumberOfEdits.java#L266

Restricting which contributions are saved to a specific date range is possible.

God Bless,
Jonathan

On Fri, Jun 7, 2019 at 8:00 AM <wiki-research-l-requ...@lists.wikimedia.org> wrote:

> Send Wiki-research-l mailing list submissions to
> wiki-research-l@lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> or, via email, send a message with subject or body 'help' to
> wiki-research-l-requ...@lists.wikimedia.org
>
> You can reach the person managing the list at
> wiki-research-l-ow...@lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wiki-research-l digest..."
>
>
> Today's Topics:
>
> 1. Fwd: [Wikidata] Scaling Wikidata Query Service (Pine W)
> 2. Database of all users (Kiril Simeonovski)
> 3. Re: Database of all users (Federico Leva (Nemo))
> 4. Re: Database of all users (Kiril Simeonovski)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 6 Jun 2019 19:35:13 +0000
> From: Pine W <wiki.p...@gmail.com>
> To: "wikitec...@lists.wikimedia.org" <wikitec...@lists.wikimedia.org>,
> Wiki Research-l <wiki-research-l@lists.wikimedia.org>
> Subject: [Wiki-research-l] Fwd: [Wikidata] Scaling Wikidata Query
> Service
> Message-ID:
> <CAF=dyJiJFXf7Jp8NUUu90Zd2dBT6J=
> fhtyjirawrhn+uv2j...@mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Forwarding in case this is of interest.
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
>
>
> ---------- Forwarded message ---------
> From: Guillaume Lederrey <gleder...@wikimedia.org>
> Date: Thu, Jun 6, 2019 at 7:33 PM
> Subject: [Wikidata] Scaling Wikidata Query Service
> To: Discussion list for the Wikidata project. <
> wikid...@lists.wikimedia.org>
>
>
> Hello all!
>
> There have been a number of concerns raised about the performance and
> scaling of Wikidata Query Service. We share those concerns and we are
> doing our best to address them. Here is some info about what is going
> on:
>
> In an ideal world, WDQS should:
>
> * scale in terms of data size
> * scale in terms of number of edits
> * have low update latency
> * expose a SPARQL endpoint for queries
> * allow anyone to run any queries on the public WDQS endpoint
> * provide great query performance
> * provide a high level of availability
>
> Scaling graph databases is a "known hard problem", and we are reaching
> a scale where there are no obvious easy solutions to address all the
> above constraints. At this point, just "throwing hardware at the
> problem" is not an option anymore. We need to go deeper into the
> details and potentially make major changes to the current architecture.
> Some scaling considerations are discussed in [1]. This is going to take
> time.
>
> Reasonably, addressing all of the above constraints is unlikely to
> ever happen. Some of the constraints are non-negotiable: if we can't
> keep up with Wikidata in terms of data size or number of edits, it does
> not make sense to address query performance. On some constraints, we
> will probably need to compromise.
>
> For example, the update process is asynchronous. It is by nature
> expected to lag. In the best case, this lag is measured in minutes,
> but can climb to hours occasionally. This is a case of prioritizing
> stability and correctness (ingesting all edits) over update latency.
> And while we can work to reduce the maximum latency, this will still
> be an asynchronous process and needs to be considered as such.
>
> We currently have one Blazegraph expert working with us to address a
> number of performance and stability issues. We are planning to hire an
> additional engineer to help us support the service in the long term.
> You can follow our current work in Phabricator [2].
>
> If anyone has experience with scaling large graph databases, please
> reach out to us, we're always happy to share ideas!
>
> Thanks all for your patience!
>
> Guillaume
>
> [1]
> https://wikitech.wikimedia.org/wiki/Wikidata_query_service/ScalingStrategy
> [2] https://phabricator.wikimedia.org/project/view/1239/
>
> --
> Guillaume Lederrey
> Engineering Manager, Search Platform
> Wikimedia Foundation
> UTC+2 / CEST
>
> _______________________________________________
> Wikidata mailing list
> wikid...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 7 Jun 2019 08:57:38 +0200
> From: Kiril Simeonovski <kiril.simeonov...@gmail.com>
> To: Research into Wikimedia content and communities
> <wiki-research-l@lists.wikimedia.org>
> Subject: [Wiki-research-l] Database of all users
> Message-ID:
> <
> cabuehm5mfdeo7sjrpmw_ak-mpd2qh0c2jygnou9odb5ytut...@mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Dear all,
>
> I was wondering if there is a way to extract a database of all users (or
> a selection of users according to some criteria) with their contributions
> to the Wikimedia projects until a fixed point of time from XTools.
>
> Thank you.
>
> Best regards,
> Kiril
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 7 Jun 2019 10:53:30 +0300
> From: "Federico Leva (Nemo)" <nemow...@gmail.com>
> To: Research into Wikimedia content and communities
> <wiki-research-l@lists.wikimedia.org>, Kiril Simeonovski
> <kiril.simeonov...@gmail.com>
> Subject: Re: [Wiki-research-l] Database of all users
> Message-ID: <33f8a998-2144-1d49-5347-8c59018e2...@gmail.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Kiril Simeonovski, 07/06/19 09:57:
> > with their contributions to
> > the Wikimedia projects
>
> Do you mean the *number* of their contributions, or literally all their
> contributions? Filtering the stub dumps would be one systematic way to
> get all the metadata about edits.
>
> If you just need aggregate numbers with some filter by date, namespace
> or other, the fastest way is probably to write a script which loops
> through all the databases on Labs. For instance I made this to list the
> users who contribute in a certain language, to find translators for very
> small languages:
> <https://gerrit.wikimedia.org/r/plugins/gitiles/labs/tools/lists/+/master/scripts/userslang.py>
>
> Federico
>
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 7 Jun 2019 09:57:45 +0200
> From: Kiril Simeonovski <kiril.simeonov...@gmail.com>
> To: "Federico Leva (Nemo)" <nemow...@gmail.com>
> Cc: Research into Wikimedia content and communities
> <wiki-research-l@lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] Database of all users
> Message-ID:
> <CABuEHm7ahWx9P=
> xa_km1s+q3z0wkohaxcounfx3asa_cfnv...@mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi Federico,
>
> Thanks for the straightforward answer. My idea is to extract the number of
> contributions across projects and namespaces.
>
> Best,
> Kiril
>
> On Fri, Jun 7, 2019 at 9:53 AM Federico Leva (Nemo) <nemow...@gmail.com>
> wrote:
>
> > Kiril Simeonovski, 07/06/19 09:57:
> > > with their contributions to
> > > the Wikimedia projects
> >
> > Do you mean the *number* of their contributions, or literally all their
> > contributions? Filtering the stub dumps would be one systematic way to
> > get all the metadata about edits.
> >
> > If you just need aggregate numbers with some filter by date, namespace
> > or other, the fastest way is probably to write a script which loops
> > through all the databases on Labs. For instance I made this to list the
> > users who contribute in a certain language, to find translators for very
> > small languages:
> > <https://gerrit.wikimedia.org/r/plugins/gitiles/labs/tools/lists/+/master/scripts/userslang.py>
> >
> > Federico
> >
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
> ------------------------------
>
> End of Wiki-research-l Digest, Vol 166, Issue 4
> ***********************************************
>