Kiril,

I wrote something a while back in Java that gets the number of
contributions per user for a given language in Wikipedia. It could be
altered for your purposes if the data structure of the namespaces is
the same or similar.

https://github.com/hachacha/wikiParticipants
In particular, this file:
https://github.com/hachacha/wikiParticipants/blob/master/src/wikipediansbynumberofedits_en/WikipediansByNumberOfEdits.java#L266

It would also be possible to alter it so that only contributions within
a specific date range are counted.
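The kind of date-range filtering I mean could be sketched roughly like this (a minimal, hypothetical example, not the actual wikiParticipants code -- the `Edit` record and `countByUser` method are names I'm making up here for illustration):

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: tally contributions per user, keeping only
// edits whose timestamp falls inside a half-open range [from, to).
public class ContributionCounter {

    // One parsed edit: contributor name plus edit timestamp.
    public record Edit(String user, Instant timestamp) {}

    // Count edits per user, skipping edits outside [from, to).
    public static Map<String, Integer> countByUser(
            List<Edit> edits, Instant from, Instant to) {
        Map<String, Integer> counts = new HashMap<>();
        for (Edit e : edits) {
            boolean inRange = !e.timestamp().isBefore(from)
                    && e.timestamp().isBefore(to);
            if (inRange) {
                // merge() starts at 1 for a new user, adds 1 otherwise.
                counts.merge(e.user(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The actual edit records would come from parsing the dump or the API, but once they are in a list like this, restricting to a date range is just the one `inRange` check.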

God Bless,

Jonathan

On Fri, Jun 7, 2019 at 8:00 AM <wiki-research-l-requ...@lists.wikimedia.org>
wrote:

> Send Wiki-research-l mailing list submissions to
>         wiki-research-l@lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> or, via email, send a message with subject or body 'help' to
>         wiki-research-l-requ...@lists.wikimedia.org
>
> You can reach the person managing the list at
>         wiki-research-l-ow...@lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wiki-research-l digest..."
>
>
> Today's Topics:
>
>    1. Fwd: [Wikidata] Scaling Wikidata Query Service (Pine W)
>    2. Database of all users (Kiril Simeonovski)
>    3. Re: Database of all users (Federico Leva (Nemo))
>    4. Re: Database of all users (Kiril Simeonovski)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 6 Jun 2019 19:35:13 +0000
> From: Pine W <wiki.p...@gmail.com>
> To: "wikitec...@lists.wikimedia.org" <wikitec...@lists.wikimedia.org>,
>         Wiki Research-l <wiki-research-l@lists.wikimedia.org>
> Subject: [Wiki-research-l] Fwd: [Wikidata] Scaling Wikidata Query
>         Service
> Message-ID:
>         <CAF=dyJiJFXf7Jp8NUUu90Zd2dBT6J=
> fhtyjirawrhn+uv2j...@mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Forwarding in case this is of interest.
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
>
>
> ---------- Forwarded message ---------
> From: Guillaume Lederrey <gleder...@wikimedia.org>
> Date: Thu, Jun 6, 2019 at 7:33 PM
> Subject: [Wikidata] Scaling Wikidata Query Service
> To: Discussion list for the Wikidata project. <
> wikid...@lists.wikimedia.org>
>
>
> Hello all!
>
> There have been a number of concerns raised about the performance and
> scaling of Wikidata Query Service. We share those concerns and we are
> doing our best to address them. Here is some info about what is going
> on:
>
> In an ideal world, WDQS should:
>
> * scale in terms of data size
> * scale in terms of number of edits
> * have low update latency
> * expose a SPARQL endpoint for queries
> * allow anyone to run any queries on the public WDQS endpoint
> * provide great query performance
> * provide a high level of availability
>
> Scaling graph databases is a "known hard problem", and we are reaching
> a scale where there are no obvious easy solutions to address all the
> above constraints. At this point, just "throwing hardware at the
> problem" is not an option anymore. We need to go deeper into the
> details and potentially make major changes to the current architecture.
> Some scaling considerations are discussed in [1]. This is going to take
> time.
>
> Reasonably, addressing all of the above constraints is unlikely to
> ever happen. Some of the constraints are non negotiable: if we can't
> keep up with Wikidata in terms of data size or number of edits, it does
> not make sense to address query performance. On some constraints, we
> will probably need to compromise.
>
> For example, the update process is asynchronous. It is by nature
> expected to lag. In the best case, this lag is measured in minutes,
> but can climb to hours occasionally. This is a case of prioritizing
> stability and correctness (ingesting all edits) over update latency.
> And while we can work to reduce the maximum latency, this will still
> be an asynchronous process and needs to be considered as such.
>
> We currently have one Blazegraph expert working with us to address a
> number of performance and stability issues. We
> are planning to hire an additional engineer to help us support the
> service in the long term. You can follow our current work in phabricator
> [2].
>
> If anyone has experience with scaling large graph databases, please
> reach out to us, we're always happy to share ideas!
>
> Thanks all for your patience!
>
>    Guillaume
>
> [1]
> https://wikitech.wikimedia.org/wiki/Wikidata_query_service/ScalingStrategy
> [2] https://phabricator.wikimedia.org/project/view/1239/
>
> --
> Guillaume Lederrey
> Engineering Manager, Search Platform
> Wikimedia Foundation
> UTC+2 / CEST
>
> _______________________________________________
> Wikidata mailing list
> wikid...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 7 Jun 2019 08:57:38 +0200
> From: Kiril Simeonovski <kiril.simeonov...@gmail.com>
> To: Research into Wikimedia content and communities
>         <wiki-research-l@lists.wikimedia.org>
> Subject: [Wiki-research-l] Database of all users
> Message-ID:
>         <
> cabuehm5mfdeo7sjrpmw_ak-mpd2qh0c2jygnou9odb5ytut...@mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Dear all,
>
> I was wondering if there is a way to extract a database of all users (or
> selection of users according to some criteria) with their contributions to
> the Wikimedia projects until a fixed point of time from the XTools.
>
> Thank you.
>
> Best regards,
> Kiril
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 7 Jun 2019 10:53:30 +0300
> From: "Federico Leva (Nemo)" <nemow...@gmail.com>
> To: Research into Wikimedia content and communities
>         <wiki-research-l@lists.wikimedia.org>, Kiril Simeonovski
>         <kiril.simeonov...@gmail.com>
> Subject: Re: [Wiki-research-l] Database of all users
> Message-ID: <33f8a998-2144-1d49-5347-8c59018e2...@gmail.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Kiril Simeonovski, 07/06/19 09:57:
> >   with their contributions to
> > the Wikimedia projects
>
> Do you mean the *number* of their contributions, or literally all their
> contributions? Filtering the stub dumps would be one systematic way to
> get all the metadata about edits.
>
> If you just need aggregate numbers with some filter by date, namespace
> or other, the fastest way is probably to write a script which loops
> through all the databases on Labs. For instance I made this to list the
> users who contribute in a certain language, to find translators for very
> small languages:
> <
> https://gerrit.wikimedia.org/r/plugins/gitiles/labs/tools/lists/+/master/scripts/userslang.py
> >
>
> Federico
>
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 7 Jun 2019 09:57:45 +0200
> From: Kiril Simeonovski <kiril.simeonov...@gmail.com>
> To: "Federico Leva (Nemo)" <nemow...@gmail.com>
> Cc: Research into Wikimedia content and communities
>         <wiki-research-l@lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] Database of all users
> Message-ID:
>         <CABuEHm7ahWx9P=
> xa_km1s+q3z0wkohaxcounfx3asa_cfnv...@mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi Federico,
>
> Thanks for the straightforward answer. My idea is to extract the number of
> contributions across projects and namespaces.
>
> Best,
> Kiril
>
> On Fri, Jun 7, 2019 at 9:53 AM Federico Leva (Nemo) <nemow...@gmail.com>
> wrote:
>
> > Kiril Simeonovski, 07/06/19 09:57:
> > >   with their contributions to
> > > the Wikimedia projects
> >
> > Do you mean the *number* of their contributions, or literally all their
> > contributions? Filtering the stub dumps would be one systematic way to
> > get all the metadata about edits.
> >
> > If you just need aggregate numbers with some filter by date, namespace
> > or other, the fastest way is probably to write a script which loops
> > through all the databases on Labs. For instance I made this to list the
> > users who contribute in a certain language, to find translators for very
> > small languages:
> > <
> >
> https://gerrit.wikimedia.org/r/plugins/gitiles/labs/tools/lists/+/master/scripts/userslang.py
> > >
> >
> > Federico
> >
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
> ------------------------------
>
> End of Wiki-research-l Digest, Vol 166, Issue 4
> ***********************************************
>
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
