Re: [Wikitech-l] Complete (basic) analysis of MediaWiki

2012-12-04 Thread Claudia Müller-Birn
Hi Andre,

On Dec 3, 2012, at 7:51 PM, Andre Klapper aklap...@wikimedia.org wrote:

 On Mon, 2012-12-03 at 19:40 +0100, Federico Leva (Nemo) wrote:
 Compare e.g.
 https://www.ohloh.net/p/mediawiki/contributors?query=&sort=commits
 Also,
 https://bugzilla.wikimedia.org/weekly-bug-summary.cgi?tops=10&days=10
 proves they're not talking of the whole bugzilla but then they don't
 say which components.
 
 Would be helpful to mention the exact dataset you refer to.
 
 Also I'd rather challenge weekly-bug-summary.cgi's results:
 MediaWiki extensions has 2031 open bugs, and only 1883 have been filed
 in the last 10 days?
 => 148 bug reports got opened more than 274 years ago?
 
 But maybe I fail to read weekly-bug-summary.cgi correctly.

No, you read it correctly. I think the UI is just misleading: the "10 days" is
simply placed in the table header automatically. The script does not account
for the real age of the bug.

The oldest bug, number 1, was created by Brion on Aug 10, 2004. Between that
day and today there are 3039 days (including today). So if you put that number
of days into the request, you get the same result. At least in my data, the
first bug was closed on May 22, 2005, which then makes 2754 days... But now it
becomes complicated, because there are too many changes that do not show up in
my data (which is based on the Bugzilla API). But you get my point :)
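
The two day counts above can be re-checked with a few lines of Python. This is only a sketch: the helper name `days_inclusive` is made up for illustration, and "today" is assumed to be the date of this message (2012-12-04).

```python
from datetime import date


def days_inclusive(start: date, end: date) -> int:
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1


today = date(2012, 12, 4)         # assumed: the date of this message
bug1_created = date(2004, 8, 10)  # creation date of bug 1, as stated above
bug1_closed = date(2005, 5, 22)   # first close date, as stated above

print(days_inclusive(bug1_created, today))  # 3039
print(days_inclusive(bug1_closed, today))   # 2754
```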

However, I very much like the Bitergia stats - very good first step.

 
 andre
 -- 
 Andre Klapper | Wikimedia Bugwrangler
 http://blogs.gnome.org/aklapper/

Claudia

 
 

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Complete (basic) analysis of MediaWiki

2012-12-04 Thread Jesus M. Gonzalez-Barahona
On Mon, 2012-12-03 at 23:46 +0100, Platonides wrote:
 On 03/12/12 22:45, Jesus M. Gonzalez-Barahona wrote:
  On Mon, 2012-12-03 at 21:04 +0100, Platonides wrote:
  On 03/12/12 19:40, Federico Leva (Nemo) wrote:
  That data is hardly useful, it doesn't explain what it refers to 
  
  I guess I missed your message, Federico.
 
 He forgot to keep you in CC, so it was sent only to the mailing list.

Thanks. I am subscribed to the list, but I archive it automatically,
so I didn't notice the message. I have seen it now.

  I agree a glossary of each term would be useful.
  It took me a while to realise that committers/closers/senders were the
  terms used for users of git/bugzilla/mailing list.
  
  Well, in fact committers are committers to the git repository (you also
  have authors, see below), closers are specifically ticket closers (you
  also have people opening or changing tickets) and senders are indeed
  senders of mailing list messages. We'll work to make this much more
  clear. Again, thanks for the suggestion.
  
  They should track authors instead of committers, though (preferably
  skipping merge commits)
  
  We do both. In the summary (main) page you have authors in the summary
  (orange) chart, since authors seemed more meaningful than committers.
  Same for the blue chart in that page. In the source code page you have
  committers (orange chart) and both authors and committers (blue charts),
  for a more detailed comparison.
 
 I was specifically thinking of the table of Top committers.
 Also, the summary page has an authors graph but
 http://bitergia.com/public/previews/2012_11_mediawiki/scm.html has a
 committers one.
 
 When the committer is different than the author there are usually two
 options:
 - It was a merge and the committer is 'gerrit'.
 - The patchset was (slightly) changed by the committer from the original
 by the author.
 
 There's also a less common one of committing a patch from a different
 source, such as a bugzilla patch.

Thanks for the info.

 The number of commits by gerrit is meaningless, and committers with few
 changes inflate some numbers but are not too useful. The number of comments
 / approvals in gerrit would be more appropriate than that.
 
 Equally, the author field of merges should IMHO be ignored, since a merge is
 not a commit which really touches the code (it could be measured in a
 different statistic), so many commits produce two entries.

You're right, thanks. However, we intended to measure raw commits, as
found in the git repo. One of the filters after this first pass
filters out those commits. You're right that in this case at least,
those numbers would be more appropriate.
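
As a rough illustration of the filtering discussed here (not our actual tooling), the sketch below counts commits per author and skips merges. It assumes the log was produced with `git log --format='%H|%P|%ae|%ce'`, where `%P` lists the parent hashes; the function name and the sample data are made up.

```python
from collections import Counter


def count_authors(log_text: str, skip_merges: bool = True) -> Counter:
    """Count commits per author email from `git log --format='%H|%P|%ae|%ce'` output.

    A commit with more than one parent hash in the %P field is a merge.
    """
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue
        _commit, parents, author, _committer = line.split("|")
        if skip_merges and len(parents.split()) > 1:
            continue  # merge commit: it does not touch the code itself
        counts[author] += 1
    return counts


# Made-up sample: a root commit, an ordinary commit, and a merge by gerrit.
sample = "\n".join([
    "c3|c1 c2|alice@example.org|gerrit@example.org",
    "c2|c1|alice@example.org|alice@example.org",
    "c1||bob@example.org|bob@example.org",
])

print(count_authors(sample)["alice@example.org"])                     # 1
print(count_authors(sample, skip_merges=False)["alice@example.org"])  # 2
```

The same filter can also be applied on the git side with `git log --no-merges`, which avoids parsing parents at all.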

  Seems that Jesús did a fine job.
  It could be polished quite a bit more with some local knowledge, merging
  users, hiding bots, etc.
  
  Thanks a lot. After this first stage, we usually continue with the
  identification of bots, unification of identities, identification of
  large commits, classification of different kinds of tickets, etc. In
  this case, we were mainly testing the automated (first) stage: the
  second one, as you mention, usually needs some detailed knowledge about
  the project, and some manual intervention.
 
 Sure. I wasn't intending to put pressure on you.

None taken ;-)

 A few quirks I noticed:
 - nore...@sourceforge.net is abusing its second place as sender (2525).
 - I bet the two brion are the same, with different emails
 (4561+1285=5846, wow!)

We will look into that.

 l10n-bot is indeed a bot.
 On svn, localisation commits weren't done with a specific account, but
 you can look for commit messages like «Localisation updates for core and
 extension messages from translatewiki.net»

ok

 Commits migrated from svn will have emails of
 usern...@users.mediawiki.org. All commits done on git use a different
 one. Moreover, some people have used different emails (see other threads
 on the mailing list about this in ohloh).

We noticed this. But it is a bit risky to assume that
x...@users.mediawiki.org is the same as xxx@otherdomain: probably in most
cases, the merge is perfect, but in some it could be wrong. We preferred
to provide the raw numbers since we didn't have resources to do the
manual checking needed. But we can try to produce accumulated stats
assuming that match: probably it would be accurate enough.
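
A minimal sketch of what such accumulated stats could look like, assuming two addresses belong to the same person whenever the part before the `@` matches. That assumption is exactly the risky one mentioned above, so the result would still need manual checking. The function name and the example addresses are hypothetical; the two counts are the "brion" figures quoted earlier in this thread.

```python
from collections import Counter
from typing import Mapping


def merge_by_local_part(counts: Mapping[str, int]) -> Counter:
    """Accumulate commit counts under the local part (before '@') of each email."""
    merged = Counter()
    for email, n in counts.items():
        local = email.split("@", 1)[0].lower()
        merged[local] += n
    return merged


# Hypothetical addresses; the counts are the two "brion" figures quoted above.
raw = {"brion@users.example.org": 4561, "Brion@example.com": 1285}
print(merge_by_local_part(raw)["brion"])  # 5846
```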

[...]

Regards,

Jesus.




Re: [Wikitech-l] Complete (basic) analysis of MediaWiki

2012-12-04 Thread Jesus M. Gonzalez-Barahona
On Mon, 2012-12-03 at 19:40 +0100, Federico Leva (Nemo) wrote:
[...]
 Also,
 https://bugzilla.wikimedia.org/weekly-bug-summary.cgi?tops=10&days=10
 proves they're not talking of the whole bugzilla but then they don't
 say which components.

Our mining is for the MediaWiki product. In particular, the url we're
using is:

https://bugzilla.wikimedia.org/buglist.cgi?product=MediaWiki

If you look in the bicho database available at
http://bitergia.com/public/previews/2012_11_mediawiki/data/db/ you can
count the tickets:

mysql> select count(id) from issues;
+-----------+
| count(id) |
+-----------+
|     19776 |
+-----------+

This is consistent with the 19953 tickets that I can see right now in
Bugzilla. 

You're right that it is not obvious that we're only considering this
product; we're fixing that.

Thanks for the advice!

Jesus.





Re: [Wikitech-l] Complete (basic) analysis of MediaWiki

2012-12-04 Thread Jesus M. Gonzalez-Barahona
On Mon, 2012-12-03 at 19:40 +0100, Federico Leva (Nemo) wrote:
 That data is hardly useful, it doesn't explain what it refers to and, 
 even when it does, seems wrong. Compare e.g. 
 https://www.ohloh.net/p/mediawiki/contributors?query=&sort=commits
[...]

ok, some info about this one. It seems Ohloh is counting commits in the
master branch. If you just use the git log to get the main stats:

$ git log --format=format:%ae > Authors
$ grep brion Authors | wc -l
4493
$ grep tstarling Authors | wc -l
2554

which is pretty much what you see in Ohloh.

In our case, we're counting *all* the activity in the repository (all
branches):

$ git log --all --format=format:%ae > Authors
$ grep brion Authors | wc -l
5425
$ grep tstarling Authors | wc -l
3068

Which is pretty much our data.

To be honest, I'm not sure which one (counting only the master branch, or
all branches) is better: probably we should be providing both, or even a
separate count for each branch, so that users may decide which data
better suits their needs. I'll take note of this.
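
For anyone who wants to reproduce both counts side by side, here is a small sketch in Python. It is not our tooling: the function names are invented, and it assumes that master is the checked-out branch, since `git log` without `--all` only walks the history of the current branch.

```python
import subprocess
from collections import Counter


def log_command(all_branches: bool) -> list:
    """Build the git log command; --all walks every ref, not just the current branch."""
    cmd = ["git", "log", "--format=format:%ae"]
    if all_branches:
        cmd.insert(2, "--all")
    return cmd


def count_emails(log_output: str) -> Counter:
    """Count commits per author email, one email per line of git log output."""
    return Counter(line.strip() for line in log_output.splitlines() if line.strip())


def grep_count(counts: Counter, needle: str) -> int:
    """Rough equivalent of `grep needle Authors | wc -l`."""
    return sum(n for email, n in counts.items() if needle in email)


def author_counts(repo_path: str, all_branches: bool) -> Counter:
    """Run git log in repo_path (assumed to be a local clone) and count author emails."""
    result = subprocess.run(log_command(all_branches), cwd=repo_path,
                            capture_output=True, text=True, check=True)
    return count_emails(result.stdout)
```

With a local clone, `grep_count(author_counts(path, False), "brion")` and `grep_count(author_counts(path, True), "brion")` should correspond to the two `grep brion` figures above.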

Again, thanks for pointing it out.

Regards,

Jesus.






[Wikitech-l] Complete (basic) analysis of MediaWiki

2012-12-03 Thread Quim Gil
If last October we got a bunch of MediaWiki developer stats thanks to 
the aggregation of data by Ohloh [1], now we are getting plenty more 
stats from Bitergia, including data from bug reporting and mailing lists:


http://blog.bitergia.com/2012/12/03/complete-basic-analysis-of-mediawiki/

Bitergia is a company based in Madrid formed by a small team of 
developers that have been working on FLOSS stats software for a long 
time. All the tools they develop are free software publicly available 
and open to contributions.


They have been kind enough to contribute some time and work setting up 
stats for the MediaWiki community. They also welcome feedback about the 
service and the data collected. I'm CCing Jesús M. González-Barahona, 
who has been my regular contact for this task in the past weeks.


All good news for http://www.mediawiki.org/wiki/Community_Metrics !

[1] https://www.ohloh.net/orgs/wikimedia

--
Quim Gil
Technical Contributor Coordinator
Wikimedia Foundation



Re: [Wikitech-l] Complete (basic) analysis of MediaWiki

2012-12-03 Thread Federico Leva (Nemo)
That data is hardly useful, it doesn't explain what it refers to and, 
even when it does, seems wrong. Compare e.g. 
https://www.ohloh.net/p/mediawiki/contributors?query=&sort=commits
Also, 
https://bugzilla.wikimedia.org/weekly-bug-summary.cgi?tops=10&days=10 proves 
they're not talking of the whole bugzilla but then they don't say which 
components.


Nemo



Re: [Wikitech-l] Complete (basic) analysis of MediaWiki

2012-12-03 Thread Andre Klapper
On Mon, 2012-12-03 at 19:40 +0100, Federico Leva (Nemo) wrote:
 Compare e.g.
 https://www.ohloh.net/p/mediawiki/contributors?query=&sort=commits
 Also,
 https://bugzilla.wikimedia.org/weekly-bug-summary.cgi?tops=10&days=10
 proves they're not talking of the whole bugzilla but then they don't
 say which components.

Would be helpful to mention the exact dataset you refer to.

Also I'd rather challenge weekly-bug-summary.cgi's results:
MediaWiki extensions has 2031 open bugs, and only 1883 have been filed
in the last 10 days?
=> 148 bug reports got opened more than 274 years ago?

But maybe I fail to read weekly-bug-summary.cgi correctly.

andre
-- 
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/




Re: [Wikitech-l] Complete (basic) analysis of MediaWiki

2012-12-03 Thread Platonides
On 03/12/12 19:40, Federico Leva (Nemo) wrote:
 That data is hardly useful, it doesn't explain what it refers to 

I agree a glossary of each term would be useful.
It took me a while to realise that committers/closers/senders were the
terms used for users of git/bugzilla/mailing list.

They should track authors instead of committers, though (preferably
skipping merge commits)

 Also,
 https://bugzilla.wikimedia.org/weekly-bug-summary.cgi?tops=10&days=10
 proves they're not talking of the whole bugzilla but then they don't
 say which components.
 
 Nemo

Looking at
http://bitergia.com/public/previews/2012_11_mediawiki/data/db/acs_bicho_mediawiki.sql.bz2
they seem to have obtained data from bugs 1 to 19775. Not that they
skipped bugs based on components.


Seems that Jesús did a fine job.
It could be polished quite a bit more with some local knowledge, merging
users, hiding bots, etc.
I would also change the layout of the summary page, making the graphs
larger and placing the tables below. Plus some cosmetic fixes: empty brackets,
missing names...



Re: [Wikitech-l] Complete (basic) analysis of MediaWiki

2012-12-03 Thread Platonides
On 03/12/12 22:45, Jesus M. Gonzalez-Barahona wrote:
 On Mon, 2012-12-03 at 21:04 +0100, Platonides wrote:
 On 03/12/12 19:40, Federico Leva (Nemo) wrote:
 That data is hardly useful, it doesn't explain what it refers to 
 
 I guess I missed your message, Federico.

He forgot to keep you in CC, so it was sent only to the mailing list.



 I agree a glossary of each term would be useful.
 It took me a while to realise that committers/closers/senders were the
 terms used for users of git/bugzilla/mailing list.
 
 Well, in fact committers are committers to the git repository (you also
 have authors, see below), closers are specifically ticket closers (you
 also have people opening or changing tickets) and senders are indeed
 senders of mailing list messages. We'll work to make this much more
 clear. Again, thanks for the suggestion.
 
 They should track authors instead of committers, though (preferably
 skipping merge commits)
 
 We do both. In the summary (main) page you have authors in the summary
 (orange) chart, since authors seemed more meaningful than committers.
 Same for the blue chart in that page. In the source code page you have
 committers (orange chart) and both authors and committers (blue charts),
 for a more detailed comparison.

I was specifically thinking of the table of Top committers.
Also, the summary page has an authors graph but
http://bitergia.com/public/previews/2012_11_mediawiki/scm.html has a
committers one.

When the committer is different than the author there are usually two
options:
- It was a merge and the committer is 'gerrit'.
- The patchset was (slightly) changed by the committer from the original
by the author.

There's also a less common one of committing a patch from a different
source, such as a bugzilla patch.


The number of commits by gerrit is meaningless, and committers with few
changes inflate some numbers but are not too useful. The number of comments
/ approvals in gerrit would be more appropriate than that.

Equally, the author field of merges should IMHO be ignored, since a merge is
not a commit which really touches the code (it could be measured in a
different statistic), so many commits produce two entries.



 Seems that Jesús did a fine job.
 It could be polished quite a bit more with some local knowledge, merging
 users, hiding bots, etc.
 
 Thanks a lot. After this first stage, we usually continue with the
 identification of bots, unification of identities, identification of
 large commits, classification of different kinds of tickets, etc. In
 this case, we were mainly testing the automated (first) stage: the
 second one, as you mention, usually needs some detailed knowledge about
 the project, and some manual intervention.

Sure. I wasn't intending to put pressure on you.

A few quirks I noticed:
- nore...@sourceforge.net is abusing its second place as sender (2525).
- I bet the two brion are the same, with different emails
(4561+1285=5846, wow!)

l10n-bot is indeed a bot.
On svn, localisation commits weren't done with a specific account, but
you can look for commit messages like «Localisation updates for core and
extension messages from translatewiki.net»
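
Something along these lines could flag those commits by their subject line. It is only a sketch: the pattern is derived from the single message quoted above, real messages may vary, and the names `L10N_PATTERN` and `is_localisation_commit` are made up.

```python
import re

# Pattern based on the one commit message quoted above; real messages may differ.
L10N_PATTERN = re.compile(
    r"^Localisation updates? for .* from translatewiki\.net",
    re.IGNORECASE,
)


def is_localisation_commit(subject: str) -> bool:
    """Return True if a commit subject looks like a translatewiki.net update."""
    return bool(L10N_PATTERN.search(subject.strip()))


subjects = [
    "Localisation updates for core and extension messages from translatewiki.net",
    "Fix a parser bug",  # made-up example of an ordinary commit
]
print([is_localisation_commit(s) for s in subjects])  # [True, False]
```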

Commits migrated from svn will have emails of
usern...@users.mediawiki.org. All commits done on git use a different
one. Moreover, some people have used different emails (see other threads
on the mailing list about this in ohloh).



 I would also change the layout of the summary page, making the graphs
 larger and placing the tables below. Plus some cosmetic fixes: empty brackets,
 missing names...
 
 This is a very good point, and something we didn't put much work into. I'll
 take note.
 
 Thanks a lot for the feedback!

You are welcome!



