On 01/03/16 18:04, Lars Kurth wrote:
Daniel, Jesus,
I am going to break my comments down into different sections to make this more
consumable. Let's focus on the A1-A3 use-cases in this mail.
First I wanted to start off with some questions about definitions, as I am
seeing some discrepancies in the data shown and am trying to understand
exactly what the data means; then I will look at the individual sections.
General comments on the Xen-A1.A2.A3 dashboard
- I played with some filters and noticed some oddities, e.g. if I filter on "merged:
0" all views change as expected
- If I filter on "merged: 1", a lot of widgets show no data. Is this showing
that there is an issue with the data somewhere?
- I see similar issues with other filters, e.g. 'emailtype: "patch"'
To give some context on the dataset: ElasticSearch was initially built
for parsing Apache logs, which means the data is expected to be
formatted as 'a row = an event'.
In this dataset there are several event types, defined by the field
'EmailType': 'patchserie', 'patch', 'comment' and 'flag'. Depending on
the 'EmailType', each of the other columns may or may not carry meaning.
This structure uses 'EmailType' as the central key, while the rest of
the columns provide extra context. For instance, the post_ack_comment
field only makes sense for EmailType:comment.
Coming back to the comments:
There are fields that apply only to specific types of events. 'merge'
applies only in the case of patches: merge:1 filters for patches that
were merged, so all the other rows are removed (they have no merge
status). If we filter by merge:0, we get all the remaining rows, even
including flags.
Thus, using the filter merge:1 leaves only information related to
'patches' in this case.
As this panel shows information about types other than 'patch', if you
filter by an 'emailtype' such as 'patch' you are focusing only on patch
data, and that will display both the merged and the not-merged ones.
To improve this, we could either create one panel per type of analysis
(one for patches, one for comments, etc.), or we could propagate the
'merge' field to any flag, patchserie, patch or comment whose patch was
merged at some point. The latter may sound a bit odd, as a 'merged'
status does not really apply to a flag (Reviewed-by), for instance.
On 1 Mar 2016, at 13:53, Lars Kurth <lars.kurth....@gmail.com> wrote:
Case Study A.1: Identify top reviewers (for both individuals and companies)
------------------------------------------------------------------------------
Goal: Highlight review contributions - ability to use the data to "reward" review
contributions and encourage more "review" contributions
The widgets in question are:
- Evolution 'Reviewed by-flag' (no patchseries, no patches)
- What is the difference to Evolution of patches
- Top People/Domains Reviewing patches
Q1: Are these the reviewed-by flags?
They are only the Reviewed-by flags.
Q2: What is the scope? Do the number count
- the # files someone reviewed
- the # patches someone reviewed
- the # series someone reviewed
The number counts the number of reviews performed by a developer or by a
domain. A review is counted when the flag 'Reviewed-by' is detected in
an email replying to a patch.
If a developer reviews several patches, or several versions of the same
patch, each of those is counted as a separate review.
If a reviewer is solely defined by the reviewed-by tags, the data does not
provide a correct picture.
This is how this works so far.
It may be better to use the following definition (although, others may disagree)
A reviewer is someone who did one of the following for a patch or series:
- Added a reviewed-by flag
- Added an acked-by flag (maintainers tend to use acked-by)
- Made a comment, but is NOT the author
We can update that definition. Do we want to have extra discussion in
this respect?
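The proposed (not yet implemented) reviewer definition above could be sketched as a per-event test. The event shape and field names here are assumptions based on this thread, not the actual schema:

```python
def is_reviewer(event, patch_author):
    """Return True if this event marks its sender as a reviewer under the
    proposed definition: a Reviewed-by flag, an Acked-by flag, or a
    comment from someone other than the patch author."""
    if event["emailtype"] == "flag" and event.get("flag") in ("Reviewed-by", "Acked-by"):
        return True
    if event["emailtype"] == "comment" and event["sender"] != patch_author:
        return True
    return False
```

The third clause is what excludes self-replies: an author commenting on their own patch would not be counted as reviewing it.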
Related to that use-case are also the following widgets
- Evolution of Comments Activity
- Top People/Domains Commenting (which also contain post-ACK comments and are
thus also related to A.3)
- Evolution of Email activity
Q3: Again, the scope isn't quite clear
This is the number of comments replying to a patch. A comment is defined
as an email reply to a patch.
Q4: The figures are higher than those in "People/Domains Reviewing patches".
Are comments on people's own patches included (these would be replies to the comments of
others)
I should check the last question. I'd say that we are including them, as
they are 'comments' to a patch. You can indeed comment on your own
patches :). But we can deal with this if it does not make sense.
Possible places where this could be added : a separate table which is not time
based, but can be filtered by time
Possible metrics: number of review comments by person, number of patches/patch
series a person is actively commenting on, number of ACKS and reviewed by tags
submitted by person
Actions: we could try to have a panel only focused on rewarding people and only
based on information per domain and individual with some large table at the end
with all of the reviews.
I don't have a strong view, but I think that there are too many tables and
graphs and that they could possibly be consolidated. For example
- The "Top People/Domains Reviewing Patches" views are a subset of the
imbalance tables. In particular exporting the data would be useful for me.
- Personally, I don't mind just having the superset.
- In particular, as the tables are sortable
- And the only filters that can be added are sender and sender_domain
Each set of evolutionary charts and the two tables are based on a
different 'search' in ElasticSearch, which is in the end similar to a
'view' in SQL. So the consolidation may require extra work here and is
not that easy to change, although we can discuss it. The panels are
divided by use case, so consolidating them may mean consolidating the
use cases in the first place.
- Depending on the answer of Q2, the "Evolution 'Reviewed by-flag'..." and
"Evolution of patches..." could probably be shown as a combined bar graph
- If they have the same scope, then a bar chart may be better
- I will probably have some further thoughts about this, based on the answer.
In the case of patches, this is the number of patches. The several
versions of a patch are each counted as a new patch. We assumed that
re-sent patches were those that required extra work, so they can be
counted as new patches. The reviews follow a similar approach: the
tables and the evolutionary chart show the total number of reviews
found in the specific emails.
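The counting rule described above can be sketched in a few lines; the email records here are invented for illustration:

```python
# Every version/resend of a patch counts as a new patch, and every
# Reviewed-by flag found in a reply counts as a new review.
emails = [
    {"emailtype": "patch", "subject": "[PATCH] fix foo"},
    {"emailtype": "patch", "subject": "[PATCH v2] fix foo"},  # resend: counted again
    {"emailtype": "flag", "flag": "Reviewed-by"},
]
n_patches = sum(1 for e in emails if e["emailtype"] == "patch")
n_reviews = sum(1 for e in emails if e.get("flag") == "Reviewed-by")
# n_patches is 2 here, not 1, because v2 of the same patch counts again.
```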
Case Study A.2: Identify imbalances between reviewing and contribution
------------------------------------------------------------------------
Context: We suspect that we have some imbalances in the community, aka
- some companies/individuals which primarily commit patches, but not review code
- some companies/individuals which primarily comment on patches, but do not
write code
- some which do both
I think this works quite fine, and the data is consistent with A.1
The only comment I would have is that we should calculate the Balance by
Reviews - Patches posted
We can change that.
Goal: Highlight imbalances
Possible places where this could be added : a separate table which is not time
based, but can be filtered by time or some histogram
Possible metrics: number of patches/patch series a person/company is actively
commenting on divided by number of patches/patch series a person/company is
actively submitting
Actions: build this dashboard with pre-processed information about the balances.
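With the metric agreed above (reviews minus patches posted), the per-domain balance could be pre-processed along these lines. The counts and domain names are hypothetical:

```python
from collections import Counter

# Hypothetical per-domain counts of reviews performed and patches posted.
reviews = Counter({"citrix.com": 120, "suse.com": 80})
patches = Counter({"citrix.com": 150, "suse.com": 40})

# Balance = reviews - patches posted, per domain; a negative balance
# flags a domain that posts more patches than it reviews.
balance = {d: reviews[d] - patches[d] for d in set(reviews) | set(patches)}
```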
This seems to be quite good: the only observation is that authors (and domains)
are case sensitive and probably should be normalised
There also seem to be entries such as "Jan Beulich [mailto:jbeul...@suse.com]"
I also found lots of entries, with multiple e-mail addresses such as
- "andrew.coop...@citrix.com; ian.campb...@citrix.com; wei.l...@citrix.com;
ian.jack...@eu.citrix.com; stefano.stabell...@eu.citrix.com; Dong, Eddie; Nakajima, Jun;
Tian, Kevin; xen-devel@lists.xen.org; k...@xen.org"
- Most of these have (0,0,0) and can probably be removed, if that's possible
This needs some cleaning actions, thanks for the pointer, I'll have a
look at this!
Case Study A.3: identify post ACK-commenting on patches
----------------------------------------------------------
Background: We suspect that post-ACK commenting on patches may be a key issue
in our community. Post-ACK comments would be an indicator for something having
gone wrong in the review process.
Goal:
- Identify people and companies which persistently comment post ACK
- Potentially this could be very powerful, if we had a widget such as a pie
chart which shows the proportion of patches/patch series with no post-ACK
comments vs. with post-ACK comments
I think that
- Evolution of Comments Activity
- Top People/Domains Commenting (which also contain post-ACK comments and are
thus also related to A.3)
- Evolution of Email activity
- _emailtype views : not quite sure what the difference is
serves two purposes: it contributes to A.1, but also to A.3
Related to this seem to be
- Evolution of Flags
- Top People/Domain analysis
- Evolution of Patch series
Q5: What are All Flags? This seems awfully high: maybe signed off?
Something doesn't quite seem to add up, e.g. if I look at March 16th, I get
- Patch e-mails = 1851
- All flags = 1533
- Reviewed by = 117
- Acked by = 150
- Comments = 480
All flags seems to be tracking Patch e-mails
Flags are found in some replies to patches. When a flag is found in an
email, a new entry is added to the list of flags. If there were three
flags (e.g. Signed-off-by, Cc and Reported-by), those become three new
entries in the table.
In this case, 'all flags' is the aggregation of all of the flags found;
each of those special flags counts.
That said, we could reduce the flags in the dataset to just the ones of
interest for this analysis, Reviewed-by and Acked-by, as these carry
the richer information.
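The per-line flag extraction described above could look roughly like this; the regex, flag list and email body are illustrative, not the actual parser:

```python
import re

# One new entry per flag line found in a reply.
FLAG_RE = re.compile(
    r"^(Reviewed-by|Acked-by|Signed-off-by|Reported-by|Cc):\s*(.+)$",
    re.MULTILINE,
)

body = """Looks good.

Reviewed-by: Alice <alice@example.org>
Acked-by: Bob <bob@example.org>
Signed-off-by: Alice <alice@example.org>
"""

flags = [{"flag": m.group(1), "value": m.group(2)} for m in FLAG_RE.finditer(body)]
# 'All flags' in the dashboard aggregates every entry; restricting the
# dataset to Reviewed-by/Acked-by would shrink that count accordingly.
of_interest = [f for f in flags if f["flag"] in ("Reviewed-by", "Acked-by")]
```

This also illustrates why 'all flags' runs so high: a single reply contributes one entry per flag line, not one per email.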
Q6: Same question about the scope. The background is whether we can consolidate
some of these, and what we don't need.
NOTE: Need to check the data. A 5 year view shows many post-ACK
comments, but the last year shows none (0 post-ACK comments). That is
clearly wrong.
This sounds like a bug. I'll review this. Thanks again for this pointer.
- AND if that could be used to see how all the other data was different if one
or the other were selected
- In addition being able to get a table of people/companies which shows data by
person/company such as: #of comments post-ACK, #of patches/series impacted by
post-ACK comments and then being able to get to those series would be incredible
This seems to not be there yet. If there was a filter (manual or through some
widget) that would be good. But I guess we need to look at the data first.
With the current data, and for this specific case, a workaround would be
to write in the search box: "sender_domain:citrix.com AND
post_ack_comment:1". This provides a list of comments in one of the
bottom tables (for now with the wrong data you mentioned, i.e. no
post-ACK comments in 2015). There you can see a patchserie_id that can
later be used to track the patch series affected by post-ACK comments.
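For reference, the search-box expression above corresponds to a query_string query in the ElasticSearch DSL; the dict below could be POSTed to the index's _search endpoint. The field names are taken from this thread, and the _source selection is an assumption for illustration:

```python
import json

query = {
    "query": {
        "query_string": {
            # Same expression as typed in the dashboard search box.
            "query": "sender_domain:citrix.com AND post_ack_comment:1"
        }
    },
    # Keep the id used to track the affected patch series.
    "_source": ["sender", "patchserie_id"],
}
print(json.dumps(query))
```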
Case Study A.0: with the people-focused dashboard as it is
-------------------------------------------------------------
NOTE: this view/use-case is not yet shown in the current dashboard: it very
much focusses on the analysis in
https://github.com/dicortazar/ipython-notebooks/blob/master/projects/xen-analysis/Code-Review-Metrics.ipynb
Context: From 'Comments per Domain', select a company with a high patch
to comment ratio and select one with a high number. Then use the other
diagrams to check for any bad interactions between people. This seems to
be powerful.
Required Improvements:
- filter by ACKs adding a company table that lists number of ACKs and time to
ACK.
- filter by average number of patch version revisions by person and/or company.
More context: 'Selecting a time period for Patch Time to comment' and then
repeating the above is very useful. Going to peaks of the time to merge helped
to drill down to the cause of the issue.
Actions: we could probably improve this panel with information about ACKs.
Given that the Xen-A1.A2.A3 dashboard is quite busy, we should keep that
separate.
I still have to upload this, sorry for the delay!
This is a summary of the actions unless extra comments are provided:
* Balance should be calculated as reviews - patches
* Cleaning actions in the dataset when finding multiple email addresses
* Fix the bug with the post-ACK comments
* Add the extra panel defined as use case A.0
Some extra feedback or discussion:
* Reduce the flags to be used. We're currently using all of the
available flags and sub-setting Reviewed-by and Acked-by
* I'm not sure if we need extra discussion related to the merge:1 filter
* Check how reviews and patches are counted. Any new version of a patch
is counted as a new patch. Any new reviewed-by flag in a reply to a
patch is counted as a review.
Regards,
Daniel.
--
Daniel Izquierdo Cortazar, PhD
Chief Data Officer
---------
"Software Analytics for your peace of mind"
www.bitergia.com
@bitergia
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel