On Wed, May 25, 2011 at 08:20:30AM -0700, Scott James Remnant wrote:
> On Wed, May 18, 2011 at 2:33 AM, Evan Dandrea <[email protected]> wrote:
>
> > To be clear, since it wasn't addressed in my original email, I intend
> > to only present percentages of successful and unsuccessful installs.
> >
>
> As discussed externally, and now repeated here for the Technical
> Board, this is the part of the proposal I have a problem with.
>
> All raw data collected by this feature should be public.
>
> There are three good reasons:
>
> 1) Transparency.
>
> Making the raw data available makes it clear that the data
> collected is anonymized and non-identifying. Users with concerns can
> be shown the raw data URL and can verify for themselves that the data
> does not identify them.
I agree that transparency is valuable, but making the raw data
available doesn't allow the user to verify that no identifying
information was shared or stored. That's pretty hard to do.

> 2) Verification.
>
> With the raw data hidden, and only stats published by yourself,
> there is no guarantee of honesty. If you claim that Ubuntu has 97%
> successful installs, and somebody doubts that, they cannot go back to
> the raw data and verify your results.

It doesn't prove that the data is accurate or representative.

> 3) Collaboration.
>
> This, to my mind, is the most important.
>
> Hiding the raw data, making it available only to yourself, makes
> it harder for other developers to collaborate with you. Ubuntu is
> still a community project.
>
> Say a developer wanted to not just look at installer success, but
> on the average length of time in the installer, and undertake a
> project to reduce it. With the raw data from this available, the
> developer can trivially patch to add timestamps to the data set, and
> analyse themselves for their project.
>
> With the data not available, that developer has to start from
> scratch, including going through this procedure again.

This is a good reason to share raw data. It's useful for people to be
able to run their own analyses.

That said, the question of whether we share the raw data isn't a
deciding factor for me. I think we should do this measurement because
it's useful in itself.

-- 
 - mdz

-- 
technical-board mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/technical-board
