A few more thoughts:

* You probably don't need the full URLs of the content being accessed, so those could be anonymized and replaced with random identifiers to some degree, right?
* Someone might be able to monitor the user's end of the transactions, for example through university network logs that show destination domains and timestamps. They could then pair the university logs with Wikimedia access traces of one-second granularity and thus defeat some measures of privacy for the university's Wikimedia users, correct?
* I am not sure that the staff time required to analyze this request and produce the data is a good use of resources on Wikimedia's end. Toby would be a good person to ask about this.

Pine

On Sep 20, 2014 12:45 AM, "Pine W" <wiki.p...@gmail.com> wrote:

> Thanks for the explanation. On moderate to high traffic pages, let's say
> with a minimum of 10 hits per minute across the entire time span studied,
> perhaps the requested data could be provided while still providing strong
> privacy protection. Toby might need to discuss this with WMF Legal.
>
> Pine
> On Sep 19, 2014 4:57 AM, "Valerio Schiavoni" <valerio.schiav...@gmail.com>
> wrote:
>
>> Hello everyone,
>> it seems the discussion is sparking an interesting debate, thanks to
>> everyone.
>>
>> To put things back in context, we use Wikipedia as one of the few
>> websites where users can access different 'versions' of the same page.
>> Users mostly read the most recent version of a given page, but from time
>> to time, read accesses to the 'history' of a page happen.
>> New versions of a page are created as well. Finally, users might
>> potentially need to explore several old versions of a given web page, for
>> example by accessing the details of its history[1].
>> Access traces need to be accurate to model the workload on the servers
>> that store the contents being served by the web servers.
>> A resolution coarser than 1 second would not reflect the access patterns
>> on Wikipedia, or similarly versioned, web sites.
>> We use these access patterns to test different version-aware storage
>> techniques.
>> For those interested, I could send the pre-print version of an article
>> that I will present next month at the IEEE SRDS'14 conference.
>>
>> Regarding potential privacy concerns about disclosing such traces,
>> I would like to stress that we are not looking into 'who' or from 'where' a
>> given URL was requested. That information is completely absent from the
>> Wikibench traces, and can/should remain so in new traces.
>>
>> Let's say Wikipedia somehow reveals the top-10 most-visited pages in the
>> last minute: would that represent a privacy breach for some users? I highly
>> doubt it, and I invite the audience to convince me of the contrary.
>>
>> Best regards,
>> Valerio
>>
>> 1- For example:
>> http://it.wikipedia.org/w/index.php?title=George_W._Bush&action=history
>>
>> On Fri, Sep 19, 2014 at 8:36 AM, Pine W <wiki.p...@gmail.com> wrote:
>>
>>> Let's loop back to the request at hand. Valerio, can you describe your
>>> use case for access traces at intervals shorter than one hour? The very
>>> likely outcome of this discussion is that the access traces at shorter
>>> intervals will not be made available, but I'm curious about what you would
>>> do with the data if you had it.
>>>
>>> Pine
>>> On Sep 18, 2014 4:55 PM, "Richard Jensen" <rjen...@uic.edu> wrote:
>>>
>>>> The basic issue in sampling is to decide what the target population T
>>>> actually is. Then you weight the sample so that each person in the target
>>>> population has an equal chance w and people not in it have weight zero.
>>>>
>>>> So what is the target population we want to study?
>>>> --the world's population?
>>>> --the world's educated population?
>>>> --everyone with internet access?
>>>> --everyone who ever uses Wikipedia?
>>>> --everyone who uses it a lot?
>>>> --everyone who has knowledge to contribute in positive fashion?
>>>> --everyone who has the internet, skills and potential to contribute?
>>>> --everyone who has the potential to contribute but does not do so?
>>>>
>>>> Richard Jensen
>>>> rjen...@uic.edu
>>>>
>>>>
>>>> _______________________________________________
>>>> Wiki-research-l mailing list
>>>> Wiki-research-l@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l