Given that the request logs aren't transparent about which cached version of a page is being served, I'm finding it pretty difficult to see how they'd help you answer interesting questions here :/

On 20 September 2014 04:02, Pine W <wiki.p...@gmail.com> wrote:
> A few more thoughts:
>
> * You probably don't need the full URLs of the content being accessed, so
> those could be anonymized and replaced with random identifiers to some
> degree, right?
>
> * Someone might be able to monitor the user's end of the transactions,
> such as by having university network logs that show destination domains and
> timestamps, in such a way that they could pair the university logs with
> Wikimedia access traces of one second granularity and thus defeat some
> measures of privacy for the university's Wikimedia users, correct?
>
> * I am not sure that the staff time required to analyze this request and
> produce the data is a good use of resources on Wikimedia's end. Toby would
> be a good person to ask about this.
>
> Pine
> On Sep 20, 2014 12:45 AM, "Pine W" <wiki.p...@gmail.com> wrote:
>
>> Thanks for the explanation. On moderate to high traffic pages, let's say
>> with a minimum of 10 hits per minute across the entire time span studied,
>> perhaps the requested data could be provided while still providing strong
>> privacy protection. Toby might need to discuss this with WMF Legal.
>>
>> Pine
>> On Sep 19, 2014 4:57 AM, "Valerio Schiavoni" <valerio.schiav...@gmail.com>
>> wrote:
>>
>>> Hello everyone,
>>> it seems the discussion is sparking an interesting debate, thanks to
>>> everyone.
>>>
>>> To put things back in context, we use Wikipedia as one of the few
>>> websites where users can access different 'versions' of the same page.
>>> Users mostly read the most recent version of a given page, but from time
>>> to time, read accesses to the 'history' of a page happen.
>>> New versions of a page are created as well. Finally, users might
>>> potentially need to explore several old versions of a given web page, for
>>> example by accessing the details of its history[1].
>>> Access traces need to be accurate to model the workload on the servers
>>> that store the content being served by the web servers.
>>> A resolution coarser than 1 second would not reflect the access patterns
>>> on Wikipedia, or similarly versioned, web sites.
>>> We use these access patterns to test different version-aware storage
>>> techniques.
>>> For those interested, I could send the pre-print version of an article
>>> that I will present next month at the IEEE SRDS'14 conference.
>>>
>>> Regarding potential privacy concerns about disclosing such traces, I
>>> would like to stress that we are not looking into 'who' or from 'where'
>>> a given URL was requested. That information is completely absent from
>>> the Wikibench traces, and can/should remain so in new traces.
>>>
>>> Let's say Wikipedia somehow reveals the top-10 most-visited pages in the
>>> last minute: would that represent a privacy breach for some users? I
>>> highly doubt it, and I invite the audience to convince me of the contrary.
>>>
>>> Best regards,
>>> Valerio
>>>
>>> [1] For example:
>>> http://it.wikipedia.org/w/index.php?title=George_W._Bush&action=history
>>>
>>> On Fri, Sep 19, 2014 at 8:36 AM, Pine W <wiki.p...@gmail.com> wrote:
>>>
>>>> Let's loop back to the request at hand. Valerio, can you describe your
>>>> use case for access traces at intervals shorter than one hour? The very
>>>> likely outcome of this discussion is that the access traces at shorter
>>>> intervals will not be made available, but I'm curious about what you would
>>>> do with the data if you had it.
>>>>
>>>> Pine
>>>> On Sep 18, 2014 4:55 PM, "Richard Jensen" <rjen...@uic.edu> wrote:
>>>>
>>>>> The basic issue in sampling is to decide what the target population T
>>>>> actually is. Then you weight the sample so that each person in the target
>>>>> population has an equal chance w and people not in it have weight zero.
>>>>>
>>>>> So what is the target population we want to study?
>>>>> --the world's population?
>>>>> --the world's educated population?
>>>>> --everyone with internet access
>>>>> --everyone who ever uses Wikipedia
>>>>> --everyone who uses it a lot
>>>>> --everyone who has knowledge to contribute in a positive fashion?
>>>>> --everyone who has the internet, skills and potential to contribute?
>>>>> --everyone who has the potential to contribute but does not do so?
>>>>>
>>>>> Richard Jensen
>>>>> rjen...@uic.edu
>>>>>
>>>>> _______________________________________________
>>>>> Wiki-research-l mailing list
>>>>> Wiki-research-l@lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

--
Oliver Keyes
Research Analyst
Wikimedia Foundation