I don't think that we keep those logs historically. analytics-l (CC'd) might have more insights.
Do we have anything more granular than the hourly view logs available here:
https://dumps.wikimedia.org/other/pagecounts-raw/

On Wed, Sep 17, 2014 at 10:39 AM, Valerio Schiavoni <valerio.schiav...@gmail.com> wrote:

> Hello Aaron,
> 1 hour is way too coarse.
> Let's say 1 second would be OK.
> Is that available?
>
> On Wed, Sep 17, 2014 at 5:23 PM, Aaron Halfaker <aaron.halfa...@gmail.com> wrote:
>
>> Hi Valerio,
>>
>> The page counts dataset has a time resolution of one hour. Is that too
>> coarse? How fine a resolution do you need?
>>
>> On Wed, Sep 17, 2014 at 9:44 AM, Valerio Schiavoni <valerio.schiav...@gmail.com> wrote:
>>
>>> Hello Giovanni,
>>> on second thought, I think the Click dataset won't do either.
>>> I've parsed the smaller sample [1], which is said to be extracted from
>>> the bigger one.
>>>
>>> In that dataset there are ~34k entries related to Wikipedia, but they
>>> look like the following:
>>>
>>> {"count": 1, "timestamp": 1257181201, "from": "en.wikipedia.org", "to": "ko.wikipedia.org"}
>>>
>>> That is, the log only reports the host/domain accessed, but not the
>>> specific URL being requested (to be clear, the one in the HTTP request
>>> issued by the client).
>>>
>>> This is what is of main interest to me.
>>>
>>> Thanks for your interest anyway!
>>> Valerio
>>>
>>> 1 - http://carl.cs.indiana.edu/data/#traffic-websci14
>>>
>>> On Wed, Sep 17, 2014 at 4:24 PM, Valerio Schiavoni <valerio.schiav...@gmail.com> wrote:
>>>
>>>> Hello Giovanni,
>>>> thanks for the pointer to the Click datasets.
>>>> I'd have to take a look at the complete dataset, to see how many of
>>>> those requests touch Wikipedia.
>>>>
>>>> Then, one of the requirements for accessing that data is:
>>>> "The Click Dataset is large (~2.5 TB compressed), which requires that
>>>> it be transferred on a physical hard drive. You will have to provide the
>>>> drive as well as pre-paid return shipment.
" >>>> >>>> I have to check if this is possible and how long this might take to >>>> ship and send back an hard-drive from Switzerland. >>>> I'll let you know !! >>>> >>>> Best, >>>> Valerio >>>> >>>> On Wed, Sep 17, 2014 at 4:09 PM, Giovanni Luca Ciampaglia < >>>> gciam...@indiana.edu> wrote: >>>> >>>>> Valerio, >>>>> >>>>> I didn't know such data existed. As an alternative, perhaps you could >>>>> have a look at our click datasets, which contain requests to the Web at >>>>> large (i.e., not just Wikipedia) generated from within the campus of >>>>> Indiana University over a period of several months. HTH >>>>> >>>>> http://carl.cs.indiana.edu/data/#click >>>>> >>>>> Cheers >>>>> >>>>> G >>>>> >>>>> Giovanni Luca Ciampaglia >>>>> >>>>> ✎ 919 E 10th ∙ Bloomington 47408 IN ∙ USA >>>>> ☞ http://www.glciampaglia.com/ >>>>> ✆ +1 812 855-7261 >>>>> ✉ gciam...@indiana.edu >>>>> >>>>> 2014-09-17 9:53 GMT-04:00 Valerio Schiavoni < >>>>> valerio.schiav...@gmail.com>: >>>>> >>>>>> Hello, >>>>>> just bumping my email from last week, since so far I did not get any >>>>>> answer. >>>>>> >>>>>> Should I consider that dataset to be somehow lost ? >>>>>> >>>>>> I've also contacted the researchers who partially released it, but >>>>>> making it publicly available is tricky for them, due to its size (12 TB), >>>>>> which might instead be somehow in the norms of the operations taken daily >>>>>> by Wikipedia servers. >>>>>> >>>>>> Thanks again, >>>>>> Valerio >>>>>> >>>>>>> >>>>>>> On Wed, Sep 10, 2014 at 4:15 AM, Valerio Schiavoni < >>>>>>> valerio.schiav...@gmail.com> wrote: >>>>>>> >>>>>>>> Dear WikiMedia foundation, >>>>>>>> in the context of a EU research project [1], we are interested in >>>>>>>> accessing >>>>>>>> wikipedia access traces. >>>>>>>> In the past, such traces were given for research purposes to other >>>>>>>> groups >>>>>>>> [2]. >>>>>>>> Unfortunately, only a small percentage (10%) of that trace has been >>>>>>>> made >>>>>>>> made available (10%). 
>>>>>>>> We are interested in accessing the totality of that same trace (or even
>>>>>>>> better, a more recent one, but the same one will do).
>>>>>>>>
>>>>>>>> If this is not the correct mailing list for such requests, could anyone
>>>>>>>> please redirect me to the correct one?
>>>>>>>>
>>>>>>>> Thanks again for your attention,
>>>>>>>>
>>>>>>>> Valerio Schiavoni
>>>>>>>> Post-Doc Researcher
>>>>>>>> University of Neuchatel, Switzerland
>>>>>>>>
>>>>>>>> 1 - http://www.leads-project.eu
>>>>>>>> 2 - http://www.wikibench.eu/?page_id=60
>>>>>>
>>>>>> _______________________________________________
>>>>>> Wiki-research-l mailing list
>>>>>> Wiki-research-l@lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
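[Archive note] The Click sample discussed above is line-oriented JSON, one record per line with `count`, `timestamp`, `from`, and `to` fields, as shown in the example record quoted in the thread. A minimal sketch of the filtering Valerio describes, which also illustrates his point that only host names (not full request URLs) are recoverable. The in-memory SAMPLE list stands in for the real file; how the actual sample is packaged is an assumption, so adapt the input handling accordingly:

```python
import json

# Stand-in for the real Click sample file (hypothetical; the actual
# distribution format at http://carl.cs.indiana.edu/data/#traffic-websci14
# may differ). Each line is one JSON record, as quoted in the thread.
SAMPLE = [
    '{"count": 1, "timestamp": 1257181201, '
    '"from": "en.wikipedia.org", "to": "ko.wikipedia.org"}',
    '{"count": 3, "timestamp": 1257181305, '
    '"from": "example.org", "to": "example.com"}',
]

def wikipedia_edges(lines):
    """Yield records whose source or target host is a Wikipedia domain.

    Note that records carry only host/domain names, not the specific URL
    from the client's HTTP request -- the limitation noted in the thread.
    """
    for line in lines:
        rec = json.loads(line)
        if rec["from"].endswith("wikipedia.org") or rec["to"].endswith("wikipedia.org"):
            yield rec

for rec in wikipedia_edges(SAMPLE):
    print(rec["from"], "->", rec["to"], "count:", rec["count"])
    # → en.wikipedia.org -> ko.wikipedia.org count: 1
```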