Hi Valerio,

The page counts dataset has a time resolution of one hour. Is that too coarse? How fine a resolution do you need?

On Wed, Sep 17, 2014 at 9:44 AM, Valerio Schiavoni <valerio.schiav...@gmail.com> wrote:

> Hello Giovanni,
> on second thought, I think the Click dataset won't do either.
> I've parsed the smaller sample [1], which is said to be extracted from the
> bigger one.
>
> In that dataset there are ~34k entries related to Wikipedia, but they look
> like the following:
>
> {"count": 1, "timestamp": 1257181201, "from": "en.wikipedia.org", "to": "ko.wikipedia.org"}
>
> That is, the log only reports the host/domain accessed, but not the
> specific URL being requested (to be clear, the one in the HTTP request
> issued by the client).
>
> This is what is of main interest to me.
>
> Thanks for your interest anyway!
> Valerio
>
> 1 - http://carl.cs.indiana.edu/data/#traffic-websci14
>
> On Wed, Sep 17, 2014 at 4:24 PM, Valerio Schiavoni <valerio.schiav...@gmail.com> wrote:
>
>> Hello Giovanni,
>> thanks for the pointer to the Click datasets.
>> I'd have to take a look at the complete dataset, to see how many of those
>> requests touch Wikipedia.
>>
>> Then, one of the requirements to access that data is:
>> "The Click Dataset is large (~2.5 TB compressed), which requires that it
>> be transferred on a physical hard drive. You will have to provide the drive
>> as well as pre-paid return shipment."
>>
>> I have to check whether this is possible and how long it might take to ship
>> a hard drive from Switzerland and get it back.
>> I'll let you know!
>>
>> Best,
>> Valerio
>>
>> On Wed, Sep 17, 2014 at 4:09 PM, Giovanni Luca Ciampaglia <gciam...@indiana.edu> wrote:
>>
>>> Valerio,
>>>
>>> I didn't know such data existed. As an alternative, perhaps you could
>>> have a look at our click datasets, which contain requests to the Web at
>>> large (i.e., not just Wikipedia) generated from within the campus of
>>> Indiana University over a period of several months.
>>> HTH
>>>
>>> http://carl.cs.indiana.edu/data/#click
>>>
>>> Cheers
>>>
>>> G
>>>
>>> Giovanni Luca Ciampaglia
>>>
>>> ✎ 919 E 10th ∙ Bloomington 47408 IN ∙ USA
>>> ☞ http://www.glciampaglia.com/
>>> ✆ +1 812 855-7261
>>> ✉ gciam...@indiana.edu
>>>
>>> 2014-09-17 9:53 GMT-04:00 Valerio Schiavoni <valerio.schiav...@gmail.com>:
>>>
>>>> Hello,
>>>> just bumping my email from last week, since so far I did not get any
>>>> answer.
>>>>
>>>> Should I consider that dataset to be somehow lost?
>>>>
>>>> I've also contacted the researchers who partially released it, but
>>>> making it publicly available is tricky for them, due to its size (12 TB),
>>>> which might instead be well within the norm for the daily operations of
>>>> Wikipedia's servers.
>>>>
>>>> Thanks again,
>>>> Valerio
>>>>
>>>>> On Wed, Sep 10, 2014 at 4:15 AM, Valerio Schiavoni <valerio.schiav...@gmail.com> wrote:
>>>>>
>>>>>> Dear Wikimedia Foundation,
>>>>>> in the context of an EU research project [1], we are interested in
>>>>>> accessing Wikipedia access traces.
>>>>>> In the past, such traces were given for research purposes to other
>>>>>> groups [2].
>>>>>> Unfortunately, only a small percentage (10%) of that trace has been
>>>>>> made available.
>>>>>> We are interested in accessing the totality of that same trace (or even
>>>>>> better, a more recent one, but the same one will do).
>>>>>>
>>>>>> If this is not the correct ML to use for such requests, could anyone
>>>>>> please redirect me to the correct one?
>>>>>>
>>>>>> Thanks again for your attention,
>>>>>>
>>>>>> Valerio Schiavoni
>>>>>> Post-Doc Researcher
>>>>>> University of Neuchatel, Switzerland
>>>>>>
>>>>>> 1 - http://www.leads-project.eu
>>>>>> 2 - http://www.wikibench.eu/?page_id=60
>>>>
>>>> _______________________________________________
>>>> Wiki-research-l mailing list
>>>> Wiki-research-l@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l