Hi Valerio,

The page counts dataset has a time resolution of one hour. Is that too
coarse? How fine a resolution do you need?

On Wed, Sep 17, 2014 at 9:44 AM, Valerio Schiavoni <
valerio.schiav...@gmail.com> wrote:

> Hello Giovanni,
> on second thought, I think the Click dataset won't do either.
> I've parsed the smaller sample [1], which is said to be extracted from the
> bigger one.
>
> In that dataset there are ~34k entries related to Wikipedia, but they look
> like the following:
>
> {"count": 1, "timestamp": 1257181201, "from": "en.wikipedia.org", "to": "ko.wikipedia.org"}
>
> That is, the log only reports the host/domain accessed, not the
> specific URL being requested (to be clear, the one in the HTTP request
> issued by the client).
>
> That URL is what I am mainly interested in.
>
> Thanks for your interest anyway!
> Valerio
>
>
> 1 - http://carl.cs.indiana.edu/data/#traffic-websci14
>
> On Wed, Sep 17, 2014 at 4:24 PM, Valerio Schiavoni <
> valerio.schiav...@gmail.com> wrote:
>
>> Hello Giovanni,
>> thanks for the pointer to the Click datasets.
>> I'd have to look at the complete dataset to see how many of those
>> requests touch Wikipedia.
>>
>> Also, one of the requirements for accessing that data is:
>> "The Click Dataset is large (~2.5 TB compressed), which requires that it
>> be transferred on a physical hard drive. You will have to provide the drive
>> as well as pre-paid return shipment. "
>>
>> I have to check whether this is possible and how long it would take to
>> ship a hard drive from Switzerland and have it sent back.
>> I'll let you know!
>>
>> Best,
>> Valerio
>>
>> On Wed, Sep 17, 2014 at 4:09 PM, Giovanni Luca Ciampaglia <
>> gciam...@indiana.edu> wrote:
>>
>>> Valerio,
>>>
>>> I didn't know such data existed. As an alternative, perhaps you could
>>> have a look at our click datasets, which contain requests to the Web at
>>> large (i.e., not just Wikipedia) generated from within the campus of
>>> Indiana University over a period of several months. HTH
>>>
>>> http://carl.cs.indiana.edu/data/#click
>>>
>>> Cheers
>>>
>>> G
>>>
>>> Giovanni Luca Ciampaglia
>>>
>>> ✎ 919 E 10th ∙ Bloomington 47408 IN ∙ USA
>>> ☞ http://www.glciampaglia.com/
>>> ✆ +1 812 855-7261
>>> ✉ gciam...@indiana.edu
>>>
>>> 2014-09-17 9:53 GMT-04:00 Valerio Schiavoni <valerio.schiav...@gmail.com>:
>>>
>>>> Hello,
>>>> just bumping my email from last week, since I have not received any
>>>> answer so far.
>>>>
>>>> Should I consider that dataset lost?
>>>>
>>>> I've also contacted the researchers who partially released it, but
>>>> making it publicly available is tricky for them due to its size (12 TB),
>>>> which might instead be within the norm of what Wikipedia servers
>>>> handle daily.
>>>>
>>>> Thanks again,
>>>> Valerio
>>>>
>>>>>
>>>>> On Wed, Sep 10, 2014 at 4:15 AM, Valerio Schiavoni <
>>>>> valerio.schiav...@gmail.com> wrote:
>>>>>
>>>>>> Dear Wikimedia Foundation,
>>>>>> in the context of an EU research project [1], we are interested in
>>>>>> accessing Wikipedia access traces.
>>>>>> In the past, such traces were given to other groups for research
>>>>>> purposes [2].
>>>>>> Unfortunately, only a small percentage (10%) of that trace has been
>>>>>> made available.
>>>>>> We are interested in accessing the totality of that same trace (or
>>>>>> even better, a more recent one, but the same one will do).
>>>>>>
>>>>>> If this is not the correct mailing list for such requests, could
>>>>>> anyone please redirect me to the correct one?
>>>>>>
>>>>>> Thanks again for your attention,
>>>>>>
>>>>>> Valerio Schiavoni
>>>>>> Post-Doc Researcher
>>>>>> University of Neuchatel, Switzerland
>>>>>>
>>>>>> 1 - http://www.leads-project.eu
>>>>>> 2 - http://www.wikibench.eu/?page_id=60
>>>>>>
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Wiki-research-l mailing list
>>>> Wiki-research-l@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>>>
>>>>
>>>
>>
>