I don't think we keep those logs historically. analytics-l (CC'd) might
have more insights.

Do we have anything more granular than the hourly view logs available here:
https://dumps.wikimedia.org/other/pagecounts-raw/
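
(For anyone unfamiliar with those files: each line should look like
"project page_title hourly_view_count response_bytes", one row per page
per hour. A minimal Python sketch for scanning one file; the file name
below is just an example, not a real file:)

    import gzip

    # Minimal sketch: scan one hourly pagecounts-raw dump for English
    # Wikipedia rows. The file name is an example for illustration.
    with gzip.open("pagecounts-20140917-100000.gz", "rt",
                   encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 4:
                continue  # skip malformed lines
            project, title, views, nbytes = parts
            if project == "en":  # "en" = English Wikipedia
                print(title, int(views))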

On Wed, Sep 17, 2014 at 10:39 AM, Valerio Schiavoni <
valerio.schiav...@gmail.com> wrote:

> Hello Aaron,
> One hour is way too coarse; a resolution of one second would be fine.
> Is that available?
>
> On Wed, Sep 17, 2014 at 5:23 PM, Aaron Halfaker <aaron.halfa...@gmail.com>
> wrote:
>
>> Hi Valerio,
>>
>> The page counts dataset has a time resolution of one hour. Is that too
>> coarse? How fine a resolution do you need?
>>
>> On Wed, Sep 17, 2014 at 9:44 AM, Valerio Schiavoni <
>> valerio.schiav...@gmail.com> wrote:
>>
>>> Hello Giovanni,
>>> on second thought, I think the Click dataset won't do either.
>>> I've parsed the smaller sample [1], which is said to be extracted from
>>> the bigger one.
>>>
>>> In that dataset there are ~34k entries related to Wikipedia, but they
>>> look like the following:
>>>
>>> {"count": 1, "timestamp": 1257181201, "from": "en.wikipedia.org", "to":
>>> "ko.wikipedia.org"}
>>>
>>> That is, the log reports only the host/domain accessed, not the
>>> specific URL being requested (to be clear, the one in the HTTP request
>>> issued by the client). The full URL is precisely what I am after.
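>>>
>>> (A minimal sketch of that kind of filter, assuming the sample file
>>> holds one JSON object per line; the file name is hypothetical:)
>>>
>>>     import json
>>>
>>>     n = 0
>>>     # "click-sample.json" is a hypothetical name for the small sample
>>>     with open("click-sample.json") as f:
>>>         for line in f:
>>>             rec = json.loads(line)
>>>             # only host/domain fields exist; there is no URL/path field
>>>             if (rec["from"].endswith("wikipedia.org")
>>>                     or rec["to"].endswith("wikipedia.org")):
>>>                 n += 1
>>>     print(n)  # ~34k Wikipedia-related entries in the sample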
>>>
>>> Thanks for your interest anyway!
>>> Valerio
>>>
>>>
>>> 1 - http://carl.cs.indiana.edu/data/#traffic-websci14
>>>
>>> On Wed, Sep 17, 2014 at 4:24 PM, Valerio Schiavoni <
>>> valerio.schiav...@gmail.com> wrote:
>>>
>>>> Hello Giovanni,
>>>> thanks for the pointer to the Click datasets.
>>>> I'd have to take a look at the complete dataset to see how many of
>>>> those requests touch Wikipedia.
>>>>
>>>> Then, one of the requirements for accessing the data is:
>>>> "The Click Dataset is large (~2.5 TB compressed), which requires that
>>>> it be transferred on a physical hard drive. You will have to provide the
>>>> drive as well as pre-paid return shipment. "
>>>>
>>>> I have to check whether this is possible and how long it would take to
>>>> ship a hard drive from Switzerland and have it sent back.
>>>> I'll let you know!
>>>>
>>>> Best,
>>>> Valerio
>>>>
>>>> On Wed, Sep 17, 2014 at 4:09 PM, Giovanni Luca Ciampaglia <
>>>> gciam...@indiana.edu> wrote:
>>>>
>>>>> Valerio,
>>>>>
>>>>> I didn't know such data existed. As an alternative, perhaps you could
>>>>> have a look at our click datasets, which contain requests to the Web at
>>>>> large (i.e., not just Wikipedia) generated from within the campus of
>>>>> Indiana University over a period of several months. HTH
>>>>>
>>>>> http://carl.cs.indiana.edu/data/#click
>>>>>
>>>>> Cheers
>>>>>
>>>>> G
>>>>>
>>>>> Giovanni Luca Ciampaglia
>>>>>
>>>>> ✎ 919 E 10th ∙ Bloomington 47408 IN ∙ USA
>>>>> ☞ http://www.glciampaglia.com/
>>>>> ✆ +1 812 855-7261
>>>>> ✉ gciam...@indiana.edu
>>>>>
>>>>> 2014-09-17 9:53 GMT-04:00 Valerio Schiavoni <
>>>>> valerio.schiav...@gmail.com>:
>>>>>
>>>>>> Hello,
>>>>>> just bumping my email from last week, since I have not received any
>>>>>> answer so far.
>>>>>>
>>>>>> Should I consider that dataset lost?
>>>>>>
>>>>>> I've also contacted the researchers who partially released it, but
>>>>>> making it publicly available is tricky for them due to its size (12
>>>>>> TB), although that volume is probably routine for the daily
>>>>>> operations of the Wikipedia servers.
>>>>>>
>>>>>> Thanks again,
>>>>>> Valerio
>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 10, 2014 at 4:15 AM, Valerio Schiavoni <
>>>>>>> valerio.schiav...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Dear Wikimedia Foundation,
>>>>>>>> in the context of an EU research project [1], we are interested in
>>>>>>>> accessing Wikipedia access traces.
>>>>>>>> In the past, such traces were provided to other groups for research
>>>>>>>> purposes [2].
>>>>>>>> Unfortunately, only a small percentage (10%) of that trace has been
>>>>>>>> made available.
>>>>>>>> We are interested in accessing the totality of that same trace (or,
>>>>>>>> even better, a more recent one, but the same one will do).
>>>>>>>>
>>>>>>>> If this is not the correct mailing list for such requests, could
>>>>>>>> anyone please redirect me to the correct one?
>>>>>>>>
>>>>>>>> Thanks again for your attention,
>>>>>>>>
>>>>>>>> Valerio Schiavoni
>>>>>>>> Post-Doc Researcher
>>>>>>>> University of Neuchatel, Switzerland
>>>>>>>>
>>>>>>>> 1 - http://www.leads-project.eu
>>>>>>>> 2 - http://www.wikibench.eu/?page_id=60
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
