Re: [Wiki-research-l] data about failed searches

Andrew G. West Thu, 19 Dec 2013 09:21:56 -0800

Greetings,

I am the individual who provided code to Gerard. Regarding the Bugzilla entry serving as "blocker" for this and many other inquiries, I will note that my code fires nightly to obtain one day's worth of pageview stats and does write them to an SQL database. I have been persistently storing all pageview statistics for en.wp in this queryable format for 2+ years at this point. I then use this data in my research, as well as in reports such as [https://en.wikipedia.org/wiki/Wikipedia:Top_5000_pages] and [https://en.wikipedia.org/wiki/Wikipedia:TOPRED]. Is it production-ready? Probably not, but it works for me as research code.
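The nightly ingest described above (fetch one day of pageview stats, parse it, write it to SQL) might look roughly like the sketch below. This is not AW's code: it assumes the space-separated line format of the public hourly pagecounts dumps (`project title count bytes`), uses SQLite in place of whatever SQL server he runs, and the table and function names (`pageviews`, `parse_pagecounts`, `store_day`) are hypothetical.

```python
import sqlite3
from collections import defaultdict


def parse_pagecounts(lines, project="en"):
    """Sum view counts per title for one project (hypothetical helper).

    Assumes each line has the form "<project> <title> <count> <bytes>";
    lines for other projects and malformed lines are skipped.
    """
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[0] != project:
            continue
        _, title, count, _bytes = parts
        if not count.isdigit():
            continue
        totals[title] += int(count)
    return dict(totals)


def store_day(conn, day, totals):
    """Write one day's per-title totals into a hypothetical `pageviews` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pageviews ("
        " day TEXT NOT NULL, title TEXT NOT NULL, views INTEGER NOT NULL,"
        " PRIMARY KEY (day, title))"
    )
    # Assumed "query-driven" index: serves "most-viewed titles on day X".
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_pageviews_day_views"
        " ON pageviews (day, views)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pageviews (day, title, views) VALUES (?, ?, ?)",
        [(day, title, views) for title, views in totals.items()],
    )
    conn.commit()


if __name__ == "__main__":
    # Stand-in for the lines of one day's hourly files.
    sample = [
        "en Main_Page 10 1000",
        "en Main_Page 5 500",
        "de Hauptseite 7 700",
        "en Foo 3 300",
    ]
    conn = sqlite3.connect(":memory:")
    store_day(conn, "2013-12-18", parse_pagecounts(sample))
    for row in conn.execute(
        "SELECT title, views FROM pageviews WHERE day = ? ORDER BY views DESC",
        ("2013-12-18",),
    ):
        print(row)
```

Each extra index speeds up one family of queries but adds write cost to the nightly load, which is one plausible reason a full day of en.wp data can take hours to ingest on a single commodity server.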

My limitations with this are primarily hardware-based. I do it all on a single commodity server that also runs services like [[WP:STiki]]. Thus: (a) I don't particularly have the storage to do all languages/projects. CPU cycles would also become an issue at this scale. It can take up to 3 hours to parse in a day's worth of en.wp stats. It could be done quicker, but with my query-driven indices and scalable format, this is how it goes. (b) I am not in a position to open this as a private or public API. It would be trivial to DoS this server with some pretty simple queries (en.wp sees 10 million+ article titles daily, I think, as this data includes attempted URL accesses that don't exist, and there is all kinds of muck in that regard).

I am not sure what Gerard is chasing in particular with "missing searches", but regardless, I get an overwhelming number of requests to do popular pages or redlinks reports for various projects/languages. My code could do this by changing a small handful of strings; what it really needs is a place to run and someone to oversee it. More than a dev, this seems to be in the realm of someone like Erik Zachte, not that I am trying to append to anyone's responsibilities. -AW
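The claim that the same code could serve other projects by "changing a small handful of strings" can be illustrated with a report builder whose only project-specific input is the project code. This is a hypothetical sketch, not AW's implementation: it assumes the same `project title count bytes` line format, assumes the caller supplies the set of existing titles for that project (for example from a page-title dump), and the names `build_reports`, `top_n` and `min_views` are illustrative.

```python
import heapq
from collections import defaultdict


def daily_totals(lines, project):
    """Sum view counts per title for one project code (e.g. "en", "de")."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) == 4 and parts[0] == project and parts[2].isdigit():
            totals[parts[1]] += int(parts[2])
    return totals


def build_reports(lines, project, existing_titles, top_n=5000, min_views=1):
    """Return (popular pages, popular redlinks) for one project.

    Popular pages are existing titles ranked by views; redlinks are
    requested titles absent from `existing_titles`. The raw data also
    contains junk titles, so real use would need extra filtering.
    """
    totals = daily_totals(lines, project)
    existing = ((t, v) for t, v in totals.items() if t in existing_titles)
    missing = (
        (t, v)
        for t, v in totals.items()
        if t not in existing_titles and v >= min_views
    )
    popular = heapq.nlargest(top_n, existing, key=lambda kv: kv[1])
    redlinks = heapq.nlargest(top_n, missing, key=lambda kv: kv[1])
    return popular, redlinks


if __name__ == "__main__":
    sample = [
        "en Main_Page 100 0",
        "en Missing_Article 40 0",
        "en Foo 10 0",
        "de Hauptseite 50 0",
        "de Fehlt 20 0",
    ]
    existing = {"en": {"Main_Page", "Foo"}, "de": {"Hauptseite"}}
    # Only the project code (and its title set) changes between runs.
    for project in ("en", "de"):
        popular, redlinks = build_reports(sample, project, existing[project], top_n=2)
        print(project, popular, redlinks)
```

Using `heapq.nlargest` keeps memory bounded by `top_n` per report rather than sorting every title seen that day.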



On 12/19/2013 06:14 AM, Gerard Meijssen wrote:

Hoi,

As I said, there is software that does basically what we need it to do.
I am asking for access for Magnus so that he can modify that software
and make it more useful.

Waiting for perfection takes too long. The need for this functionality
exists and the arguments are in my initial mail.
Thanks,
        GerardM


On 19 December 2013 12:10, Federico Leva (Nemo) <nemow...@gmail.com> wrote:

    Gerard Meijssen, 19/12/2013 12:06:

        Hoi,
        Sorry... here are the link [1] and the blog post [2] I wrote
        when I learned about it.
        Thanks,
               Gerard


        [1] https://en.wikipedia.org/wiki/User:West.andrew.g/Popular_redlinks
        [2] http://ultimategerardm.blogspot.nl/2013/11/a-brilliant-idea-barnstar.html


    Ah. Those are not searches; they're direct URL accesses (where
    enabled, wdsearch.js shows Wikidata search results for those too).
    So again that would require the good old
    https://bugzilla.wikimedia.org/show_bug.cgi?id=42259, our usual
    blocker. :( Actual search-result misses are considerably harder
    to get.


    Nemo

    _______________________________________________
    Wiki-research-l mailing list
    Wiki-research-l@lists.wikimedia.org
    https://lists.wikimedia.org/mailman/listinfo/wiki-research-l






--
Andrew G. West, PhD
Research Scientist
Verisign Labs - Reston, VA
Website: http://www.andrew-g-west.com

