Greetings,

I am the individual who provided code to Gerard. Regarding the Bugzilla entry cited as the "blocker" for this and many other inquiries, I will note that my code runs nightly to obtain one day's worth of pageview stats and writes them to an SQL database. I have been persistently storing all pageview statistics for en.wp in this queryable format for 2+ years at this point. I then use this data in my research, as well as in reports such as [https://en.wikipedia.org/wiki/Wikipedia:Top_5000_pages] and [https://en.wikipedia.org/wiki/Wikipedia:TOPRED]. Is it production ready? Probably not, but it works for me as research code.
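
For illustration only, a nightly ingest of this kind could be sketched roughly as follows. This is not the code described above. It assumes the public hourly pagecount dump format of one `project title count bytes` record per line, and it uses SQLite with a hypothetical `pageviews` table; the function names `parse_line`, `ingest`, and `top_pages` are likewise made up for this sketch.

```python
import sqlite3


def parse_line(line, project="en"):
    """Return (title, views) if the line belongs to `project`, else None.

    Assumed record layout: "<project> <title> <count> <bytes>".
    """
    parts = line.rstrip("\n").split(" ")
    if len(parts) != 4 or parts[0] != project:
        return None
    try:
        return parts[1], int(parts[2])
    except ValueError:
        # Malformed count field: skip the record.
        return None


def ingest(conn, day, lines, project="en"):
    """Sum the per-title counts for one day and store them in SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pageviews ("
        "day TEXT NOT NULL, title TEXT NOT NULL, views INTEGER NOT NULL, "
        "PRIMARY KEY (day, title))"
    )
    # Hourly files repeat titles, so aggregate before writing.
    totals = {}
    for line in lines:
        parsed = parse_line(line, project)
        if parsed is not None:
            title, views = parsed
            totals[title] = totals.get(title, 0) + views
    # One transaction per day; re-running a day overwrites its rows.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO pageviews (day, title, views) "
            "VALUES (?, ?, ?)",
            ((day, title, views) for title, views in totals.items()),
        )


def top_pages(conn, day, n):
    """Return the n most-viewed (title, views) pairs for a day."""
    cur = conn.execute(
        "SELECT title, views FROM pageviews WHERE day = ? "
        "ORDER BY views DESC, title LIMIT ?",
        (day, n),
    )
    return cur.fetchall()
```

In this sketch, switching to another project would only mean passing a different `project` string to `ingest`. Holding a full day of titles in a dictionary is a simplification that would need rethinking at the data volumes discussed in this thread.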

My limitations with this are primarily hardware-based. I do it all on a single commodity server that also runs services like [[WP:STiki]]. Thus: (a) I don't have the storage to do all languages/projects. CPU cycles would also become an issue at that scale: it can take up to 3 hours to parse in a day's worth of en.wp stats. It could be done quicker, but with my query-driven indices and scalable format, this is how it goes. (b) I am not in a position to open this as a private or public API. It would be trivial to DoS this server with some pretty simple queries (en.wp sees 10 million+ article titles daily, I think, as this data includes attempted accesses to URLs that don't exist, and there are all types of muck in that regard).

I am not sure what Gerard is chasing in particular with "missing searches", but regardless, I get an overwhelming number of requests to do popular pages or redlinks reports for various projects/languages. My code could do this by changing a small handful of strings; what it really needs is a place to run and someone to oversee it. More than a dev, this seems to be in the realm of someone like Erik Zachte, not that I am trying to add to anyone's responsibilities. -AW


On 12/19/2013 06:14 AM, Gerard Meijssen wrote:
Hoi,

As I said, there is software that does basically what we need it to do.
I am asking for access for Magnus so that he can modify that software
and make it more useful.

Waiting for perfection takes too long. The need for this functionality
exists and the arguments are in my initial mail.
Thanks,
        GerardM


On 19 December 2013 12:10, Federico Leva (Nemo) <nemow...@gmail.com> wrote:

    Gerard Meijssen, 19/12/2013 12:06:

        Hoi,
        Sorry .. the link [1] and the blog post [2] I wrote when I
        learned about it.
        Thanks,
               Gerard


        [1] https://en.wikipedia.org/wiki/User:West.andrew.g/Popular_redlinks
        [2] http://ultimategerardm.blogspot.nl/2013/11/a-brilliant-idea-barnstar.html


    Ah. Those are not searches, they're direct URL accesses (where
    enabled, wdsearch.js shows wikidata search results for those too).
    So again that would require the good old
    https://bugzilla.wikimedia.org/show_bug.cgi?id=42259, our usual
    blocker. :( Actual search-result misses are quite a bit harder
    to get.


    Nemo

    _________________________________________________
    Wiki-research-l mailing list
    Wiki-research-l@lists.wikimedia.org
    https://lists.wikimedia.org/mailman/listinfo/wiki-research-l






--
Andrew G. West, PhD
Research Scientist
Verisign Labs - Reston, VA
Website: http://www.andrew-g-west.com
