On Wed, Mar 11, 2009 at 2:09 AM, Brian <brian.min...@colorado.edu> wrote:
> Sure - creating a Lucene index of the entire revision history of all
> Wikipedias for a WikiBlame extension.
>
> More realistically (although I would like to do the above) a natural
> language parse of the current revision of the English Wikipedia. Based
> on the supposed availability of this hardware, I'd say it could be
> done in less than a week.
>
> https://wiki.toolserver.org/view/Servers
>
> I have to say the toolserver has grown a lot from that first donated server 
> ^_^
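
To make Brian's Lucene idea a bit more concrete, here is a minimal
sketch of what indexing a single revision might look like.  This is
only an illustration: the RevisionIndexer class and the field names
are my own invention, not an agreed schema, and the API shown is a
recent Lucene rather than whatever would actually run on the
toolserver.

// Sketch only: field layout is illustrative, not an agreed schema.
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class RevisionIndexer implements AutoCloseable {
    private final IndexWriter writer;

    public RevisionIndexer(String indexDir) throws IOException {
        writer = new IndexWriter(FSDirectory.open(Paths.get(indexDir)),
                new IndexWriterConfig(new StandardAnalyzer()));
    }

    // One Lucene document per revision: title and contributor stored for
    // display, revision id as a point field, full text indexed for search.
    public void addRevision(String title, long revId, String contributor,
                            String text) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("title", title, Field.Store.YES));
        doc.add(new LongPoint("rev_id", revId));
        doc.add(new StringField("contributor", contributor, Field.Store.YES));
        doc.add(new TextField("text", text, Field.Store.NO));
        writer.addDocument(doc);
    }

    public void close() throws IOException {
        writer.close();
    }
}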

I will confess that this server list is significantly more impressive
than I expected, based on my recollections of the toolserver's history.

To answer River's question, I would basically agree with Brian.  The
starting point is making the full revision text history available;
once that exists, there are a number of projects (like WikiBlame) that
would want to pull and process every revision in some way.  Some of
the code I've worked with would probably take weeks to run
single-threaded against enwiki, but that becomes practical if one is
willing to throw enough cores at the problem.  From the outside it
often looks as though the toolserver is badly lagged or tools are
down, and from that I have generally assumed it runs fairly close to
capacity much of the time.  Perhaps that is a bad assumption, and
there is in fact plenty of spare capacity?
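
As a rough illustration of the "throw enough cores at it" point, a
per-revision job fans out trivially over a plain thread pool.  The
fetchRevisionIds() and processRevision() methods below are placeholders
for whatever a given tool (WikiBlame or otherwise) would actually do
with each revision; nothing here is specific to the toolserver setup.

// Rough sketch of spreading per-revision work across available cores.
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelRevisionScan {

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // Submit one task per revision; the pool keeps all cores busy.
        for (long revId : fetchRevisionIds()) {
            pool.submit(() -> processRevision(revId));
        }

        pool.shutdown();
        pool.awaitTermination(7, TimeUnit.DAYS);
    }

    // Placeholder: in practice this would stream ids from a dump or the DB.
    static List<Long> fetchRevisionIds() {
        return List.of();
    }

    // Placeholder for the actual per-revision analysis.
    static void processRevision(long revId) {
        // ...
    }
}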

-Robert Rohde

