On Wed, Mar 11, 2009 at 2:40 AM, River Tarnell
<ri...@loreley.flyingparchment.org.uk> wrote:
> Brian:
>> Sure - creating a Lucene index of the entire revision history of all
>> Wikipedias for a WikiBlame extension.
>
>> a natural language parse of the current revision of the English Wikipedia.
>
> can you estimate what resources (disk/CPU/etc.) would be needed to create
> and maintain either of these?
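
To make the first proposal a bit more concrete, the per-revision work is
essentially the following (a minimal sketch against a current Lucene API
rather than the releases available at the time; the class and field names
are only illustrative, and it says nothing about how a WikiBlame extension
would later query or diff the indexed text):

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class RevisionIndexer {
    private final IndexWriter writer;

    public RevisionIndexer(String indexDir) throws Exception {
        // One on-disk Lucene index; every revision of every page becomes one document.
        writer = new IndexWriter(FSDirectory.open(Paths.get(indexDir)),
                                 new IndexWriterConfig(new StandardAnalyzer()));
    }

    public void addRevision(String wiki, String pageTitle, long revId,
                            String timestamp, String wikitext) throws Exception {
        Document doc = new Document();
        // Exact-match metadata fields; only the revision id is stored for retrieval.
        doc.add(new StringField("wiki", wiki, Field.Store.NO));
        doc.add(new StringField("page", pageTitle, Field.Store.NO));
        doc.add(new StringField("rev_id", Long.toString(revId), Field.Store.YES));
        doc.add(new StringField("timestamp", timestamp, Field.Store.NO));
        // The full revision text, analyzed for term search but not stored;
        // this field is where most of the index's disk footprint comes from.
        doc.add(new TextField("text", wikitext, Field.Store.NO));
        writer.addDocument(doc);
    }

    public void close() throws Exception {
        writer.close();
    }
}

Most of the time and disk goes into reading and analyzing the full text of
every revision exactly once.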

A useful baseline to think about might be the database backup dumper.
Proposals that require processing the content of every revision are
structurally similar to what is required to build and compress a full
history dump.  Obviously dump generation is a months-long process for
enwiki right now, but if one is going to add a text service to the
toolserver, then perhaps there are ways to do it that would cut down on
the bottlenecks.
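
For example, the raw inputs to any disk/CPU estimate (how many revisions
exist and how much text they hold) can be gathered in one streaming pass
over a pages-meta-history dump.  A rough sketch, assuming the dump is
already decompressed and glossing over namespaces and the character/byte
distinction:

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class DumpStats {
    public static void main(String[] args) throws Exception {
        // args[0]: path to a decompressed pages-meta-history XML dump
        XMLStreamReader xml = XMLInputFactory.newInstance()
                .createXMLStreamReader(new FileInputStream(args[0]));

        long revisions = 0;
        long textChars = 0;

        // Stream the dump element by element; never hold more than one
        // revision's text in memory, since the full history will not fit.
        while (xml.hasNext()) {
            if (xml.next() == XMLStreamConstants.START_ELEMENT
                    && "text".equals(xml.getLocalName())) {
                String text = xml.getElementText();
                revisions++;
                textChars += text.length();
                // This is also the point where a Lucene indexer or a
                // parser would consume the revision text.
            }
        }

        System.out.printf("revisions: %d, total text chars: %d%n",
                          revisions, textChars);
    }
}

That single pass over every revision is what dump generation and an
all-revision index build have in common, so they hit the same I/O and
CPU bottlenecks.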

-Robert Rohde

