On Wed, Mar 11, 2009 at 2:40 AM, River Tarnell
<ri...@loreley.flyingparchment.org.uk> wrote:
> Brian:
>> Sure - creating a Lucene index of the entire revision history of all
>> Wikipedias for a WikiBlame extension.
>
>> a natural language parse of the current revision of the English Wikipedia.
>
> can you estimate how much resources (disk/cpu/etc) would be needed to
> create and maintain either of these?
A useful baseline to think about might be the database backup dumper. Proposals that require processing the content of every revision are structurally similar to what is required to build and compress a full-history dump. Obviously dump generation is a months-long process for enwiki right now, but if one is going to add a text service to the toolserver, then perhaps there are ways to do that which would cut down on bottlenecks.

-Robert Rohde

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
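[For context on the "processing the content of every revision" pattern discussed above: a minimal sketch of streaming revisions out of a MediaWiki pages-meta-history XML dump without loading it into memory. The inline SAMPLE_DUMP string is a hypothetical stand-in for a real dump file; the element structure (page/title/revision/text) follows the MediaWiki export format, and the emitted tuples are what one might feed into a Lucene/WikiBlame-style indexer.]

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical two-revision sample standing in for a real dump; actual
# enwiki pages-meta-history dumps share this structure at terabyte scale.
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Example</title>
    <revision><id>1</id><text>first revision</text></revision>
    <revision><id>2</id><text>second revision</text></revision>
  </page>
</mediawiki>"""

NS = "{http://www.mediawiki.org/xml/export-0.3/}"

def iter_revisions(stream):
    """Yield (title, rev_id, text) for each revision, streaming."""
    title = None
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == NS + "title":
            title = elem.text
        elif elem.tag == NS + "revision":
            rev_id = elem.find(NS + "id").text
            text = elem.find(NS + "text").text or ""
            yield title, rev_id, text
            elem.clear()  # release parsed revision text: essential for huge dumps

revs = list(iter_revisions(io.StringIO(SAMPLE_DUMP)))
```

For a real dump one would pass a decompressing file object (e.g. `bz2.open(path)`) instead of the StringIO sample; the per-revision loop is then the natural place to hang an indexing or blame computation.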