Many of the things done for the statistical analysis of database dumps should be suitable for parallelization (e.g. break the dump into chunks, process the chunks in parallel, and sum the results). You could talk to Erik Zachte; I don't know whether his code has already been designed for parallel processing, though.
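[For illustration only, not Erik Zachte's actual code: the chunk/process/sum pattern above can be sketched with Python's multiprocessing module. A simple word count stands in for the real dump analysis; a real job would parse the XML dump instead.]

```python
# Minimal sketch of the chunk / process-in-parallel / sum pattern.
from collections import Counter
from multiprocessing import Pool

def process_chunk(chunk):
    """Count words in one chunk of lines (stand-in for real dump analysis)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts

def chunked(lines, size):
    """Break the input into fixed-size chunks."""
    for i in range(0, len(lines), size):
        yield lines[i:i + size]

def parallel_count(lines, chunk_size=2, workers=2):
    """Process chunks in a worker pool, then sum the per-chunk results."""
    with Pool(workers) as pool:
        partials = pool.map(process_chunk, list(chunked(lines, chunk_size)))
    return sum(partials, Counter())

if __name__ == "__main__":
    sample = ["a b a", "b c", "a", "c c"]
    print(parallel_count(sample))
```

[Because each chunk is independent and the results combine by addition, the same structure scales from a thread pool on one machine to splitting the dump across cluster nodes.]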
Another option might be to look at the methods for compressing old revisions (is [1] still current?).

I make heavy use of parallel processing in my professional work (not related to wikis), but I can't really think of any projects I have at hand that would be accessible and completable in a month.

-Robert Rohde

[1] http://www.mediawiki.org/wiki/Manual:CompressOld.php

On Sun, Oct 24, 2010 at 5:42 PM, Aryeh Gregor <simetrical+wikil...@gmail.com> wrote:
> This term I'm taking a course in high-performance computing
> <http://cs.nyu.edu/courses/fall10/G22.2945-001/index.html>, and I have
> to pick a topic for a final project. According to the assignment
> <http://cs.nyu.edu/courses/fall10/G22.2945-001/final-project.pdf>,
> "The only real requirement is that it be something in parallel." In
> the class, we covered
>
> * Microoptimization of single-threaded code (efficient use of CPU cache, etc.)
> * Multithreaded programming using OpenMP
> * GPU programming using OpenCL
>
> and will probably briefly cover distributed computing over multiple
> machines with MPI. I will have access to a high-performance cluster
> at NYU, including lots of CPU nodes and some high-end GPUs. Unlike
> most of the other people in the class, I don't have any interesting
> science projects I'm working on, so something useful to
> MediaWiki/Wikimedia/Wikipedia is my first thought. If anyone has any
> suggestions, please share. (If you have non-Wikimedia-related ones,
> I'd also be interested in hearing about them offlist.) They shouldn't
> be too ambitious, since I have to finish them in about a month, while
> doing work for three other courses and a bunch of other stuff.
>
> My first thought was to write a GPU program to crack MediaWiki
> password hashes as quickly as possible, then use what we've studied in
> class about GPU architecture to design a hash function that would be
> as slow as possible to crack on a GPU relative to its PHP execution
> speed, as Tim suggested a while back.
> However, maybe there's
> something more interesting I could do.

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l