I'm sure many of you have seen the slashdot story with the same title
as this thread.  It pointed to
http://www.softlab.ntua.gr/~ttsiod/buildWikipediaOffline.html which is
the description of a simple offline Wikipedia reader which runs off
the bzipped dumps.

I for one thought the use of bzip2recover to seek over the bzipped
dumps without decompressing the whole thing was genius.  And I was
even more excited when I found out how small the source code to
bzip2recover is (just one .c file with no includes or anything, see
http://swtch.com/usr/local/plan9/src/cmd/bzip2/bzip2recover.c).

So it should be easy to tweak the bzip2recover program to eliminate
the need to split the file up at all.  I'm kind of surprised there
isn't already something out there to build an index for seeking
through bzip2 files, but I've been looking for one for quite some
time and haven't found anything.  Now I know it's possible though,
and apparently not very difficult.

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l