I'd like to add that the md5 of the *uncompressed* file is cd4eee6d3d745ce716db2931c160ee35. That's what I got from decompressing both the 7z and the bz2. They matched, whew. Decompressing and md5ing the bz2 took well over a week; decompressing and md5ing the 7z took less than a day.
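For anyone who wants to repeat both checks on the bz2, here is a minimal Python sketch using only the standard library. It is not what I ran, and the function names are hypothetical. It streams the file in 1 MiB chunks, so neither the 280 GB compressed file nor the decompressed XML has to fit in memory or be written to disk. `md5_of_file` hashes the bytes as stored, which is what the published md5sum covers. `md5_of_bz2_contents` hashes the decompressed stream, which is what the uncompressed md5 above covers.

```python
import bz2
import hashlib
import sys
from typing import BinaryIO

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so memory use stays flat


def md5_of_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Return the hex MD5 of everything readable from a binary stream."""
    digest = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # b"" means end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str) -> str:
    """MD5 of the file exactly as stored on disk (the compressed bytes)."""
    with open(path, "rb") as f:
        return md5_of_stream(f)


def md5_of_bz2_contents(path: str) -> str:
    """MD5 of the decompressed data, decompressed on the fly, never saved."""
    with bz2.open(path, "rb") as f:
        return md5_of_stream(f)


if __name__ == "__main__":
    # Usage: python md5_dump.py FILE.bz2 [FILE.bz2 ...]
    for path in sys.argv[1:]:
        print(path)
        print("  compressed:  ", md5_of_file(path))
        print("  uncompressed:", md5_of_bz2_contents(path))
```

Run on enwiki-20100130-pages-meta-history.xml.bz2, the first line should match 65677bc275442c7579857cc26b355ded and the second should match cd4eee6d3d745ce716db2931c160ee35. This only covers the bz2. Python's `lzma` module reads .xz/.lzma streams but not the .7z container, so for the 7z I'd pipe `7z e -so` into `md5sum` instead.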
On Mon, Mar 29, 2010 at 8:16 PM, Tomasz Finc <tf...@wikimedia.org> wrote:
> You can find all the md5sums at
>
> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-md5sums.txt
>
> --tomasz
>
> Anthony wrote:
>> Got an md5sum?
>>
>> On Mon, Mar 29, 2010 at 5:46 PM, Tomasz Finc <tf...@wikimedia.org> wrote:
>>
>> I love lzma compression.
>>
>> enwiki-20100130-pages-meta-history.xml.bz2 280.3 GB
>>
>> enwiki-20100130-pages-meta-history.xml.7z 31.9 GB
>>
>> Download at http://tinyurl.com/yeelbse
>>
>> Enjoy!
>>
>> --tomasz
>>
>> Tomasz Finc wrote:
>> > Tomasz Finc wrote:
>> >> New full history en wiki snapshot is hot off the presses!
>> >>
>> >> It's currently being checksummed which will take a while for 280GB+ of
>> >> compressed data but for those brave souls willing to test please grab it
>> >> from
>> >>
>> >> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-pages-meta-history.xml.bz2
>> >>
>> >> and give us feedback about its quality. This run took just over a month
>> >> and gained a huge speed up after Tim's work on re-compressing ES. If we
>> >> see no hiccups with this data snapshot, I'll start mirroring it to other
>> >> locations (internet archive, amazon public data sets, etc).
>> >>
>> >> For those not familiar, the last successful run that we've seen of this
>> >> data goes all the way back to 2008-10-03. That's over 1.5 years of
>> >> people waiting to get access to these data bits.
>> >>
>> >> I'm excited to say that we seem to have it :)
>> >>
>> >> --tomasz
>> >
>> > We now have an md5sum for enwiki-20100130-pages-meta-history.xml.bz2.
>> >
>> > "65677bc275442c7579857cc26b355ded"
>> >
>> > Please verify against it before filing issues.
>> >
>> > --tomasz
>> >
>> > _______________________________________________
>> > Wikitech-l mailing list
>> > Wikitech-l@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>> _______________________________________________
>> Xmldatadumps-admin-l mailing list
>> xmldatadumps-admi...@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-admin-l