On Wed, Jul 22, 2009 at 2:37 PM, Tei <oscar.vi...@gmail.com> wrote:

> On Wed, Jul 22, 2009 at 5:48 PM, Chengbin Zheng <chengbinzh...@gmail.com> wrote:
> ...
> >
> > Yes, the "TomeRaider" version is exactly the version I want for static
> > HTML.
> >
> > Just curious, is pages-articles.xml.bz2
> > <http://download.wikimedia.org/enwiki/20090713/enwiki-20090713-pages-articles.xml.bz2>
> > like a "TomeRaider" version? If not, what's the difference?
> >
> > And another curiosity, at
> > http://en.wikipedia.org/wiki/Wikipedia:TomeRaider_database, it says the
> > English Wikipedia database is only 3.3GB. Did they use compression? That
> > seems awfully small. Even if they did, that's an incredible compression
> > ratio, similar to 7-zip. I don't know how you can do that in an eBook
> > format. NTFS compression only brings size down 50%.
>
> At a point, Brion compressed it to 242 MB.
>
> http://www.mail-archive.com/wikitech-l@lists.wikimedia.org/msg00358.html
>
> You may also read this:
> http://en.wikipedia.org/wiki/Solid_compression
>
> --
> --
> ℱin del ℳensaje.
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
I have no doubt that you can compress it to 3.3GB. I'm just curious how that's possible for an eBook format. Does the 3.3GB include the skin, the proper Wikipedia formatting, etc.?

I'm assuming that the pages-articles.xml.bz2 XML dump includes something other than the raw articles. What else is in it?
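One way to see what else is in the dump would be to stream through it and tally the XML element names, without decompressing the whole file to disk first. The following is only a minimal sketch under stated assumptions: the helper names `count_tags` and `count_tags_in_dump` are hypothetical, and the tiny sample document (including its namespace URL) is made up for illustration and is not taken from the real dump.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(fileobj):
    """Tally element names in an XML stream, parsing it incrementally."""
    counts = Counter()
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        # ElementTree reports namespaced tags as "{namespace}name";
        # keep only the part after the closing brace.
        counts[elem.tag.rsplit("}", 1)[-1]] += 1
        # Drop the element's text and children once it has been counted,
        # which releases most of the memory it was holding.
        elem.clear()
    return counts


def count_tags_in_dump(path):
    """Hypothetical helper: tally element names in a .xml.bz2 file."""
    with bz2.open(path, "rb") as f:
        return count_tags(f)


# Made-up miniature document standing in for a real dump (assumption).
SAMPLE = b"""<mediawiki xmlns="http://example.org/hypothetical-export">
  <siteinfo><sitename>Wikipedia</sitename></siteinfo>
  <page>
    <title>Example</title>
    <revision><text>'''Example''' wikitext</text></revision>
  </page>
</mediawiki>"""

# Compress the sample in memory and read it back the same way a real
# .bz2 file would be read.
with bz2.open(io.BytesIO(bz2.compress(SAMPLE)), "rb") as f:
    sample_counts = count_tags(f)

for name, n in sorted(sample_counts.items()):
    print(name, n)
```

Running `count_tags_in_dump` on the downloaded file should list every element type that appears and how often, which would show whether anything besides article text is stored. The pass is slow on a multi-gigabyte dump because bzip2 decompression is CPU-bound.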