https://bugzilla.wikimedia.org/show_bug.cgi?id=25984

--- Comment #12 from Ángel González <keis...@gmail.com> 2012-06-10 18:13:02 UTC ---

Created attachment 10720
  --> https://bugzilla.wikimedia.org/attachment.cgi?id=10720
Patch for 0001-Make-MediaWiki-1.19-fetch-content-from-HTTP.patch

Vi is also processing to a different format. :) Did you see
http://wiki-web.es/mediawiki-offline-reader/ ? Those are not straight dumps from
dumps.wikimedia.org, although the original idea was that they would eventually
be processed that way, with the index file published alongside the xml dumps.

You could use the original files, treating them as a single bucket, but
performance would be horrible with big dumps. My approach was to use a new
database type for reading the dumps, so it doesn't need an extra process or
database. Admittedly, it targeted the then-current MediaWiki 1.13, so it would
need an update to work with current MediaWiki versions (mainly things like new
columns/tables).

Vi, I did some tests with your code using the eswiki-20081126 dump. For that
version I store the processed file + categories + indexes in less than 800M. In
your case, the shelve file needs 2.4G (a little smaller than the decompressed
xml dump: 2495471616 vs 2584170611 bytes). I had to make a number of changes: to
the patch so it would apply, to the interwikis so that wikipedia is treated as a
local namespace, to paths... The database also contains references to
/home/vi/usr/mediawiki_sa_vi/w/, but it mostly works.

The most noticeable problems are that images don't work and redirects are not
followed. Other features such as categories and special pages are also broken,
but I assume that's expected?
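For reference, the shelve-based layout under discussion (one record of wikitext per page title, which is why the shelve file ends up nearly as large as the decompressed XML dump) can be sketched roughly like this. This is an illustrative sketch only, not Vi's actual code: the function name, shelf layout, and export-schema namespace are my assumptions.

```python
import shelve
import xml.etree.ElementTree as ET

# Schema namespace used by dumps of that era (an assumption; adjust to the
# <mediawiki xmlns="..."> value found in the actual dump file).
NS = "{http://www.mediawiki.org/xml/export-0.3/}"

def import_dump(xml_path, shelf_path):
    """Hypothetical importer: store each page's raw wikitext in a shelve,
    keyed by title. Streams the dump so memory stays flat on big files."""
    with shelve.open(shelf_path) as db:
        for _event, elem in ET.iterparse(xml_path):
            if elem.tag == NS + "page":
                title = elem.findtext(NS + "title")
                text = elem.findtext(NS + "revision/" + NS + "text") or ""
                db[title] = text  # one record per article, uncompressed
                elem.clear()      # free the subtree we just processed
```

Storing the text uncompressed is what makes the shelve roughly dump-sized; compressing each record, or storing offsets into the original dump plus a separate index (closer to the wiki-web.es approach), is what gets it down toward the ~800M figure mentioned above.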
--- Comment #12 from Ángel González <keis...@gmail.com> 2012-06-10 18:13:02 UTC --- Created attachment 10720 --> https://bugzilla.wikimedia.org/attachment.cgi?id=10720 Patch for 0001-Make-MediaWiki-1.19-fetch-content-from-HTTP.patch _Vi is also processing to a different format. :) Did you see http://wiki-web.es/mediawiki-offline-reader/ ? They are not straight dumps from dumps.wikimedia.org, although the original idea was that they would eventually be processed like that, publishing the index file along the xml dumps. You could use the original files, treating them as a single bucket, but performance would be horrible with big dumps. My approach was to use a new database type for reading the dumps, so it doesn't need an extra process or database. Admittedly, it targetted the then current MediaWiki 1.13, so it'd need an update in order to work with current MediaWiki versions (mainly things like new columns/tables). Vi, I did some tests with your code using eswiki-20081126 dump. For that version I store the processed file + categories + indexes in less than 800M. In your case, the shelve file needs 2.4G (a little smaller than the decompressed xml dump, 2495471616 vs 2584170611). I had to perform a number of changes, in the patch to make it apply, to the interwikis so wikipedia is treated as a local namespace, to paths... Also the database contains references to /home/vi/usr/mediawiki_sa_vi/w/, but it mostly works. The more noticeable problems are that images don't work and redirects are not followed. Other features such as categories or special pages are also broken, but I assume that's expected? -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug. You are on the CC list for the bug. _______________________________________________ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l