https://bugzilla.wikimedia.org/show_bug.cgi?id=25984

--- Comment #12 from Ángel González <keis...@gmail.com> 2012-06-10 18:13:02 UTC ---
Created attachment 10720
  --> https://bugzilla.wikimedia.org/attachment.cgi?id=10720
Patch for 0001-Make-MediaWiki-1.19-fetch-content-from-HTTP.patch

_Vi is also processing the dump into a different format. :)

Did you see http://wiki-web.es/mediawiki-offline-reader/ ?

They are not straight dumps from dumps.wikimedia.org, although the original
idea was that they would eventually be processed like that, publishing the
index file alongside the xml dumps.
You could use the original files, treating them as a single bucket, but
performance would be horrible with big dumps.
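
Just to illustrate what "a single bucket" means in practice: with nothing but
the raw xml dump, every lookup has to walk the file from the start until the
title matches, so access time grows with the size of the dump. Roughly (the
tag handling is simplified, this is only an illustration):

    import xml.etree.ElementTree as ET

    def local(tag):
        # Strip the "{http://www.mediawiki.org/xml/export-...}" namespace.
        return tag.rsplit('}', 1)[-1]

    def find_page(dump_path, wanted_title):
        current_title = None
        for _event, elem in ET.iterparse(dump_path, events=('end',)):
            name = local(elem.tag)
            if name == 'title':
                current_title = elem.text
            elif name == 'text' and current_title == wanted_title:
                return elem.text or ''
            elif name == 'page':
                elem.clear()  # keep memory bounded while scanning
        return None

With the eswiki dump that means reading through ~2.5G of xml for every page
view, which is where the horrible performance comes from.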

My approach was to use a new database type for reading the dumps, so it doesn't
need an extra process or database.
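
The database class itself is PHP inside MediaWiki, but the core of it is just
a title -> offset index over the processed file, so a page read is a seek
instead of a rescan, with no external server involved. In Python terms the
idea is roughly this (an illustration, not my actual file format; it also
ignores XML entity escaping in titles):

    def build_index(dump_path):
        """One pass over the xml dump, remembering where each <page> starts."""
        index = {}
        page_offset = offset = 0
        with open(dump_path, 'rb') as f:
            for line in f:
                if b'<page>' in line:
                    page_offset = offset
                elif b'<title>' in line:
                    title = line.split(b'<title>')[1].split(b'</title>')[0]
                    index[title.decode('utf-8')] = page_offset
                offset += len(line)
        return index

    def read_page(dump_path, index, title):
        """Seek straight to the page; no scan, no extra database process."""
        chunk = []
        with open(dump_path, 'rb') as f:
            f.seek(index[title])
            for line in f:
                chunk.append(line)
                if b'</page>' in line:
                    break
        return b''.join(chunk).decode('utf-8')

The index is what could have been published alongside the xml dumps, as
mentioned above.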

Admittedly, it targeted the then-current MediaWiki 1.13, so it would need an
update in order to work with current MediaWiki versions (mainly things like new
columns/tables).

_Vi, I did some tests with your code using the eswiki-20081126 dump. For that
version I store the processed file + categories + indexes in less than 800M. In
your case, the shelve file needs 2.4G (a little smaller than the decompressed
xml dump: 2495471616 vs 2584170611 bytes).
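
I suspect most of that difference is compression: shelve pickles each value
into the dbm file as-is, so the article text ends up stored uncompressed and
the file stays roughly the size of the decompressed xml. Compressing each
article before storing it would already shrink it a lot, something like this
(the key layout and file name are just examples, not what your code does):

    import shelve
    import zlib

    def store_page(db, title, wikitext):
        # Compress the wikitext before storing; plain shelve keeps it as-is.
        db[title] = zlib.compress(wikitext.encode('utf-8'))

    def load_page(db, title):
        return zlib.decompress(db[title]).decode('utf-8')

    with shelve.open('eswiki-20081126-pages') as db:
        store_page(db, 'Portada', '#REDIRECT [[Wikipedia:Portada]]')
        print(load_page(db, 'Portada'))

The decompression cost on reads should be small compared to parsing the
wikitext anyway.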

I had to make a number of changes to get it running: to the patch so that it
applies, to the interwikis so that wikipedia is treated as a local namespace,
to paths... The database also contains references to
/home/vi/usr/mediawiki_sa_vi/w/, but it mostly works.
The most noticeable problems are that images don't work and redirects are not
followed.
Other features such as categories or special pages are also broken, but I
assume that's expected?
