The WMF publishes weekly dumps of the search indices from CirrusSearch (the
MediaWiki extension that provides search functionality on WMF wikis). These
dumps have been running for many years, but sadly they have been getting
slower over time as the underlying datasets have grown. A few months ago we
reached the point where a weekly dump sometimes took more than a week to
generate. To address this, we've reworked how these dumps are generated.

If you have difficulties migrating to the replacement dumps, please reach
out to us. The best place to provide feedback is
https://phabricator.wikimedia.org/T366248.

Changes:
* Old dumps location: https://dumps.wikimedia.org/other/cirrussearch/
* New dumps location: https://dumps.wikimedia.org/other/cirrus_search_index/
* The new dumps are bzip2 compressed, while the old ones were gzip.
* Old dumps were a single file per search index. New dumps are a directory
per search index, containing one or more files.
* The content is exactly the same as in the old dumps. It's just split
across multiple files to make generation easier.
* If needed, the old files can be recreated locally by concatenating the
decompressed parts, for example by running
`bzcat *.json.bz2 > full-dump.json` inside an index directory.
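
The concatenation step can be wrapped in a small script. This is only a
sketch under stated assumptions, not an official tool: `rebuild_dump` is a
hypothetical name, the script assumes a local copy of a single index
directory that contains only that index's `*.json.bz2` parts, and it assumes
that the shell's glob sort order of the part filenames is the order in which
they should be joined. It re-compresses the result with gzip to match the
old dumps' format.

```shell
#!/usr/bin/env bash
# Sketch only: rebuild one old-style (gzip, single-file) dump from one
# new-style index directory. Assumes (hypothetically) that the directory holds
# only that index's *.json.bz2 parts and that glob order is the join order.
set -euo pipefail

rebuild_dump() {
  local index_dir="$1"   # local copy of one index directory
  local out_file="$2"    # output path, e.g. full-dump.json.gz (placeholder)

  # Expand the glob into an array; nullglob makes a directory with no
  # matching parts yield an empty array instead of the literal pattern.
  shopt -s nullglob
  local parts=("$index_dir"/*.json.bz2)
  shopt -u nullglob

  if [ "${#parts[@]}" -eq 0 ]; then
    echo "no .json.bz2 parts found in $index_dir" >&2
    return 1
  fi

  # bzcat decompresses the parts in glob (lexical) order and writes them to
  # stdout one after another; gzip re-compresses the combined stream.
  bzcat -- "${parts[@]}" | gzip -c > "$out_file"
}

# Run only when given both arguments, so the file can also be sourced.
if [ "$#" -eq 2 ]; then
  rebuild_dump "$1" "$2"
fi
```

For example, `./rebuild_dump.sh some-index-dir full-dump.json.gz`, where the
script name and both paths are placeholders. Dropping the `gzip -c` stage
(redirecting `bzcat` straight to a file) gives the uncompressed
`full-dump.json` produced by the one-liner above.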

We will continue producing the old dumps through November and expect to
shut them off before the end of the year.

Erik Bernhardson
Search Platform
_______________________________________________
Xmldatadumps-l mailing list