The WMF makes available a dump of the search indices from CirrusSearch (the MediaWiki extension that provides search functionality on WMF wikis) on a weekly basis. These dumps have been running for many years, but sadly they have been getting slower and slower over time as the relevant datasets have grown. A few months ago we got to the point where a weekly dump sometimes takes more than a week to generate. As such, we've reworked these dumps to generate in a slightly different manner.
You can reach out to us if there are difficulties migrating to the replacement dumps. The best place to provide feedback will be in https://phabricator.wikimedia.org/T366248.

Changes:
* Old dumps location: https://dumps.wikimedia.org/other/cirrussearch/
* New dumps location: https://dumps.wikimedia.org/other/cirrus_search_index/
* The new dumps are bzip2 compressed, while the old ones were gzip.
* Old dumps were one file per search index. New dumps are one directory per search index. A directory may have one or more files.
* The content of the files is exactly the same. It's just split across multiple files for ease of generation.
* The old files can be recreated locally, if needed, by concatenating the decompressed versions. Something like `bzcat *.json.bz2 > full-dump.json`

We will continue producing the old dumps through November, expecting to shut them off before the end of the year.

Erik Bernhardson
Search Platform
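
For consumers who would rather not shell out to `bzcat`, the concatenation step mentioned in the changes above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function names (`iter_dump_lines`, `recreate_full_dump`) are hypothetical, the part files are assumed to match `*.json.bz2` inside a locally downloaded index directory, and sorted filename order is assumed to be the intended order (the same order a shell glob would produce).

```python
import bz2
import glob
import os
from typing import Iterator


def iter_dump_lines(index_dir: str, pattern: str = "*.json.bz2") -> Iterator[bytes]:
    """Yield decompressed lines from every part file in index_dir.

    Assumption: parts are processed in sorted filename order, which is
    also the order a shell glob such as `bzcat *.json.bz2` would use.
    """
    for path in sorted(glob.glob(os.path.join(index_dir, pattern))):
        # Binary mode keeps the bytes exactly as they were compressed.
        with bz2.open(path, "rb") as part:
            for line in part:
                yield line


def recreate_full_dump(index_dir: str, out_path: str) -> int:
    """Write the concatenated, decompressed parts to out_path.

    Returns the number of lines written.
    """
    count = 0
    with open(out_path, "wb") as out:
        for line in iter_dump_lines(index_dir):
            out.write(line)
            count += 1
    return count
```

Calling `recreate_full_dump("some_index_dir", "full-dump.json")` would then write a single decompressed file, while `iter_dump_lines` alone lets a consumer stream the parts without writing the full dump to disk.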
_______________________________________________
Xmldatadumps-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
