Hi everyone,

We're pleased to announce the availability of the MediaWiki Content File Exports <https://wikitech.wikimedia.org/wiki/MediaWiki_Content_File_Exports>, a new way to access the unparsed content from Wikimedia’s public wikis in XML format.
*What’s Available:*
The exports are provided as two datasets, each updated monthly, with generation starting on the 1st of the month:
- mediawiki_content_history <https://dumps.wikimedia.org/other/mediawiki_content_history/>: full revision history for all pages
- mediawiki_content_current <https://dumps.wikimedia.org/other/mediawiki_content_current/>: latest revision only for each page

Both are available per wiki as compressed XML, in a format compatible with MediaWiki’s Special:Export <https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export> and with the legacy XML dumps.

*Why the Change:*
The legacy dump infrastructure at https://dumps.wikimedia.org/backup-index.html has struggled to reliably produce XML exports for the larger wikis. The Data Engineering team has reimplemented this process to keep this data accessible in the long term.

*How to Access:*
Files are available at https://dumps.wikimedia.org/other/mediawiki_content_history/ and https://dumps.wikimedia.org/other/mediawiki_content_current/. Before downloading any specific monthly export, for example https://dumps.wikimedia.org/other/mediawiki_content_current/simplewiki/2026-01-01/xml/bzip2/, first check that its SHA256SUMS file is present to confirm that the export is complete. Full instructions are available at https://wikitech.wikimedia.org/wiki/MediaWiki_Content_File_Exports

*Other Related Changes:*
While we’ll continue attempting legacy XML generation for the time being, that path is now deprecated. This deprecation affects only the XML content artifacts; the SQL dumps of the various database tables will continue. Additionally, publication of artifacts on the legacy infrastructure will be reduced from twice per month to once per month. The experimental incremental dumps at https://dumps.wikimedia.org/other/incr/ will be sunset.

*Questions?*
Please review the FAQ <https://wikitech.wikimedia.org/wiki/MediaWiki_Content_File_Exports#FAQ> on the documentation page.
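The completeness check described under *How to Access* can be scripted. Below is a minimal Python sketch, not an official tool, that fetches an export's SHA256SUMS file and verifies a downloaded file against it. It assumes that SHA256SUMS sits directly inside each export directory and uses the common `sha256sum` line format (`<hex digest>  <filename>`); the function names are hypothetical.

```python
import hashlib
import urllib.error
import urllib.request
from pathlib import Path
from typing import Dict, Optional


def fetch_sha256sums(export_url: str, timeout: float = 30.0) -> Optional[str]:
    """Return the SHA256SUMS text for an export directory, or None if it is absent (HTTP 404).

    Assumption: the checksum file is named exactly SHA256SUMS inside the export directory.
    """
    url = export_url.rstrip("/") + "/SHA256SUMS"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # not published yet: treat the export as incomplete
        raise


def parse_sha256sums(text: str) -> Dict[str, str]:
    """Parse sha256sum-style lines ("<digest>  <filename>") into {filename: digest}.

    Assumption: the file follows the coreutils sha256sum output format.
    """
    sums: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, _, name = line.partition(" ")
        # sha256sum marks binary mode with a leading '*' on the filename.
        name = name.strip().lstrip("*")
        sums[name] = digest.lower()
    return sums


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte dumps never sit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: Path, sums: Dict[str, str]) -> bool:
    """True only if the file is listed in SHA256SUMS and its digest matches."""
    expected = sums.get(path.name)
    return expected is not None and sha256_of_file(path) == expected
```

For example, calling `fetch_sha256sums("https://dumps.wikimedia.org/other/mediawiki_content_current/simplewiki/2026-01-01/xml/bzip2/")` would return None while that month's export is still in progress; once it returns text, `parse_sha256sums` lists the files to download, and `verify_download` checks each one after it arrives.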
You can also reply here if you have any questions, or follow up at https://phabricator.wikimedia.org/T414389.

Best regards,
The Data Engineering Team
_______________________________________________
Xmldatadumps-l mailing list
