On Fri, Jan 5, 2024 at 5:03 PM Wurgl <heisewu...@gmail.com> wrote:
>
> Hello!
>
> I am having some unexpected messages, so I tried the following:
>
> curl -s
> https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-pages-articles-multistream.xml.bz2
> | bzip2 -d | tail
>
> an got this:
>
> bzip2: Compressed file ends unexpectedly;
>         perhaps it is corrupted?  *Possible* reason follows.
> bzip2: Inappropriate ioctl for device
>         Input file = (stdin), output file = (stdout)
>
> It is possible that the compressed file(s) have become corrupted.

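The message quoted above is what bzip2 prints when its input stops in the middle of a stream. As a minimal sketch (the function names are hypothetical and not taken from this thread), a local copy can be checked for that kind of truncation, and its decompressed size counted, without writing any XML to disk. Note that `bzip2 -t` would not notice a multistream file that was cut exactly at a stream boundary, so comparing the sha1sum against the published one remains the stronger check.

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.

# Return 0 if the .bz2 file decompresses cleanly, non-zero otherwise.
# "bzip2 -t" decompresses in memory and discards the result.
bz2_is_intact() {
    bzip2 -t "$1" 2>/dev/null
}

# Print the decompressed size in bytes without storing the output.
# "bzip2 -dc" streams every concatenated bzip2 stream to stdout;
# "wc -c" counts the bytes, "tr" strips padding some wc versions add.
bz2_decompressed_size() {
    bzip2 -dc "$1" | wc -c | tr -d ' '
}
```

For example, `bz2_is_intact wikidatawiki-latest-pages-articles-multistream.xml.bz2 && echo ok` would test the downloaded dump, and `bz2_decompressed_size` on the same file would report the XML size while needing no scratch space.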
The file I received was fine, and its sha1sum matches that of
wikidatawiki-20240101-pages-articles-multistream.xml.bz2 mentioned in the
posting of Xabriel Collazo Mojica:

--- 8< ---
$ sha1sum wikidatawiki-latest-pages-articles-multistream.xml.bz2
1be753ba90e0390c8b65f9b80b08015922da12f1  wikidatawiki-latest-pages-articles-multistream.xml.bz2
--- >8 ---

bunzip2 did not report any problem. However, my first attempt to decompress
ended with a full disk after more than 2.3 TB of XML.

The second attempt

--- 8< ---
$ bunzip2 -cv wikidatawiki-latest-pages-articles-multistream.xml.bz2 | tail -n 10000 > wikidatawiki-latest-pages-articles-multistream_tail_-n_10000.xml
  wikidatawiki-latest-pages-articles-multistream.xml.bz2: done
--- >8 ---

resulted in a nice XML fragment which ends with

--- 8< ---
  <page>
    <title>Q124069752</title>
    <ns>0</ns>
    <id>118244259</id>
    <revision>
      <id>2042727399</id>
      <parentid>2042727216</parentid>
      <timestamp>2024-01-01T20:37:28Z</timestamp>
      <contributor>
        <username>Kalepom</username>
        <id>1900170</id>
      </contributor>
      <comment>/* wbsetclaim-create:2||1 */ [[Property:P2789]]: [[Q16506931]]</comment>
      <model>wikibase-item</model>
      <format>application/json</format>
      <text bytes="2535" xml:space="preserve">...</text>
      <sha1>9gw926vh84k1b5h6wnuvlvnd2zc3a9b</sha1>
    </revision>
  </page>
</mediawiki>
--- >8 ---

So I assume your curl did not return the full 142 GB of
wikidatawiki-latest-pages-articles-multistream.xml.bz2.

P.S.: I'll start a new bunzip2 to a larger scratch disk just to find out how
big this XML file really is.

regards,
Gerhard
_______________________________________________
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org