Gerhard: Thanks for the extra checks!

Wolfgang: I can confirm Gerhard's findings. The file appears correct and ends with the right footer.
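
For anyone who wants to repeat the check without writing the decompressed XML to disk, here is a minimal sketch. It assumes GNU `sha1sum` and `bzip2` are installed; the function name `verify_dump` is made up for illustration and is not part of any dumps tooling. It compares the SHA-1 against an expected value and then lets `bzip2 -t` read the whole archive, which should fail on a truncated download like the one described below.

```shell
#!/usr/bin/env bash
# verify_dump FILE EXPECTED_SHA1
# Hypothetical helper (not from the thread): returns 0 only if FILE matches
# the expected SHA-1 and bzip2 can read every stream through to the end.
verify_dump() {
    local file="$1" expected="$2" actual

    # sha1sum prints "<hash>  <name>"; keep only the hash.
    # A missing file leaves $actual empty, which is reported as a mismatch.
    actual=$(sha1sum "$file" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
        echo "sha1 mismatch: got '$actual', expected '$expected'" >&2
        return 1
    fi

    # bzip2 -t decompresses and discards the output, so it needs no
    # scratch space; it exits non-zero on a truncated or corrupt file.
    bzip2 -t "$file"
}
```

With the hash quoted below, the call would be `verify_dump wikidatawiki-latest-pages-articles-multistream.xml.bz2 1be753ba90e0390c8b65f9b80b08015922da12f1`. The `bzip2 -t` pass still has to decompress the whole 142 GB file, so expect it to run for a long time.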

On Wed, Jan 10, 2024 at 10:50 AM Gerhard Gonter <ggon...@gmail.com> wrote:
> On Fri, Jan 5, 2024 at 5:03 PM Wurgl <heisewu...@gmail.com> wrote:
> >
> > Hello!
> >
> > I am having some unexpected messages, so I tried the following:
> >
> > curl -s https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-pages-articles-multistream.xml.bz2 | bzip2 -d | tail
> >
> > and got this:
> >
> > bzip2: Compressed file ends unexpectedly;
> >         perhaps it is corrupted?  *Possible* reason follows.
> > bzip2: Inappropriate ioctl for device
> >         Input file = (stdin), output file = (stdout)
> >
> > It is possible that the compressed file(s) have become corrupted.
>
> The file I received was fine and the sha1sum matches that of
> wikidatawiki-20240101-pages-articles-multistream.xml.bz2 mentioned in
> the posting of Xabriel Collazo Mojica:
>
> --- 8< ---
> $ sha1sum wikidatawiki-latest-pages-articles-multistream.xml.bz2
> 1be753ba90e0390c8b65f9b80b08015922da12f1  wikidatawiki-latest-pages-articles-multistream.xml.bz2
> --- >8 ---
>
> bunzip2 did not report any problem; however, my first attempt to
> decompress ended with a full disk after more than 2.3 TB of XML.
>
> The second attempt
> --- 8< ---
> $ bunzip2 -cv wikidatawiki-latest-pages-articles-multistream.xml.bz2 | tail -n 10000 > wikidatawiki-latest-pages-articles-multistream_tail_-n_10000.xml
>   wikidatawiki-latest-pages-articles-multistream.xml.bz2: done
> --- >8 ---
>
> resulted in a nice XML fragment which ends with
> --- 8< ---
>   <page>
>     <title>Q124069752</title>
>     <ns>0</ns>
>     <id>118244259</id>
>     <revision>
>       <id>2042727399</id>
>       <parentid>2042727216</parentid>
>       <timestamp>2024-01-01T20:37:28Z</timestamp>
>       <contributor>
>         <username>Kalepom</username>
>         <id>1900170</id>
>       </contributor>
>       <comment>/* wbsetclaim-create:2||1 */ [[Property:P2789]]: [[Q16506931]]</comment>
>       <model>wikibase-item</model>
>       <format>application/json</format>
>       <text bytes="2535" xml:space="preserve">...</text>
>       <sha1>9gw926vh84k1b5h6wnuvlvnd2zc3a9b</sha1>
>     </revision>
>   </page>
> </mediawiki>
> --- >8 ---
>
> So, I assume, your curl did not return the full 142 GB of
> wikidatawiki-latest-pages-articles-multistream.xml.bz2 .
>
> P.S.: I'll start a new bunzip2 to a larger scratch disk just to find
> out how big this XML file really is.
>
> regards, Gerhard
> _______________________________________________
> Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
> To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org

-- 
Xabriel J. Collazo Mojica (he/him, pronunciation <https://commons.wikimedia.org/wiki/File:Xabriel_Collazo_Mojica_-_pronunciation.ogg>)
Sr Software Engineer
Wikimedia Foundation