On Fri, Jan 5, 2024 at 5:03 PM Wurgl <heisewu...@gmail.com> wrote:
> Hello!
> I am having some unexpected messages, so I tried the following:
> curl -s \
>   https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-pages-articles-multistream.xml.bz2 \
>   | bzip2 -d | tail
> and got this:
> bzip2: Compressed file ends unexpectedly;
>         perhaps it is corrupted?  *Possible* reason follows.
> bzip2: Inappropriate ioctl for device
>         Input file = (stdin), output file = (stdout)
> It is possible that the compressed file(s) have become corrupted.

The file I received was fine, and its sha1sum matches that of
wikidatawiki-20240101-pages-articles-multistream.xml.bz2 mentioned in
Xabriel Collazo Mojica's posting:

--- 8< ---
$ sha1sum wikidatawiki-latest-pages-articles-multistream.xml.bz2
--- >8 ---
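If sha1sum is not at hand, the same check can be sketched in Python by hashing the file in fixed-size chunks, so memory use stays constant even for a 142 GB file. The function name `sha1_of_file` and the 1 MiB chunk size are my own illustrative choices, not anything from the dump tooling; the expected digest has to be taken from the published checksum list (it is not reproduced here).

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 of a file, reading it in chunks so memory use stays constant."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" at EOF
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha1_of_file("wikidatawiki-latest-pages-articles-multistream.xml.bz2") should give the same value as the first column of the sha1sum output.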

bunzip2 did not report any problems; however, my first attempt to
decompress the file ended with a full disk after more than 2.3 TB of XML.

The second attempt
--- 8< ---
$ bunzip2 -cv wikidatawiki-latest-pages-articles-multistream.xml.bz2 | tail -n 10000 >
  wikidatawiki-latest-pages-articles-multistream.xml.bz2: done
--- >8 ---
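A rough Python equivalent of the bunzip2 | tail pipeline above, keeping only the last lines in memory via a bounded collections.deque and never writing the full XML to disk. It assumes UTF-8 text and Python 3's bz2.open (which reads multi-stream files); `tail_bz2` is a hypothetical helper name.

```python
import bz2
from collections import deque


def tail_bz2(path: str, n: int = 10000) -> list[str]:
    """Decompress a .bz2 file as a stream and return its last n lines."""
    # A deque with maxlen drops the oldest line once n lines are held,
    # so memory stays proportional to n, not to the decompressed size.
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        return list(deque(fh, maxlen=n))
```

This still has to decompress the whole file, exactly like the shell pipeline; it only avoids storing the output.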

resulted in a nice XML fragment that ends with
--- 8< ---
      <comment>/* wbsetclaim-create:2||1 */ [[Property:P2789]]:
      <text bytes="2535" xml:space="preserve">...</text>
--- >8 ---

So I assume your curl did not return the full 142 GB of
wikidatawiki-latest-pages-articles-multistream.xml.bz2.
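The "Compressed file ends unexpectedly" message is what bzip2 reports when its input stops before an end-of-stream marker. On the command line, `bzip2 -t` tests a file's integrity; the sketch below is a rough Python analogue for data arriving in chunks (for example from a pipe), using only the standard bz2.BZ2Decompressor. The name `bz2_stream_complete` is hypothetical. Since the multistream dump is a concatenation of independent bzip2 streams, a fresh decompressor is started whenever one stream ends.

```python
import bz2
from collections.abc import Iterable


def bz2_stream_complete(chunks: Iterable[bytes]) -> bool:
    """Return True if concatenated bzip2 data ends exactly after an end-of-stream marker."""
    decomp = bz2.BZ2Decompressor()
    mid_stream = False    # True while the current stream has started but not finished
    finished_any = False  # True once at least one stream has ended cleanly
    for chunk in chunks:
        while chunk:
            decomp.decompress(chunk)  # output is discarded; only completeness matters
            mid_stream = True
            if decomp.eof:
                # This stream ended; leftover bytes belong to the next stream.
                chunk = decomp.unused_data
                decomp = bz2.BZ2Decompressor()
                mid_stream = False
                finished_any = True
            else:
                chunk = b""
    return finished_any and not mid_stream
```

For instance, bz2_stream_complete(iter(lambda: sys.stdin.buffer.read(1 << 20), b"")) at the end of the curl pipeline should return False for a truncated download. Corrupt (as opposed to truncated) data makes the decompressor raise OSError instead.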

P.S.: I'll start a new bunzip2 run on a larger scratch disk just to find
out how big this XML file really is.
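To measure the decompressed size without needing the scratch disk at all, the output can be counted instead of stored; on the shell, `bunzip2 -c file | wc -c` does that. A Python sketch of the same idea, assuming Python 3's bz2.open, which reads all concatenated streams of a multistream file; `decompressed_size` is a hypothetical helper.

```python
import bz2


def decompressed_size(path: str, chunk_size: int = 1 << 20) -> int:
    """Return the total number of decompressed bytes in a (possibly multi-stream) .bz2 file."""
    total = 0
    with bz2.open(path, "rb") as fh:  # bz2.open continues across concatenated streams
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

On a truncated input the read is expected to raise EOFError rather than return a short count.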

regards, Gerhard
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org
