On Wed, Jan 10, 2024 at 6:19 PM Wurgl <heisewu...@gmail.com> wrote:

> The relevant line is this one:
>   curl -s https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-pages-articles-multistream.xml.bz2 \
>     | bzip2 -d | php ~/dumps/wikidata_sitelinks.php
>
> Yes, I double-checked it on my machine at home and the same type of error 
> happened.

Well, we now know that the xml.bz2 file itself is ok.  The usual way
to debug this would be to perform each step of the above pipe in
isolation, which I more or less did.  The xml.bz2 file arrived ok, but
I used wget for that, and that job alone ran for about 12 hours to
retrieve the ~150 GB file.  bunzip2 also worked for me, as mentioned
in an earlier posting, and I found the expected closing tag
"</mediawiki>" in the last line.  So my bunzip2 (version 1.0.6,
6-Sept-2010) seems to be ok, or at least ok with that file.
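For reference, checking the download and the bzip2 stage in isolation
could look roughly like this.  It is only a sketch, assuming bash and a
copy of the dump already on local disk; check_dump is a hypothetical
helper name, not part of any Wikimedia tooling.

```shell
#!/usr/bin/env bash
# Sketch: check a locally downloaded dump before feeding it to the
# consumer script. check_dump is a hypothetical helper name.
check_dump() {
  local file="$1"

  # Integrity test of the compressed data without writing any output.
  # bzip2 handles concatenated streams, so this covers every stream of a
  # multistream file; a truncated download fails here.
  if ! bzip2 -t "$file"; then
    echo "bzip2 integrity test failed for $file" >&2
    return 1
  fi

  # The decompressed XML should end with the closing </mediawiki> tag.
  local last_line
  last_line=$(bzip2 -dc "$file" | tail -n 1)
  if [[ "$last_line" == *"</mediawiki>"* ]]; then
    echo "OK: $file ends with </mediawiki>"
  else
    echo "Unexpected last line: $last_line" >&2
    return 1
  fi
}

# Example, one stage at a time (curl without -s keeps progress and
# errors visible; wget works just as well):
#   curl -f -o dump.xml.bz2 https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-pages-articles-multistream.xml.bz2
#   check_dump dump.xml.bz2
#   bzip2 -dc dump.xml.bz2 | php ~/dumps/wikidata_sitelinks.php
```

Both bzip2 -t and the tail check decompress the whole file, so on the
full dump each of them will take a long time.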

As I already mentioned, from the messages in your original mail I can
only venture a guess here, and my guess is that your curl -s simply did
not retrieve the full file.  Try omitting the -s for a test, so that
curl's progress and error messages are visible.
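If you want to keep the single pipe, a variant that keeps curl's error
messages and reports which stage failed might look like the sketch
below.  It assumes bash (for PIPESTATUS); run_pipeline is a
hypothetical helper name.  With -s alone, curl also suppresses its
error messages; -S (--show-error) turns them back on, and -f (--fail)
makes HTTP errors return a nonzero exit status.

```shell
#!/usr/bin/env bash
# Sketch: run the curl | bzip2 | consumer pipe while keeping errors
# visible and reporting the exit status of every stage.
# run_pipeline is a hypothetical helper name.
run_pipeline() {
  local url="$1"
  shift  # remaining arguments form the consumer command
  local -a status
  local s

  # -sS: no progress meter, but still print errors; -f: fail on HTTP errors.
  curl -sS -f "$url" | bzip2 -d | "$@"
  # Copy PIPESTATUS immediately; the next command overwrites it.
  status=("${PIPESTATUS[@]}")

  echo "exit status: curl=${status[0]} bzip2=${status[1]} consumer=${status[2]}" >&2
  for s in "${status[@]}"; do
    if (( s != 0 )); then
      return 1
    fi
  done
  return 0
}

# Example with the URL and consumer from the quoted command:
#   run_pipeline https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-pages-articles-multistream.xml.bz2 \
#     php ~/dumps/wikidata_sitelinks.php
```

A transfer that stops midway would typically show up as a nonzero curl
status together with a bzip2 complaint about the compressed file
ending unexpectedly.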

regards, Gerhard
_______________________________________________
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org