On Wed, Jan 10, 2024 at 6:19 PM Wurgl <heisewu...@gmail.com> wrote:
> The relevant line is this one:
> curl -s https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-pages-articles-multistream.xml.bz2 | bzip2 -d | php ~/dumps/wikidata_sitelinks.php
>
> Yes, I double-checked it on my machine at home and the same type of error
> happened.

Well, we now know that the xml.bz2 file itself is ok.

The usual way to debug this would be to perform each step of the above pipe in isolation, which I more or less did. The xml.bz2 file arrived ok, but I used wget for that, and that job alone ran for about 12 hours to retrieve the ~150 GB file.

Also, bunzip2 worked for me, as mentioned in an earlier posting, and I found the expected closing tag "</mediawiki>" in the last line. So at least my bunzip2 (version 1.0.6, 6-Sept-2010) seems to be ok, or at least ok with that file.

As I already mentioned, I can only venture a guess from the messages in your original mail: your curl -s simply did not retrieve the full file. Try omitting the -s for a test. Without it, curl shows its progress meter and any error messages, so a transfer that stops early becomes visible.

regards,
Gerhard

_______________________________________________
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org
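The following is a minimal Python sketch of the isolated completeness check described in the mail. It decompresses the downloaded .xml.bz2 file on its own and looks for the closing </mediawiki> tag. It assumes the dump was first saved to local disk (e.g. with wget) rather than piped from curl. The helper name dump_is_complete is hypothetical. Python's bz2 module reads multi-stream files such as the "multistream" dumps transparently. A truncated download normally surfaces as an EOFError before the end of the data is reached.

```python
import bz2


def dump_is_complete(path: str, closing_tag: str = "</mediawiki>") -> bool:
    """Return True if the .xml.bz2 dump at `path` decompresses fully and its
    last non-empty line is `closing_tag`.

    Hypothetical helper for illustration. It streams the file, so memory use
    stays small even for a very large dump, but it is slow in pure Python.
    """
    last = ""
    try:
        # bz2.open reads multi-stream files (such as the "multistream" dumps)
        # as one continuous text stream.
        with bz2.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                stripped = line.strip()
                if stripped:
                    last = stripped
    except EOFError:
        # The compressed data ended before an end-of-stream marker:
        # the typical symptom of an incomplete download.
        return False
    except OSError:
        # The file is missing/unreadable or does not start as valid bzip2 data.
        return False
    return last == closing_tag
```

For example, dump_is_complete("wikidatawiki-latest-pages-articles-multistream.xml.bz2") should return False for a transfer that stopped early. Iterating over the full decompressed dump line by line will take many hours. `bzip2 -t FILE` is a quicker way to test only the integrity of the compressed file.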