On 18/03/22 14:04, Erik del Toro wrote:
Just wanted to tell you that http://aarddict.org users and dictionary
creators also stumbled over these missing namespaces and are now
suggesting to continue scraping them. So is scraping the expected
approach?
Thanks for mentioning this. I'm not sure what you mean by scraping here,
exactly: if you mean parsing the wikitext, definitely not; if you mean
getting the already-parsed HTML from the REST API, that's acceptable.
https://www.mediawiki.org/wiki/API:REST_API/Reference#Get_HTML
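As a minimal sketch of that approach, the "Get HTML" endpoint documented above can be addressed with a URL of the form `/w/rest.php/v1/page/{title}/html`. The helper below only builds the request URL; the host and page title are illustrative, using the French Wiktionary conjugation page mentioned further down:

```python
from urllib.parse import quote
from urllib.request import urlopen

def rest_html_url(wiki_host, title):
    """Build the MediaWiki REST API 'Get HTML' URL for a page title.

    The title is percent-encoded in full (safe='') so that characters
    like ':' and '/' in namespaced titles survive as part of the path
    segment rather than being treated as URL structure.
    """
    return f"https://{wiki_host}/w/rest.php/v1/page/{quote(title, safe='')}/html"

# Illustrative example: a conjugation page on the French Wiktionary
url = rest_html_url("fr.wiktionary.org", "Conjugaison:espagnol/aumentar")
print(url)

# Fetching is then a one-liner (requires network access, so commented out):
# html = urlopen(url).read().decode("utf-8")
```

For bulk work you would of course rate-limit and set a descriptive User-Agent rather than hammering the endpoint page by page.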
As for HTML dumps, the ZIM files produced by Kiwix for the French
Wiktionary include pages like "Conjugaison:espagnol/aumentar", so that's
another possible avenue for bulk imports. I've checked the latest version:
https://download.kiwix.org/zim/wiktionary/wiktionary_fr_all_nopic_2022-01.zim.torrent
Federico
_______________________________________________
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org