Hi!

Thanks for noticing and sharing. Another known issue with the HTML dumps is that categories and templates are not always extracted:
https://phabricator.wikimedia.org/T300124
Mitar

On Tue, Apr 5, 2022 at 12:59 PM Jan Berkel <j...@berkel.fr> wrote:
>
> Hello,
>
> just a heads-up for anyone using HTML dumps, apart from the missing
> namespaces issue already mentioned on this list, there also seem to be entire
> pages missing, and some of the included page data is outdated and does not
> contain the latest changes. I have no idea how many pages are affected.
>
> phabricator ticket with more details:
> https://phabricator.wikimedia.org/T305407
>
> – Jan
> _______________________________________________
> Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
> To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org

--
http://mitar.tnode.com/
https://twitter.com/mitar_m