Yup, they do show up in the stubs. I checked the four Swedish dumps. I left a comment there at https://phabricator.wikimedia.org/T103670#1432521
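
For anyone who wants to repeat this kind of check on a local copy of a dump, here is a minimal Python sketch. It is not the method used for the check above, only one way it could be done. The function names are made up for illustration, and it assumes the usual MediaWiki export layout in which each <page> element has <title> and <id> as direct children.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    # Drop the "{namespace}" prefix that ElementTree puts on tag names.
    return tag.rsplit("}", 1)[-1]


def find_duplicate_pages(stream):
    """Return {page_id: (title, count)} for page ids seen more than once.

    `stream` is any binary file-like object containing export XML.
    Assumption: <title> and <id> are direct children of <page>; the
    nested <revision><id> is ignored because only direct children are read.
    """
    counts = Counter()
    titles = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        page_id = title = None
        for child in elem:
            name = local_name(child.tag)
            if name == "id":
                page_id = child.text
            elif name == "title":
                title = child.text
        counts[page_id] += 1
        titles[page_id] = title
        # Free the page's children so large dumps do not pile up in memory.
        elem.clear()
    return {pid: (titles[pid], n) for pid, n in counts.items() if n > 1}


def find_duplicates_in_dump(path):
    # Hypothetical convenience wrapper for a local .xml.bz2 file.
    with bz2.open(path, "rb") as stream:
        return find_duplicate_pages(stream)
```

Calling find_duplicates_in_dump() on a downloaded pages-articles or stub file returns a dict mapping each repeated page id to its title and the number of times it appears. An empty dict means no page id was repeated.
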
Let me know if there's anything else. Thanks!

On Mon, Jul 6, 2015 at 1:22 PM, Ariel T. Glenn <agl...@wikimedia.org> wrote:
> On Sat, 04-07-2015 at 23:26 -0400, gnosygnu wrote:
>> Hi. I've noticed that some June XML data dumps have duplicate <page>
>> records, usually at the end of the dump.
>>
>> Anyone know if this is intentional? One or two duplicate records is
>> benign, but I'm slightly concerned that it may be a symptom of a
>> larger problem. I've been working with the XML data dumps for over 3
>> years, and haven't seen this before.[1]
>
> This was reported by another user also. See phab task T103670 for the
> report. Did you notice if the stub dumps contain those same duplicate
> entries?
>
> In any case this is an error, and I need to make sure we are fixed for
> the next month's run.
>
> Ariel
>
>> I list some examples below. They're only from the Swedish wikis and
>> Spanish Wikipedia (which is what I started looking at this week). Let
>> me know if you need any other info, and I'll be happy to provide.
>>
>> Finally, for questions like these, is it best to email the mailing
>> list, create a task in Phabricator or do both?
>>
>> Thanks.
>>
>> [1]: It may have started as recently as 2015 April. I stopped looking
>> at dumps shortly before the May problems with the dump server.
>>
>> ----
>>
>> Example 1:
>> URL: http://dumps.wikimedia.org/svwikiversity/20150602/svwikiversity-20150602-pages-articles.xml.bz2
>> Title: Audi m8
>> ID: 18942
>> SHA1: gd16v3qkmjr2w2j35zhqitjfg97igjt
>> Note: Last article in dump. Repeated twice
>>
>> Example 2:
>> URL: http://dumps.wikimedia.org/svwikiquote/20150602/svwikiquote-20150602-pages-articles.xml.bz2
>> Title: Sommarens tolv månader
>> ID: 6209
>> SHA1: 9yibnev7pn3atxicayjoay0ave7pcu6
>> Note: Last article in dump. Repeated twice
>>
>> Example 3:
>> URL: http://dumps.wikimedia.org/svwikibooks/20150602/svwikibooks-20150602-pages-articles.xml.bz2
>> Title: Topologi/Metriska rum
>> ID: 10001
>> SHA1: 5zdkpxflzdxhy7gxclludnlasvl6tw3
>> Note: Last article in dump. Repeated twice
>>
>> Example 4:
>> URL: http://dumps.wikimedia.org/svwikisource/20150602/svwikisource-20150602-pages-articles.xml.bz2
>> Title: Afhandling om svenska stafsättet/4
>> ID: 88768
>> SHA1: 7zyj208ur4vit0t41z7xlftlyl69bo7
>> Note: Last article in dump. Repeated twice
>>
>> Example 5:
>> URL: http://dumps.wikimedia.org/eswiki/20150602/eswiki-20150602-pages-articles.xml.bz2
>> Title (1): Veguer
>> Title (2): Promo
>> Note: duplicates are earlier in the dump (Veguer at the 9% mark and
>> Promo at the 23% mark). There doesn't seem to be a dupe at the end of
>> the article.
>>
>> Unaffected:
>> * http://dumps.wikimedia.org/svwiki/20150602/svwiki-20150602-pages-articles.xml.bz2
>> * http://dumps.wikimedia.org/svwiktionary/20150603/svwiktionary-20150603-pages-articles.xml.bz2
>> * http://dumps.wikimedia.org/svwikinews/20150602/svwikinews-20150602-pages-articles.xml.bz2
>>
>> _______________________________________________
>> Xmldatadumps-l mailing list
>> Xmldatadumps-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l