Yup, they do show up in the stubs. I checked the four Swedish dumps. I
left a comment there at
https://phabricator.wikimedia.org/T103670#1432521
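A check like this can be scripted. Below is a minimal Python sketch, not necessarily how the check above was done. It assumes the standard MediaWiki XML export layout, where each `<page>` carries its page `<id>` as a direct child ahead of `<revision>`. The names `find_duplicate_pages` and `open_dump` are hypothetical and not part of any Wikimedia tool. The script streams a dump and reports page ids that occur more than once. It handles bz2 files (pages-articles), gzip files (the stub dumps) and plain XML.

```python
import bz2
import gzip
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to MediaWiki export tags."""
    return tag.rsplit("}", 1)[-1]


def open_dump(path: str):
    """Open a dump in binary mode, picking the decompressor from the extension."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def find_duplicate_pages(path: str) -> dict:
    """Return {page_id: count} for every <page> id that occurs more than once.

    Uses iterparse so a large dump is streamed rather than loaded whole.
    """
    counts = Counter()
    with open_dump(path) as fh:
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            # Assumption: the page id is the first direct <id> child of <page>;
            # the <id> nested inside <revision> is a revision id and is skipped.
            for child in elem:
                if local_name(child.tag) == "id":
                    counts[child.text.strip()] += 1
                    break
            elem.clear()  # release the page's contents once it has been counted
    return {pid: n for pid, n in counts.items() if n > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    for dump_path in sys.argv[1:]:
        for page_id, n in sorted(find_duplicate_pages(dump_path).items()):
            print(f"{dump_path}: page id {page_id} appears {n} times")
```

Passing both the pages-articles file and the stub file for a wiki on one command line should show whether the same page ids are duplicated in each.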

Let me know if there's anything else. Thanks!

On Mon, Jul 6, 2015 at 1:22 PM, Ariel T. Glenn <agl...@wikimedia.org> wrote:
> On Sat, 4 Jul 2015 at 23:26 -0400, gnosygnu wrote:
>> Hi. I've noticed that some June XML data dumps have duplicate <page>
>> records, usually at the end of the dump.
>>
>> Does anyone know whether this is intentional? One or two duplicate
>> records are benign, but I'm slightly concerned that they may be a
>> symptom of a larger problem. I've been working with the XML data
>> dumps for over 3 years and haven't seen this before.[1]
> Another user has also reported this; see Phabricator task T103670 for
> the report. Did you notice whether the stub dumps contain the same
> duplicate entries?
>
> In any case, this is an error, and I need to make sure it is fixed for
> next month's run.
>
> Ariel
>
>> I list some examples below. They're only from the Swedish wikis and
>> Spanish Wikipedia (which is what I started looking at this week). Let
>> me know if you need any other info, and I'll be happy to provide it.
>>
>> Finally, for questions like these, is it best to email the mailing
>> list, create a task in Phabricator or do both?
>>
>> Thanks.
>>
>> [1]: It may have started as recently as April 2015. I stopped looking
>> at dumps shortly before the May problems with the dump server.
>>
>> ----
>>
>> Example 1:
>> URL: http://dumps.wikimedia.org/svwikiversity/20150602/svwikiversity-20150602-pages-articles.xml.bz2
>> Title: Audi m8
>> ID: 18942
>> SHA1: gd16v3qkmjr2w2j35zhqitjfg97igjt
>> Note: Last article in dump. Repeated twice
>>
>> Example 2:
>> URL: http://dumps.wikimedia.org/svwikiquote/20150602/svwikiquote-20150602-pages-articles.xml.bz2
>> Title: Sommarens tolv månader
>> ID: 6209
>> SHA1: 9yibnev7pn3atxicayjoay0ave7pcu6
>> Note: Last article in dump. Repeated twice
>>
>> Example 3:
>> URL: http://dumps.wikimedia.org/svwikibooks/20150602/svwikibooks-20150602-pages-articles.xml.bz2
>> Title: Topologi/Metriska rum
>> ID: 10001
>> SHA1: 5zdkpxflzdxhy7gxclludnlasvl6tw3
>> Note: Last article in dump. Repeated twice
>>
>> Example 4:
>> URL: http://dumps.wikimedia.org/svwikisource/20150602/svwikisource-20150602-pages-articles.xml.bz2
>> Title: Afhandling om svenska stafsättet/4
>> ID: 88768
>> SHA1: 7zyj208ur4vit0t41z7xlftlyl69bo7
>> Note: Last article in dump. Repeated twice
>>
>> Example 5:
>> URL: http://dumps.wikimedia.org/eswiki/20150602/eswiki-20150602-pages-articles.xml.bz2
>> Title (1): Veguer
>> Title (2): Promo
>> Note: the duplicates are earlier in the dump (Veguer at the 9% mark
>> and Promo at the 23% mark). There doesn't seem to be a dupe at the end
>> of the dump.
>>
>> Unaffected:
>> * http://dumps.wikimedia.org/svwiki/20150602/svwiki-20150602-pages-articles.xml.bz2
>> * http://dumps.wikimedia.org/svwiktionary/20150603/svwiktionary-20150603-pages-articles.xml.bz2
>> * http://dumps.wikimedia.org/svwikinews/20150602/svwikinews-20150602-pages-articles.xml.bz2
>>
>> _______________________________________________
>> Xmldatadumps-l mailing list
>> Xmldatadumps-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l