[Wikidata-bugs] [Maniphest] [Commented On] T133924: Wikidata dump is missing entities
gerritbot added a comment. Change 286411 merged by Dzahn: Don't publish Wikidata dumps if a shard failed https://gerrit.wikimedia.org/r/286411 TASK DETAIL https://phabricator.wikimedia.org/T133924 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: gerritbot Cc: Envlh, gerritbot, Stashbot, aude, daniel, Addshore, hoo, Aklapper, Smalyshev, Lewizho99, Maathavan, D3r1ck01, Izno, Wikidata-bugs, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
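The merged change makes publication conditional on every shard succeeding. The change itself is not quoted in this thread, so the following is only a minimal Python sketch of that idea. `failed_shards`, `should_publish`, and the exit-code mapping are hypothetical names chosen for illustration, not part of the real dump scripts.

```python
from typing import Dict, List, Tuple


def failed_shards(exit_codes: Dict[int, int]) -> List[int]:
    """Return the shard numbers whose dump process exited non-zero."""
    return sorted(shard for shard, code in exit_codes.items() if code != 0)


def should_publish(exit_codes: Dict[int, int], expected_shards: int) -> Tuple[bool, str]:
    """Decide whether the combined dump may be published.

    exit_codes maps shard number -> process exit code (assumed convention:
    0 means success). A shard with no recorded result counts as a failure.
    """
    missing = [s for s in range(expected_shards) if s not in exit_codes]
    if missing:
        return False, f"no result for shard(s) {missing}"
    failed = failed_shards(exit_codes)
    if failed:
        return False, f"shard(s) {failed} failed"
    return True, "all shards succeeded"
```

With this gate, a run where one shard dies with an exception would leave the previously published dump in place instead of replacing it with a short one.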
gerritbot added a comment. Change 286434 merged by jenkins-bot: Update Wikidata - fix for rdf dumps https://gerrit.wikimedia.org/r/286434
gerritbot added a comment. Change 286434 had a related patch set uploaded (by JanZerebecki): Update Wikidata - fix for rdf dumps https://gerrit.wikimedia.org/r/286434
gerritbot added a comment. Change 286411 had a related patch set uploaded (by JanZerebecki): Don't publish Wikidata dumps if a shared failed https://gerrit.wikimedia.org/r/286411
Smalyshev added a comment. Great, hopefully before regular dumps start (not sure when they do, I thought on Monday too?).
aude added a comment. @Smalyshev we can officially deploy the patch on Monday
Smalyshev added a comment. Extremely weird. I mean I am happy dump is not failing anymore, but I'd still like to know why it was failing in the first place. The fact that the change with state check fixed it confirms that this branch is likely the problem, but I have absolutely no idea *how* this problem is caused...
aude added a comment. @daniel no errors
daniel added a comment. and no error? huh. then the issue isn't with deduplication.
aude added a comment. if Q5940875 is listed twice, it gets dumped twice (with the old code)
daniel added a comment. Huh, strange. Can't think of a way that would happen. Actually - what happens if you try to dump Q5940875 twice? If it's in the ID list twice, it gets dumped twice, right? That should trigger any issues related to deduplication. It's nice to see that the patch fixed it, but it would be nice if we could understand what exactly went wrong...
aude added a comment. when dumping shard 1 with https://gerrit.wikimedia.org/r/286262, I don't get any errors for Q5940875
gerritbot added a comment. Change 286262 merged by jenkins-bot: Backport change to purtle https://gerrit.wikimedia.org/r/286262
gerritbot added a comment. Change 286262 had a related patch set uploaded (by Aude): Backport change to purtle https://gerrit.wikimedia.org/r/286262
Smalyshev added a comment. https://www.wikidata.org/wiki/Special:EntityData/Q5940875.ttl?flavor=dump seems to work just fine. So I imagine this is some interaction with dump environment, like deduplication...
aude added a comment. Failed to dump Q5940875 (Bad transition: 5 -> 11) https://www.wikidata.org/wiki/Q5940875
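The "Failed to dump ..." message suggests that a failure is now reported for the individual entity instead of ending the whole run. How the production script achieves this is not shown in the thread. The Python sketch below only illustrates the general pattern, and `dump_entities` and `dump_one` are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple


def dump_entities(
    entity_ids: Iterable[str],
    dump_one: Callable[[str], str],
) -> Tuple[List[str], List[str]]:
    """Dump each entity; record failures instead of aborting the whole run.

    dump_one is a stand-in for whatever serializes a single entity.
    Returns (serialized outputs, human-readable failure messages).
    """
    output: List[str] = []
    failures: List[str] = []
    for entity_id in entity_ids:
        try:
            output.append(dump_one(entity_id))
        except Exception as exc:
            # Broad on purpose: one bad entity should not end the shard.
            failures.append(f"Failed to dump {entity_id} ({exc})")
    return output, failures
```

Catching the exception only helps if the serializer is left in a usable state afterwards. If a shared writer keeps a corrupted internal state, later entities could fail as well, which would be consistent with almost every later ID in the shard going missing.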
Stashbot added a comment. Mentioned in SAL [2016-04-29T18:12:55Z] Synchronized php-1.27.0-wmf.22/extensions/Wikidata/extensions/Wikibase/repo/includes/Dumpers/DumpGenerator.php: wmf.22 fc20c54f7915b94ec0d15ef17e207c116910623d 1 of 2 https://phabricator.wikimedia.org/T133924 (duration: 00m 44s)
daniel added a comment. Related pull request on github: https://github.com/wmde/purtle/pull/1
Smalyshev added a comment. Looking into missing IDs, the stream of breakage starts somewhere around:

5871286
5891261
5898695
5940875 <-
5940884
5940887
5940891
5940903
5940904
5940907
5940908
5940913

After the marked one, almost every ID in this shard is missing. Preceding Q5940875 in the same shard is Q5940871 but I can't find any anomaly in either from here. Must be some kind of interaction with other entities.
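Locating the point where "almost every ID is missing" can be automated. This is a rough Python sketch, assuming one has the ordered list of IDs the shard should contain and the set of IDs actually present in the dump. The function name and the 0.9 threshold are illustrative assumptions, not something used in this thread.

```python
from typing import Optional, Sequence, Set


def breakage_start(
    expected: Sequence[int],
    present: Set[int],
    threshold: float = 0.9,
) -> Optional[int]:
    """Return the earliest missing ID from which on at least `threshold`
    of the remaining expected IDs (itself included) are missing.

    expected must be in dump order. Returns None if no such point exists.
    """
    total = len(expected)
    missing_in_tail = 0
    start: Optional[int] = None
    # Walk backwards so the missing count of each tail is known in one pass.
    for index in range(total - 1, -1, -1):
        is_missing = expected[index] not in present
        if is_missing:
            missing_in_tail += 1
        tail_length = total - index
        if is_missing and missing_in_tail / tail_length >= threshold:
            start = expected[index]  # earlier matches overwrite later ones
    return start
```

Isolated gaps earlier in the shard do not trigger the result, because the tail that follows them still contains many present IDs. Only the ID that begins a nearly empty tail is returned.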
Smalyshev added a comment. @aude the script may fail in the middle of the entity, so the entity ID may be in the data but only part of the data with it. Is there any way to test the script on the real data? @hoo before we create new dump, I think we have to find the bug? Otherwise new dump will be as short as the old one.
aude added a comment. here is a list of the missing entities: http://dumps.filbertkm.com/wikidata-missing-rdf.txt the dump rdf script can work with such a list and maybe is a way to find if there is a specific entity where the script fails.
hoo added a comment. We should probably create a new dump tomorrow and then delete the old one. If no one objects, I'll do that tomorrow. A new dump will be created on Monday anyway, so that might not be necessary.
Smalyshev added a comment. Aha! Thanks a lot! That seems to be the cause. I'll look into it, it probably should not be hard to fix.
aude added a comment. @Smalyshev I am looking... In dumpwikidatattl-wikidata-20160418-all-BETA-1.log, we have "Processed 5516718 entities." In dumpwikidatattl-wikidata-20160418-all-BETA-1.log, I see the following, and then I think the script dies:

Processed 1378502 entities.
Exception encountered, of type "LogicException"
[1a1b2025b435dcb79805] [no req] LogicException from line 522 of /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/purtle/src/RdfWriterBase.php: Bad transition: 5 -> 11
Backtrace:
#0 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/purtle/src/RdfWriterBase.php(399): Wikimedia\Purtle\RdfWriterBase->state(integer)
#1 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/purtle/src/RdfWriterBase.php(381): Wikimedia\Purtle\RdfWriterBase->say(string)
#2 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/includes/Rdf/Values/ComplexValueRdfHelper.php(92): Wikimedia\Purtle\RdfWriterBase->a(string, string)
#3 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/includes/Rdf/Values/QuantityRdfBuilder.php(83): Wikibase\Rdf\Values\ComplexValueRdfHelper->attachValueNode(Wikimedia\Purtle\TurtleRdfWriter, string, string, string, DataValues\QuantityValue)
#4 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/includes/Rdf/Values/QuantityRdfBuilder.php(57): Wikibase\Rdf\Values\QuantityRdfBuilder->addValueNode(Wikimedia\Purtle\TurtleRdfWriter, string, string, string, DataValues\QuantityValue)
#5 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/includes/Rdf/DispatchingValueSnakRdfBuilder.php(55): Wikibase\Rdf\Values\QuantityRdfBuilder->addValue(Wikimedia\Purtle\TurtleRdfWriter, string, string, string, Wikibase\DataModel\Snak\PropertyValueSnak)
#6 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/includes/Rdf/SnakRdfBuilder.php(138): Wikibase\Rdf\DispatchingValueSnakRdfBuilder->addValue(Wikimedia\Purtle\TurtleRdfWriter, string, string, string, Wikibase\DataModel\Snak\PropertyValueSnak)
#7 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/includes/Rdf/SnakRdfBuilder.php(93): Wikibase\Rdf\SnakRdfBuilder->addSnakValue(Wikimedia\Purtle\TurtleRdfWriter, Wikibase\DataModel\Snak\PropertyValueSnak, string)
#8 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/includes/Rdf/FullStatementRdfBuilder.php(163): Wikibase\Rdf\SnakRdfBuilder->addSnak(Wikimedia\Purtle\TurtleRdfWriter, Wikibase\DataModel\Snak\PropertyValueSnak, string)
#9 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/includes/Rdf/FullStatementRdfBuilder.php(143): Wikibase\Rdf\FullStatementRdfBuilder->addStatement(Wikibase\DataModel\Entity\ItemId, Wikibase\DataModel\Statement\Statement, boolean)
#10 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/includes/Rdf/FullStatementRdfBuilder.php(235): Wikibase\Rdf\FullStatementRdfBuilder->addStatements(Wikibase\DataModel\Entity\ItemId, Wikibase\DataModel\Statement\StatementList)
#11 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/includes/Rdf/RdfBuilder.php(400): Wikibase\Rdf\FullStatementRdfBuilder->addEntity(Wikibase\DataModel\Entity\Item)
#12 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/includes/Dumpers/RdfDumpGenerator.php(115): Wikibase\Rdf\RdfBuilder->addEntity(Wikibase\DataModel\Entity\Item)
#13 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/includes/Dumpers/DumpGenerator.php(304): Wikibase\Dumpers\RdfDumpGenerator->generateDumpForEntityId(Wikibase\DataModel\Entity\ItemId)
#14 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/includes/Dumpers/DumpGenerator.php(275): Wikibase\Dumpers\DumpGenerator->dumpEntities(array, integer)
#15 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/maintenance/dumpEntities.php(178): Wikibase\Dumpers\DumpGenerator->generateDump(Wikibase\Repo\Store\SQL\EntityPerPageIdPager)
#16 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/maintenance/dumpRdf.php(108): Wikibase\DumpScript->execute()
#17 /srv/mediawiki/php-1.27.0-wmf.21/maintenance/doMaintenance.php(103): Wikibase\DumpRdf->execute()
#18 /srv/mediawiki/php-1.27.0-wmf.21/extensions/Wikidata/extensions/Wikibase/repo/maintenance/dumpRdf.php(143): include(string)
#19 /srv/mediawiki/multiversion/MWScript.php(97): include(string)
#20 {main}

shards 2 + 3 don't have any errors, but I think the amount missing from shard 1 helps explain this.
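The exception comes from a state check inside the RDF writer: a call tried to move the writer from state 5 to state 11, and that move is not allowed. The toy Python writer below shows how such a guard produces this kind of message. It is not purtle's implementation. The state names, the numeric values (picked only to echo the log line), and the table of allowed moves are all assumptions made for illustration.

```python
class BadTransition(Exception):
    """Raised when the writer is asked to make a disallowed state change."""


# Hypothetical states; the numbers merely mirror those seen in the log message.
DOCUMENT, SUBJECT, PREDICATE, OBJECT = 5, 10, 11, 12

# Assumed rules: a predicate needs a subject, an object needs a predicate.
ALLOWED = {
    DOCUMENT: {SUBJECT},
    SUBJECT: {PREDICATE},
    PREDICATE: {OBJECT},
    OBJECT: {SUBJECT, PREDICATE, OBJECT},
}


class TinyWriter:
    """Minimal triple writer that validates every state change."""

    def __init__(self) -> None:
        self.state = DOCUMENT
        self.subject = ""
        self.predicate = ""
        self.lines = []

    def _transition(self, new_state: int) -> None:
        if new_state not in ALLOWED[self.state]:
            raise BadTransition(f"Bad transition: {self.state} -> {new_state}")
        self.state = new_state

    def about(self, subject: str) -> "TinyWriter":
        self._transition(SUBJECT)
        self.subject = subject
        return self

    def say(self, predicate: str) -> "TinyWriter":
        self._transition(PREDICATE)
        self.predicate = predicate
        return self

    def value(self, obj: str) -> "TinyWriter":
        self._transition(OBJECT)
        self.lines.append(f"{self.subject} {self.predicate} {obj} .")
        return self
```

In this toy model, calling `say()` on a fresh writer before `about()` raises "Bad transition: 5 -> 11". Whether the production failure has the same cause (a predicate written while no subject is open) is not established by the trace alone.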
aude added a comment. some missing items:

https://www.wikidata.org/wiki/Q1026
https://www.wikidata.org/wiki/Q1028
https://www.wikidata.org/wiki/Q1049
https://www.wikidata.org/wiki/Q1068
https://www.wikidata.org/wiki/Q1073
https://www.wikidata.org/wiki/Q1088
https://www.wikidata.org/wiki/Q1097

(think these are all instance of "Wikimedia category", though think that's not relevant as other items in that id range are in the later dump)
Smalyshev added a comment. Looks like 1/5 is missing - one shard? Can we check out the logs on actual dump machine and see maybe there is some error message?
aude added a comment. I am doing some comparison of the dumps. This should give a count of the number of items in the dumps.

zcat wikidata-20160411-all-BETA.ttl.gz | grep 'wdata:Q' | sort >> wikidata-20160411.ttl
zcat wikidata-20160418-all-BETA.ttl.gz | grep 'wdata:Q' | sort >> wikidata-20160418.ttl

wc -l wikidata-20160411.ttl
21956833 wikidata-20160411.ttl
wc -l wikidata-20160418.ttl
17915507 wikidata-20160418.ttl
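The same comparison can be done in Python, which also makes it easy to list the IDs that disappeared. This is a sketch under assumptions: it treats any `wdata:Q<digits>` token as an entity reference and collects unique IDs, whereas the `grep` pipeline above counts matching lines. The function names are hypothetical.

```python
import gzip
import re
from typing import Iterable, Set

# Assumption: entity data nodes appear in the dump as "wdata:Q<digits>".
ENTITY_RE = re.compile(r"wdata:(Q\d+)")


def ids_from_lines(lines: Iterable[str]) -> Set[str]:
    """Collect the unique entity IDs referenced via the wdata: prefix."""
    found: Set[str] = set()
    for line in lines:
        found.update(ENTITY_RE.findall(line))
    return found


def ids_from_dump(path: str) -> Set[str]:
    """Stream a gzipped Turtle dump and return its entity IDs."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return ids_from_lines(handle)


def missing_ids(old_ids: Set[str], new_ids: Set[str]) -> Set[str]:
    """IDs present in the older dump but absent from the newer one."""
    return old_ids - new_ids


def missing_fraction(old_count: int, new_count: int) -> float:
    """Share of the older count that is absent from the newer count."""
    return (old_count - new_count) / old_count
```

Using the two line counts quoted above, `missing_fraction(21956833, 17915507)` is about 0.184: 4041326 fewer lines, a little under one fifth. Holding roughly twenty million ID strings in a set needs a fair amount of memory, so the sorted-file approach above may be more practical on a small machine.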
Smalyshev added a comment. Dump from 20160411 does have Q23760660.