mkroetzsch added a comment. @JanZerebecki:
Re using the same code: That's not essential here. All we want is for the dumps to be the same. It's also not necessary to develop the code twice, since it already exists twice anyway. It's just a question of whether we want to use a slow method that keeps people waiting days for the dumps (as they already do now for many other dumps), or a fast one that you can run anywhere (even without DB access; on a laptop if you like). Since we must have the code in PHP too, we can always go back to the slow system if it should ever be needed, so there is no lock-in. Dump file generation is also not operation-critical for Wikidata (the internal SPARQL query service will likely be based on a live feed, not on dumps). What's not to like?

Re consistency: I meant that the dumps would contain the same information, not that they reflect a consistent state of the site. If having a defined state is important to you, then dump-based file generation is also your friend: one can do the same with the full history dump, where one could specify exactly which revision to dump. This would probably still be as fast as the DB method, but it would be guaranteed to provide a globally consistent snapshot (yes, I know, modulo deletions). I'm not sure this type of consistency is relevant, though. A guarantee that the dump files in the various formats are based on the same data, however, would be quite useful (e.g., in SPARQL, where you often mix data from truthy and full dumps in one query).

Recall that we are discussing this here because Lydia said that the slowness of the DB-based exports would be a reason why we cannot have an (otherwise convenient) date-based directory structure. I agree with Lydia that this would be a blocker, but in this case it's really one that we can easily remove. The code I am talking about is at https://github.com/Wikidata/Wikidata-Toolkit; it is well tested, extensively documented, and partially WMF-funded. Why not make this into a community engagement success story?
:-)

TASK DETAIL https://phabricator.wikimedia.org/T72385