mkroetzsch added a comment.

@JanZerebecki:

Re using the same code: That's not essential here. All we want is that the 
dumps are the same. It's also not necessary to develop the code twice, since it 
already exists twice anyway. The only question is whether we want to use a slow 
method that keeps people waiting days for the dumps (as they already do now 
with many other dumps), or a fast one that you can run anywhere (even without 
DB access; on a laptop if you like). Since we must have the code in PHP too, we 
can go back to the slow system if that should ever be needed, so there is no 
lock-in. Dump file generation is also not operation-critical for Wikidata (the 
internal SPARQL query service will likely be based on a live feed, not on 
dumps). What's not to like?

Re consistency: I meant that the dumps would contain the same information, not 
that they reflect a consistent state of the site. If a defined state is 
important to you, then dump-based file generation is also your friend: one can 
do the same with the full history dump, where one could specify exactly which 
revision to dump. This would probably still be as fast as the DB method, but it 
would be guaranteed to provide a globally consistent snapshot (yes, I know, 
modulo deletions). I am not sure whether this type of consistency is relevant, 
though. A guarantee that the dump files in various formats are based on the 
same data, however, would be quite useful (e.g., in SPARQL, where you often mix 
data from truthy and full dumps in one query).
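To illustrate the snapshot idea, here is a minimal sketch of picking, for each 
entity, the newest revision at or before a chosen cutoff revision ID. It assumes 
(1) revisions have already been parsed out of the history dump into hypothetical 
(entity_id, revision_id, payload) tuples (XML parsing is omitted), and (2) 
revision IDs grow monotonically over time, so "revision ID <= cutoff" means 
"existed at the snapshot point". The function name snapshot_at is made up for 
illustration; this is not code from Wikidata Toolkit.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (entity_id, revision_id, payload), as one might
# extract it from a full-history dump. Parsing the real XML dump is omitted.
Revision = Tuple[str, int, str]


def snapshot_at(revisions: Iterable[Revision], cutoff: int) -> Dict[str, str]:
    """Return, per entity, the payload of its newest revision with id <= cutoff.

    Assumes revision ids increase over time, so the cutoff id defines a
    single global point in time. Entities with no revision <= cutoff
    (i.e. created later) are left out of the snapshot.
    """
    best: Dict[str, Tuple[int, str]] = {}
    for entity_id, rev_id, payload in revisions:
        if rev_id > cutoff:
            continue  # revision made after the snapshot point; ignore it
        current = best.get(entity_id)
        if current is None or rev_id > current[0]:
            best[entity_id] = (rev_id, payload)
    # Drop the revision ids and keep only the selected payloads.
    return {eid: payload for eid, (_, payload) in best.items()}


revs = [("Q1", 10, "a"), ("Q1", 25, "b"), ("Q2", 12, "c"), ("Q3", 30, "d")]
print(snapshot_at(revs, 20))  # {'Q1': 'a', 'Q2': 'c'}
```

Because every output format would be generated from the same selected 
revisions, the truthy and full dumps would agree by construction. Pages deleted 
before the dump was taken would simply be missing from the input, which is the 
"modulo deletions" caveat.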

Recall that we are discussing this here because Lydia said that the slowness of 
the DB-based exports would be a reason why we cannot have an (otherwise 
convenient) date-based directory structure. I agree with Lydia that this would 
be a blocker, but in this case it's one that we can easily remove. The 
code I am talking about is at https://github.com/Wikidata/Wikidata-Toolkit, 
well tested, extensively documented, and partially WMF-funded. Why not make 
this into a community engagement success story? :-)


TASK DETAIL
  https://phabricator.wikimedia.org/T72385

To: ArielGlenn, mkroetzsch
Cc: Manybubbles, JanZerebecki, Smalyshev, aude, daniel, Wikidata-bugs, 
Nemo_bis, mkroetzsch, Svick, ArielGlenn, Lydia_Pintscher, hoo, jeremyb



_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
