GoranSMilovanovic added subscribers: Aleksey_WMDE, WMDE-leszek.
GoranSMilovanovic added a comment.

@Addshore After reconsidering this, I have to state openly that I am against relying on JSON dumps as the only source of data.

@WMDE-leszek and @Aleksey_WMDE will probably want to hear this too. My reasoning:

  • Given the state of contemporary data science, and the really powerful infrastructure we have at our disposal, it simply doesn't seem right that we have to process the Wikidata dumps just to fetch any statistics on the data model or to provide aggregates, simple or not.
  • I think we need a solution that properly lives in the Big Data segment, which most probably means Hadoop. Perhaps we could import the whole wb_terms table (or whatever re-engineered SQL version of it exists) into Hadoop with Sqoop, and then redirect to Hadoop any relevant changes that we would previously have recorded in SQL.
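For context on the first point, here is a minimal Python sketch of what even a simple aggregate costs when the JSON dump is the only source: a full sequential pass over every entity. It assumes the usual Wikidata JSON dump layout (a top-level array with one entity per line, each line ending in a comma except the last) and that labels sit under each entity's "labels" key. The function names are hypothetical, chosen only for illustration.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entity dicts from a Wikidata-style JSON dump (hypothetical helper).

    Assumes one entity per line inside a top-level JSON array, i.e. lines
    look like '[', '{...},', ..., '{...}', ']'.
    """
    for line in lines:
        line = line.strip()
        # Skip the array brackets and any blank lines.
        if line in ("[", "]", ""):
            continue
        # Drop the trailing comma that separates array elements.
        if line.endswith(","):
            line = line[:-1]
        yield json.loads(line)


def count_labels_per_language(lines: Iterable[str]) -> Counter:
    """Count how many entities have a label in each language (one full pass)."""
    counts: Counter = Counter()
    for entity in iter_entities(lines):
        # Each key of "labels" is a language code, e.g. "en" or "de".
        counts.update(entity.get("labels", {}).keys())
    return counts
```

To run this on a real dump, open the compressed file with `gzip.open(path, "rt", encoding="utf-8")` and pass the file object to `count_labels_per_language`. Every new question means another pass of this kind over the entire dump. With the terms already in Hadoop, the same count would be a single aggregate query.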

TASK DETAIL
https://phabricator.wikimedia.org/T154601


To: GoranSMilovanovic
Cc: WMDE-leszek, Aleksey_WMDE, Ivanhercaz, VIGNERON, Lydia_Pintscher, GoranSMilovanovic, gerritbot, Addshore, Sjoerddebruin, Aklapper, Lahi, Gq86, QZanden, LawExplorer, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs