Addshore added a comment.

@Addshore After reconsidering this, I have to state openly that I am against relying on JSON dumps as the only source of data.

Why?

@WMDE-leszek @Aleksey_WMDE will also be interested to hear, I guess. Well:

  • Given the state of contemporary Data Science and the really powerful infrastructure at our disposal, it simply doesn't seem right to have to process the Wikidata dumps just to fetch statistics on the data model or to provide aggregates (simple or not).
  • I think we need a solution that lives properly in the Big Data segment, which most probably means Hadoop: maybe sqooping the whole wb_terms table (or whatever re-engineered SQL version of it) to Hadoop, and then redirecting to Hadoop any relevant changes that we would previously have recorded in SQL.

I'm not sure whether there is any priority on getting this done.

@Lydia_Pintscher @Addshore @WMDE-leszek @Aleksey_WMDE

Putting aside the question of the afterlife of the SQL wb_terms table for now: is there anything that can be done to fix this Dashboard anytime soon, or do we wait for a new data engineering solution for the labels first?

The way to get this dashboard fixed in the short term would be to use the dumps.
The alternative is to wait for the changes to wb_terms, but that could take a while.

@WMDE-leszek recently altered the dump script to count the total number of labels. It would be trivial to split that count per language and to include the other types of terms as well.
https://gerrit.wikimedia.org/r/#/c/analytics/wmde/toolkit-analyzer/+/440133/2/analyzer/src/main/java/org/wikidata/analyzer/Processor/MetricProcessor.java
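
To illustrate what a per-language split could look like, here is a minimal Python sketch. It is not the toolkit-analyzer code (that is Java, see the Gerrit link above), and the function name count_terms_per_language is made up for this example. It assumes the usual layout of the Wikidata JSON dump: one big JSON array with one entity per line, each line ending in a comma, and each entity carrying "labels", "descriptions" and "aliases" maps keyed by language code.

```python
import json
from collections import Counter


def count_terms_per_language(lines):
    """Count labels, descriptions and aliases per language.

    `lines` is any iterable of text lines laid out like the Wikidata JSON
    dump: a JSON array with one entity per line, each followed by a comma.
    (Hypothetical helper for illustration, not part of toolkit-analyzer.)
    """
    counts = {
        "labels": Counter(),
        "descriptions": Counter(),
        "aliases": Counter(),
    }
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        entity = json.loads(line)
        # Labels and descriptions hold at most one term per language.
        for language in entity.get("labels", {}):
            counts["labels"][language] += 1
        for language in entity.get("descriptions", {}):
            counts["descriptions"][language] += 1
        # Aliases map each language to a list, so count every list entry.
        for language, aliases in entity.get("aliases", {}).items():
            counts["aliases"][language] += len(aliases)
    return counts
```

Because the function only needs an iterable of lines, a compressed dump could be streamed through it without loading it into memory, for example with bz2.open(path, "rt", encoding="utf-8") as the input, where path is wherever the dump file happens to live.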


TASK DETAIL
https://phabricator.wikimedia.org/T154601

To: GoranSMilovanovic, Addshore
Cc: WMDE-leszek, Aleksey_WMDE, Ivanhercaz, VIGNERON, Lydia_Pintscher, GoranSMilovanovic, gerritbot, Addshore, Sjoerddebruin, Aklapper, Lahi, Gq86, QZanden, LawExplorer, Wikidata-bugs, aude, Mbch331