dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.

TASK DESCRIPTION
  As a maintainer of WDQS I would like to extract metrics about the shape 
of the Wikidata RDF graph. Blazegraph is not designed for this kind of 
analysis, and running it there would require raising the query timeout too 
much. Spark is better suited to this kind of load, so the RDF graph needs to 
be stored in a simple Hive table with 4 fields:
  
  - context: the item, the shared reference or the shared value
  - subject
  - predicate
  - object
  
  Each value should be stored as a string following the N-Triples spec.
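
  As an illustration only, here is a minimal Python sketch of what one row of 
such a table could look like. The names Quad, nt_iri and nt_literal are 
hypothetical and not part of any existing WDQS code; the escaping covers only 
the characters N-Triples requires to be escaped inside a literal, and IRIs 
are assumed to be valid already.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One row of the proposed table (hypothetical helper type)."""
    context: str
    subject: str
    predicate: str
    object: str


# Characters that must be escaped inside an N-Triples string literal.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r"}


def nt_iri(iri: str) -> str:
    """Wrap an IRI in angle brackets; the IRI is assumed to be valid."""
    return f"<{iri}>"


def nt_literal(value: str, lang: Optional[str] = None,
               datatype: Optional[str] = None) -> str:
    """Serialize a literal with an optional language tag or datatype IRI."""
    escaped = "".join(_ESCAPES.get(ch, ch) for ch in value)
    if lang is not None:
        return f'"{escaped}"@{lang}'
    if datatype is not None:
        return f'"{escaped}"^^<{datatype}>'
    return f'"{escaped}"'


# Example row: the English label of an item, with the item as context.
item = nt_iri("http://www.wikidata.org/entity/Q42")
row = Quad(
    context=item,
    subject=item,
    predicate=nt_iri("http://www.w3.org/2000/01/rdf-schema#label"),
    object=nt_literal("Douglas Adams", lang="en"),
)
```

  Storing every field as an already-serialized N-Triples term keeps the table 
schema flat (four string columns) while preserving the distinction between 
IRIs, plain literals, language-tagged literals and typed literals.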
  
  AC:
  
  - the Wikidata TTL dumps are munged and imported into Hive on a weekly basis
  - the Hive table is cleaned up weekly so that at most 4 versions of the 
graph are stored in this table
  - Airflow should be used to orchestrate these jobs
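
  The retention rule in the AC can be sketched as follows. This is only an 
assumption about how the cleanup might work: it supposes each weekly import 
lands in its own partition keyed by a YYYYMMDD string, and the table name 
wikidata_rdf and partition column snapshot are hypothetical. The function 
only builds the HiveQL statements; running them (for example from an Airflow 
task) is left out.

```python
from typing import Iterable, List


def partitions_to_drop(snapshots: Iterable[str], keep: int = 4) -> List[str]:
    """Return the snapshot keys to delete so that only the newest `keep` remain.

    Keys are assumed to be YYYYMMDD strings, so lexicographic order
    matches chronological order.
    """
    newest_first = sorted(set(snapshots), reverse=True)
    return sorted(newest_first[keep:])


def drop_statements(table: str, snapshots: Iterable[str],
                    keep: int = 4) -> List[str]:
    """Build one ALTER TABLE ... DROP PARTITION statement per stale snapshot."""
    return [
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION (snapshot='{key}')"
        for key in partitions_to_drop(snapshots, keep)
    ]


# Five weekly imports are present; only the oldest one should be removed.
existing = ["20200706", "20200713", "20200720", "20200727", "20200803"]
stale = partitions_to_drop(existing)
statements = drop_statements("wikidata_rdf", existing)
```

  Computing the stale partitions from the list of existing ones, rather than 
from the current date, keeps the cleanup correct even if a weekly import is 
skipped or delayed.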

TASK DETAIL
  https://phabricator.wikimedia.org/T259115

To: dcausse
Cc: dcausse, Aklapper, CBogen, Akuckartz, darthmon_wmde, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331