dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata.
TASK DESCRIPTION
As a maintainer of WDQS I would like to extract metrics about the shape of the Wikidata RDF graph. Blazegraph is not designed to extract metrics of this kind, and doing so would require increasing the timeout too much. Spark is more appropriate for this kind of load, so the RDF graph needs to be stored in a simple Hive table with 4 fields:
- context: the item, the shared reference or the shared value
- subject
- predicate
- object

The values should be stored as strings following the N-Triples spec.

AC:
- the Wikidata TTL dumps are munged and imported to Hive on a weekly basis
- the Hive table is cleaned up weekly so that at most 4 versions of the graph are stored in it
- Airflow should be used to orchestrate these jobs

TASK DETAIL
https://phabricator.wikimedia.org/T259115

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, CBogen, Akuckartz, darthmon_wmde, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
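
A rough sketch of the weekly cleanup criterion from the task description (keep at most 4 versions of the graph), assuming each import lands in its own date-named Hive partition. The table name `wikidata_rdf` and the partition column `snapshot` are hypothetical and do not come from the task; the real job would presumably be triggered from Airflow rather than run as a standalone script.

```python
from datetime import date

MAX_VERSIONS = 4  # from the acceptance criteria: at most 4 versions are kept


def partitions_to_drop(snapshots, keep=MAX_VERSIONS):
    """Return the snapshot dates that fall outside the newest `keep` versions."""
    ordered = sorted(set(snapshots), reverse=True)  # newest first, duplicates removed
    return sorted(ordered[keep:])  # everything past the newest `keep`, oldest first


# Six hypothetical weekly imports; only the newest four should survive.
weekly = [
    date(2020, 7, 6), date(2020, 7, 13), date(2020, 7, 20),
    date(2020, 7, 27), date(2020, 8, 3), date(2020, 8, 10),
]

for d in partitions_to_drop(weekly):
    # Hypothetical table and partition column names (assumptions, not from the task).
    print(f"ALTER TABLE wikidata_rdf DROP IF EXISTS PARTITION (snapshot='{d:%Y%m%d}');")
```

Selecting the partitions to drop from the list of existing snapshots, rather than from a fixed age threshold, keeps the "at most 4 versions" guarantee even if a weekly import is skipped or delayed.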
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs