dcausse created this task. dcausse added projects: Wikidata, Wikidata-Query-Service, Data-Platform-SRE, Discovery-Search (Current work).
TASK DESCRIPTION

The table `wikibase_rdf` contains 4 columns (not counting partition columns):

- context
- subject
- predicate
- object

We should write a job that converts a given partition into a format readable by an RDF-compliant application (Blazegraph must support this format). The formats used in our infrastructure are generally Turtle <https://www.w3.org/TR/turtle/> and N3 <https://www.w3.org/DesignIssues/Notation3.html> (other formats, especially faster binary ones, could be evaluated, but that is out of scope for this task).

The output does not have to preserve the ordering of the original RDF output from Wikibase, but we might consider keeping the triples attached to an entity grouped together (sort by context).

Ideally we want this format to be extractable as a plain file. This task does not require the tooling itself to do so, but some documentation must be added describing a procedure for extracting the file content with existing HDFS tools.

AC:

- a Spark job is available that takes a triples table, the desired output format, (optional: the desired chunk size) and the location of the output
- documentation on how to extract the RDF chunk files out of HDFS

TASK DETAIL https://phabricator.wikimedia.org/T350106
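A minimal sketch of the core transformation, written in plain Python rather than Spark so it runs standalone: sort rows by `context` so each entity's triples stay together, serialize every row as one N-Triples statement, and cut the output into chunks of roughly a requested size without splitting an entity across chunks. It assumes, which the task does not state, that the `subject`, `predicate` and `object` columns already hold terms in N-Triples syntax (IRIs as `<...>`, literals quoted and escaped). `Triple`, `to_ntriples`, `chunk_by_context` and `max_triples` are hypothetical names. N-Triples is chosen here because it is a line-based subset of Turtle, so a Turtle-compliant loader should accept it.

```python
from itertools import groupby
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    """Hypothetical in-memory stand-in for one row of `wikibase_rdf`.

    Assumption: subject, predicate and object already hold RDF terms in
    N-Triples syntax (IRIs as <...>, literals quoted and escaped).
    """
    context: str
    subject: str
    predicate: str
    object: str


def to_ntriples(row: Triple) -> str:
    # One statement per line, terminated by " ." as N-Triples requires.
    return f"{row.subject} {row.predicate} {row.object} ."


def chunk_by_context(rows: Iterable[Triple], max_triples: int) -> List[List[str]]:
    """Serialize rows into chunks of at most `max_triples` lines.

    Rows are sorted by context so that all triples of one entity stay
    together. A context is never split across chunks, so a single entity
    with more than `max_triples` triples ends up alone in an oversized chunk.
    """
    if max_triples < 1:
        raise ValueError("max_triples must be >= 1")
    chunks: List[List[str]] = []
    current: List[str] = []
    # sorted() is stable: triples keep their input order within a context.
    ordered = sorted(rows, key=lambda r: r.context)
    for _, group in groupby(ordered, key=lambda r: r.context):
        lines = [to_ntriples(r) for r in group]
        # Close the current chunk if this entity would overflow it.
        if current and len(current) + len(lines) > max_triples:
            chunks.append(current)
            current = []
        current.extend(lines)
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    q1 = "<http://www.wikidata.org/entity/Q1>"
    q2 = "<http://www.wikidata.org/entity/Q2>"
    label = "<http://www.w3.org/2000/01/rdf-schema#label>"
    rows = [
        Triple(q2, q2, label, '"Earth"@en'),
        Triple(q1, q1, label, '"universe"@en'),
        Triple(q2, q2, label, '"Terre"@fr'),
    ]
    for i, chunk in enumerate(chunk_by_context(rows, max_triples=2)):
        print(f"# chunk {i}")
        print("\n".join(chunk))
```

With `max_triples=2` the Q1 triple lands in one chunk and both Q2 triples in the next, since Q2 would not fit next to Q1. In a Spark job the same idea would presumably become a sort on `context` before writing text files, which is not shown here. Because N-Triples has no prefix declarations, chunk files written this way can be concatenated without breaking the syntax, for example with `hdfs dfs -getmerge <hdfs-dir> <local-file>`, which merges all files of an HDFS directory into one local file.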
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org