dcausse created this task.
dcausse added projects: Wikidata, Wikidata-Query-Service, Data-Platform-SRE, 
Discovery-Search (Current work).

TASK DESCRIPTION
  The table `wikibase_rdf` contains 4 columns (not counting partition columns):
  
  - context
  - subject
  - predicate
  - object
  
  We should write a job that converts a given partition into a format that
is readable by an RDF-compliant application (Blazegraph must support this
format). The formats used in our infrastructure are generally Turtle
<https://www.w3.org/TR/turtle/> and N3
<https://www.w3.org/DesignIssues/Notation3.html> (more formats, especially
faster binary ones, can be evaluated, but that is out of scope for this task).
  
  The output does not have to keep the same ordering as the original RDF output
from Wikibase, but we might consider keeping the triples attached to an entity
grouped together (sort by context).
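  The grouping and chunking idea above can be sketched in plain Python,
independent of Spark. This is only an illustration under stated assumptions:
the `Triple` type and the `to_ntriples_chunks` helper are hypothetical names,
and the sketch assumes that the subject, predicate and object columns already
hold terms encoded in N-Triples syntax (for example `<http://...>` for IRIs),
which the task does not state. N-Triples is used because it is a subset of
Turtle, so the output is also valid Turtle.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, NamedTuple


class Triple(NamedTuple):
    """One row of the triples table (hypothetical in-memory stand-in)."""
    context: str
    subject: str
    predicate: str
    object: str


def to_ntriples_chunks(rows: Iterable[Triple], chunk_size: int) -> Iterator[List[str]]:
    """Yield chunks of N-Triples lines, keeping each context in one chunk.

    Assumption: every term is already N-Triples-encoded, so a line is just
    the three terms joined by spaces and terminated by " .".
    chunk_size is a soft lower bound: a chunk is closed once it holds at
    least that many lines, but never in the middle of a context.
    """
    chunk: List[str] = []
    # Sorting by context brings all triples of an entity together.
    ordered = sorted(rows, key=lambda t: t.context)
    for _, group in groupby(ordered, key=lambda t: t.context):
        chunk.extend(f"{t.subject} {t.predicate} {t.object} ." for t in group)
        if len(chunk) >= chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit whatever is left after the last context.
        yield chunk
```

  In a Spark job the same effect would presumably come from repartitioning
and sorting by the context column before writing text files, but that part is
not shown here.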
  Ideally we want this format to be extractable as a plain file. This task
does not require the tooling to do so itself, but documentation must be added
that defines a procedure for extracting the file content with existing HDFS
tools.
  
  AC:
  
  - a Spark job is available that takes a triples table, the desired output
format, the location of the output and (optionally) the desired chunk size
  - documentation on how to extract the RDF chunk files out of HDFS
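  As a starting point for the extraction documentation, the sketch below
builds the command line for `hdfs dfs -getmerge`, an existing HDFS shell
command that concatenates the files under a directory into one local file.
The helper name `getmerge_command` and the paths in the usage note are
hypothetical, and the sketch assumes the job writes its chunks as plain text
files into a single output directory.

```python
import shlex
from typing import List


def getmerge_command(hdfs_dir: str, local_file: str) -> List[str]:
    """Build the argv for merging all files under hdfs_dir into local_file.

    The command is only constructed here, not executed; pass the list to
    subprocess.run on a host where the hdfs client is installed.
    """
    return ["hdfs", "dfs", "-getmerge", hdfs_dir, local_file]


def as_shell(argv: List[str]) -> str:
    """Render the argv as a copy-pasteable, safely quoted shell line."""
    return shlex.join(argv)
```

  For example, `as_shell(getmerge_command("/tmp/rdf_out", "dump.nt"))`
renders `hdfs dfs -getmerge /tmp/rdf_out dump.nt`, which could be pasted into
the documentation with the real output path substituted.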

TASK DETAIL
  https://phabricator.wikimedia.org/T350106

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Aklapper, BTullis, bking, dr0ptp4kt, JAllemandou, dcausse, 
Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, 
maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
_______________________________________________
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
