On 06/06/2019 02:12, Dan Davis wrote:
Jason,
I would argue that you should exchange a Set of triples, so you can take
advantage of Spark's distributed nature. Your logic can materialize that
set into a Graph or Model when it needs to operate on it. Andy is right
to be careful about the size - you may want to build a specialized
set that throws if it grows too large, and you may want to experiment
with the limit.
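
As a rough illustration of the "set that throws if it grows too large" idea, here is a minimal sketch in plain Java. The class name BoundedSet, its maxSize parameter, and the choice of IllegalStateException are all hypothetical, not anything Jena or Spark provides. The element type is left generic; with Jena it would presumably be the Triple class.

```java
import java.util.AbstractSet;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

/** Hypothetical helper: a set that refuses to grow beyond a fixed size. */
final class BoundedSet<E> extends AbstractSet<E> {
    private final Set<E> delegate = new HashSet<>();
    private final int maxSize; // assumed limit, to be tuned by experiment

    BoundedSet(int maxSize) {
        if (maxSize < 0) {
            throw new IllegalArgumentException("maxSize must be non-negative");
        }
        this.maxSize = maxSize;
    }

    @Override
    public boolean add(E element) {
        // Re-adding an existing element never grows the set, so allow it.
        if (delegate.contains(element)) {
            return false;
        }
        // Fail loudly instead of silently building an oversized set.
        if (delegate.size() >= maxSize) {
            throw new IllegalStateException(
                "set would exceed " + maxSize + " elements");
        }
        return delegate.add(element);
    }

    @Override
    public boolean contains(Object o) {
        return delegate.contains(o);
    }

    @Override
    public Iterator<E> iterator() {
        return delegate.iterator();
    }

    @Override
    public int size() {
        return delegate.size();
    }
}
```

Because addAll is inherited from AbstractSet and goes through add, bulk inserts hit the same check. The limit itself is something to find by experiment rather than a known good value.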
Andy,
Does Jena RIOT (or contrib) provide a binary syntax for RDF that is
optimized for fast parsing?
https://jena.apache.org/documentation/io/rdf-binary.html
It's about twice as fast as N-Triples to parse, and takes about the same
time to write.
Andy