Jason,

I would argue that you should exchange a Set of triples, so you can take
advantage of Spark's distributed nature.  Your logic can materialize that
set into a Graph or Model when it needs to operate on it.  Andy is right
to be careful about size: you may want to build a specialized set that
throws if it grows too large, and experiment with the limit.
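
A minimal sketch of such a size-capped set, using only java.util.  The
names BoundedSet and maxSize are illustrative and not part of Jena or
Spark; with Jena the element type would presumably be
org.apache.jena.graph.Triple, which Andy notes is serializable.  Strings
stand in for triples here so the example runs on its own.

```java
import java.util.HashSet;
import java.util.Set;

/** A HashSet that throws instead of growing past a fixed limit (hypothetical helper). */
final class BoundedSet<E> extends HashSet<E> {
    private static final long serialVersionUID = 1L;
    private final int maxSize;

    BoundedSet(int maxSize) {
        if (maxSize < 0) {
            throw new IllegalArgumentException("maxSize must be >= 0");
        }
        this.maxSize = maxSize;
    }

    // addAll is inherited from AbstractCollection and calls add per element,
    // so bulk inserts go through this check as well.
    @Override
    public boolean add(E element) {
        // Only a new element can grow the set; duplicates are always allowed.
        if (!contains(element) && size() >= maxSize) {
            throw new IllegalStateException("set would exceed " + maxSize + " elements");
        }
        return super.add(element);
    }
}

class BoundedSetDemo {
    public static void main(String[] args) {
        Set<String> triples = new BoundedSet<>(2);
        triples.add("s1 p1 o1");
        triples.add("s2 p2 o2");
        triples.add("s1 p1 o1");      // duplicate: no growth, no exception
        try {
            triples.add("s3 p3 o3");  // third distinct element exceeds the cap
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because HashSet is Serializable, a set like this can travel in a Spark
closure as long as its elements are serializable too.  The receiving side
could then add each triple to a fresh in-memory Graph or Model when it
actually needs one.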

Andy,

Does Jena RIOT (or contrib) provide a binary syntax for RDF that is optimized
for fast parsing?  I'm recalling Michael Stonebraker's response to the
BigTable paper -
https://pdfs.semanticscholar.org/08d1/2e771d811bcd0d4bc81fa3993563efbaeadb.pdf,
and also gSOAP and other binary XML formats.  The Google BigTable authors
then responded to that paper, saying that they don't use loose
serializations such as those provided by HDFS, but structured data instead.

This is hugely important to Jason's question because this is one of the
benefits of using Spark instead of HDFS - Spark will handle distributing a
huge dataset to multiple systems so that algorithm authors can operate on a
vector (of Jena models?) far too large to fit in one machine.

On Wed, Jun 5, 2019 at 4:40 PM Andy Seaborne <a...@apache.org> wrote:

> Hi Jason,
>
> Models aren't serializable, nor are Graphs (the more system oriented
> view of RDF), though Triples, Quads and Nodes are serializable.  You can
> send a list of triples.
>
> Or use an RDF syntax and write-then-read the RDF.
>
> But are the models small? RDF graphs aren't always small, so moving them
> around may be expensive.
>
>      Andy
>
> On 05/06/2019 17:59, Scarlet Remilia wrote:
> > Hello everyone,
> > I have a problem with Jena and Spark.
> > I use a Jena Model to handle some RDF models in my Spark executor, but I
> > get an error:
> >
> > java.io.NotSerializableException: org.apache.jena.rdf.model.impl.ModelCom
> >
> > Serialization stack:
> >          - object not serializable (class: org.apache.jena.rdf.model.impl.ModelCom)
> >          - field (class: org.nari.r2rml.entities.Template, name: model, type: interface org.apache.jena.rdf.model.Model)
> >          - object (class org.nari.r2rml.entities.Template, org.nari.r2rml.entities.Template@23dc70c1)
> >          - field (class: org.nari.r2rml.entities.PredicateObjectMap, name: objectTemplate, type: class org.nari.r2rml.entities.Template)
> >          - object (class org.nari.r2rml.entities.PredicateObjectMap, org.nari.r2rml.entities.PredicateObjectMap@2de96eba)
> >          - writeObject data (class: java.util.ArrayList)
> >          - object (class java.util.ArrayList, [org.nari.r2rml.entities.PredicateObjectMap@2de96eba])
> >          - field (class: org.nari.r2rml.entities.LogicalTableMapping, name: predicateObjectMaps, type: class java.util.ArrayList)
> >          - object (class org.nari.r2rml.entities.LogicalTableMapping, org.nari.r2rml.entities.LogicalTableMapping@8e00c02)
> >          - field (class: org.nari.r2rml.beans.Impl.EachPartitonFunction, name: logicalTableMapping, type: class org.nari.r2rml.entities.LogicalTableMapping)
> >          - object (class org.nari.r2rml.beans.Impl.EachPartitonFunction, org.nari.r2rml.beans.Impl.EachPartitonFunction@1e14b269)
> >          - field (class: org.apache.spark.sql.Dataset$$anonfun$foreachPartition$2, name: func$4, type: interface org.apache.spark.api.java.function.ForeachPartitionFunction)
> >          - object (class org.apache.spark.sql.Dataset$$anonfun$foreachPartition$2, <function1>)
> >          at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
> >          at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
> >          at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
> >          at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:400)
> >          ... 33 more
> >
> > All these classes implement the Serializable interface.
> > So how can I serialize a Jena Model Java object?
> >
> > Thanks very much!
> >
> >
> > Jason
> >
>
