Itsuki Toyota created JENA-1233:
-----------------------------------

             Summary: Make RDF primitives Serializable
                 Key: JENA-1233
                 URL: https://issues.apache.org/jira/browse/JENA-1233
             Project: Apache Jena
          Issue Type: Improvement
          Components: Elephas
    Affects Versions: Jena 3.1.0
            Reporter: Itsuki Toyota

I always use Jena when I handle RDF data with Apache Spark.

However, when I want to store the resulting RDD data (e.g. RDD[Triple]) in binary format, I can't use the RDD.saveAsObjectFile method, because it requires the elements of the RDD to implement the java.io.Serializable interface.

See the following code:

https://github.com/apache/spark/blob/v1.6.0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1469
https://github.com/apache/spark/blob/v1.6.0/core/src/main/scala/org/apache/spark/util/Utils.scala#L79-L86

You can see that:

1) RDD.saveAsObjectFile calls the Utils.serialize method.
2) Utils.serialize requires the object wrapped in the RDD to implement the java.io.Serializable interface.

For example, if you want to save an RDD[Triple] object, Triple must implement java.io.Serializable.

So why not implement java.io.Serializable? I think it would improve the usability of Jena with Apache Spark.
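
To illustrate the failure mode, here is a minimal sketch in plain Java that does not depend on Spark or Jena. It assumes that Utils.serialize ends up writing each object through java.io.ObjectOutputStream, as the code linked above suggests. PlainTriple and SerializableTriple are hypothetical stand-ins for Jena's Triple, and serialize and canSerialize are helper names made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializableDemo {

    // Hypothetical stand-in for a triple class that does NOT implement Serializable.
    static class PlainTriple {
        final String s, p, o;
        PlainTriple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Hypothetical stand-in for a triple class that DOES implement Serializable.
    static class SerializableTriple implements Serializable {
        private static final long serialVersionUID = 1L;
        final String s, p, o;
        SerializableTriple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Writes the object with standard Java serialization and returns the bytes.
    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Returns false when Java serialization rejects the object as not Serializable.
    public static boolean canSerialize(Object obj) {
        try {
            serialize(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Object plain = new PlainTriple("s", "p", "o");
        Object serializable = new SerializableTriple("s", "p", "o");
        System.out.println("PlainTriple serializable: " + canSerialize(plain));
        System.out.println("SerializableTriple serializable: " + canSerialize(serializable));
    }
}
```

The only difference between the two classes is the `implements Serializable` declaration, which is the change this issue asks for on the RDF primitives.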