[ 
https://issues.apache.org/jira/browse/SPARK-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207732#comment-14207732
 ] 

Andrew Ash commented on SPARK-720:
----------------------------------

A big design goal of Spark is that you don't have to have any type restrictions 
on the objects contained in an RDD.  If the parametrized type of the RDD 
happens to implement Comparable then the shuffler can make optimizations, but 
it's not a requirement.  It's also not a requirement that the type implements 
Serializable, because you could be using Kryo to serialize and transport an 
object out of the Spark user's control that doesn't have the Serializable 
marker interface.  I think it would be a hard sell to add restrictions to the 
type parametrized type of an RDD.

Getting over that, I think the intention of guaranteeing serialization via the 
type system could work for standard JVM serialization (Serializable and 
Externalizable interfaces) because a class's serializability is clearly marked 
with those interfaces already.  But I'm concerned that it couldn't be made to 
work with other serializer systems such as Kryo where there is no convenient 
marker interface.

Kryo does serialization by registering a serializer for each class, and using 
that Class->Serializer map for future serialization by reflectively looking at 
an object's type as it receives objects for serialization.  The only way to 
know if a class is serializable is to know what classes a Kryo instance has 
registered at compile time, which I believe is impossible given that the 
Registrator comes from outside the Spark codebase.

[~emchristiansen] do you see a way to implement this generically for JVM 
serialization + Kryo + other systems in the future?  I think we may have to 
close this request for infeasibility.

> Statically guarantee serialization will succeed
> -----------------------------------------------
>
>                 Key: SPARK-720
>                 URL: https://issues.apache.org/jira/browse/SPARK-720
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 0.7.1
>            Reporter: Eric Christiansen
>
> First, thanks for developing Spark. It's great.
> Maybe I'm trying to serialize weird objects (eg Shapeless constructs), but I 
> tend to get quite a few NotSerializableExceptions. These are pretty annoying 
> because they happen at runtime, lengthening my code/debug cycle. 
> I'd like it if Spark could introduce a serialization system that could 
> statically check that serialization will succeed. One approach is to use 
> typeclasses, perhaps using Spray-Json as inspiration. An added benefit of 
> typeclasses is they can be used to serialize objects that were not originally 
> intended to be serialized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to