Have you tried instantiating the instance inside the closure, rather than outside of it?
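For concreteness, here is a minimal, Spark-free sketch of that difference. Everything in it is a hypothetical stand-in (the `NonSerializableInstance` class mimics a non-serializable Scalaz instance like the one in the stack trace below; none of these names come from Scalaz or Spark). A closure that captures an instance built outside it drags that instance along when Java-serialized, which is exactly what Spark does to closures before shipping them; a closure that builds the instance in its own body captures nothing and serializes fine.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a non-serializable typeclass instance,
// playing the role of scalaz.std.OptionInstances$$anon$1.
class NonSerializableInstance {
  def point[A](a: A): Option[A] = Some(a)
}

object ClosureDemo {
  // True if Java serialization (what Spark applies to closures) accepts obj.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj)
      true
    } catch { case _: NotSerializableException => false }

  // Instance created OUTSIDE the closure: the function value captures it,
  // so serializing the function fails.
  def capturing: Int => Option[Int] = {
    val g = new NonSerializableInstance
    x => g.point(x)
  }

  // Instance created INSIDE the closure body: the function value captures
  // nothing non-serializable, so it serializes fine.
  def selfContained: Int => Option[Int] =
    x => (new NonSerializableInstance).point(x)
}
```

On a real RDD the same idea would look like `rdd.mapPartitions { iter => val G = /* build instance here */; iter.map(...) }`, which also only builds the instance once per partition instead of once per element.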
If that works, you may need to switch to use mapPartitions / foreachPartition for efficiency reasons.

On Mon, Mar 23, 2015 at 3:03 PM, Adelbert Chang <adelbe...@gmail.com> wrote:

> Is there no way to pull out the bits of the instance I want before I send
> it through the closure for aggregate? I did try pulling things out, along
> the lines of
>
>     def foo[G[_], B](blah: Blah)(implicit G: Applicative[G]) = {
>       val lift: B => G[RDD[B]] = b =>
>         G.point(sparkContext.parallelize(List(b)))
>
>       rdd.aggregate(/* use lift in here */)
>     }
>
> But that doesn't seem to work either; it still seems to be trying to
> serialize the Applicative... :(
>
> On Mon, Mar 23, 2015 at 12:27 PM, Dean Wampler <deanwamp...@gmail.com>
> wrote:
>
>> Well, it's complaining about trait OptionInstances, which is defined in
>> Option.scala in the std package. Use scalap or javap on the scalaz
>> library to find out which member of the trait is the problem, but since
>> it says "$$anon$1", I suspect it's the first value member, "implicit val
>> optionInstance", which has a long list of mixin traits, one of which is
>> probably at fault. OptionInstances is huge, so there might be other
>> offenders.
>>
>> Scalaz wasn't designed for distributed systems like this, so you'll
>> probably find many examples of nonserializability. An alternative is to
>> avoid using Scalaz in any closures passed to Spark methods, but that's
>> probably not what you want.
>>
>> dean
>>
>> Dean Wampler, Ph.D.
>> Author: Programming Scala, 2nd Edition
>> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
>> Typesafe <http://typesafe.com>
>> @deanwampler <http://twitter.com/deanwampler>
>> http://polyglotprogramming.com
>>
>> On Mon, Mar 23, 2015 at 12:03 PM, adelbertc <adelbe...@gmail.com> wrote:
>>
>>> Hey all,
>>>
>>> I'd like to use the Scalaz library in some of my Spark jobs, but am
>>> running into issues where some stuff I use from Scalaz is not
>>> serializable.
>>> For instance, in Scalaz there is a trait
>>>
>>>     /** In Scalaz */
>>>     trait Applicative[F[_]] {
>>>       def apply2[A, B, C](fa: F[A], fb: F[B])(f: (A, B) => C): F[C]
>>>       def point[A](a: => A): F[A]
>>>     }
>>>
>>> But when I try to use it in, say, an `RDD#aggregate` call I get:
>>>
>>>     Caused by: java.io.NotSerializableException:
>>>         scalaz.std.OptionInstances$$anon$1
>>>     Serialization stack:
>>>     - object not serializable (class: scalaz.std.OptionInstances$$anon$1,
>>>       value: scalaz.std.OptionInstances$$anon$1@4516ee8c)
>>>     - field (class: dielectric.syntax.RDDOps$$anonfun$1, name: G$1,
>>>       type: interface scalaz.Applicative)
>>>     - object (class dielectric.syntax.RDDOps$$anonfun$1, <function2>)
>>>     - field (class: dielectric.syntax.RDDOps$$anonfun$traverse$extension$1,
>>>       name: apConcat$1, type: interface scala.Function2)
>>>     - object (class dielectric.syntax.RDDOps$$anonfun$traverse$extension$1,
>>>       <function2>)
>>>
>>> Outside of submitting a PR to Scalaz to make things Serializable, what
>>> can I do to make things Serializable? I considered something like
>>>
>>>     implicit def applicativeSerializable[F[_]](implicit F: Applicative[F]):
>>>         SomeSerializableType[F] =
>>>       new SomeSerializableType { ... } ??
>>>
>>> Not sure how to go about doing it - I looked at java.io.Externalizable
>>> but given `scalaz.Applicative` has no value members I'm not sure how to
>>> implement the interface.
>>>
>>> Any guidance would be much appreciated - thanks!
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Getting-around-Serializability-issues-for-types-not-in-my-control-tp22193.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> --
> Adelbert (Allen) Chang
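The wrapper the original post gropes toward (`SomeSerializableType`) can be sketched as follows. Since the instance itself cannot be made serializable from the outside, one workaround is to serialize a zero-argument factory instead and rebuild the instance lazily after deserialization with a `@transient lazy val`. Everything below is a hypothetical, self-contained stand-in: the `Applicative` trait mimics the Scalaz one from the post, and `SerializableApplicative` is an assumed name, not an existing Scalaz or Spark API.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream,
                ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in mimicking the Scalaz trait quoted in the post.
trait Applicative[F[_]] {
  def point[A](a: => A): F[A]
}

object Instances {
  // Like scalaz's optionInstance: perfectly usable, but the anonymous
  // class does NOT extend Serializable.
  def optionApplicative: Applicative[Option] = new Applicative[Option] {
    def point[A](a: => A): Option[A] = Some(a)
  }
}

// The wrapper: ship a (serializable) zero-argument factory instead of the
// instance. @transient skips the cached field during serialization; lazy
// re-runs the factory on first access after deserialization, so the
// receiving side rebuilds the instance locally.
class SerializableApplicative[F[_]](factory: () => Applicative[F])
    extends Serializable {
  @transient lazy val value: Applicative[F] = factory()
}

object Demo {
  // Java-serialization round trip, standing in for Spark shipping a
  // closure to an executor.
  def roundTrip[A <: AnyRef](a: A): A = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(a)
    out.close()
    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
      .readObject().asInstanceOf[A]
  }
}
```

In the `foo` example earlier in the thread, the idea would be to take a `SerializableApplicative[G]` implicitly instead of `Applicative[G]` and call `.value` inside the closure, so the executor rebuilds the instance rather than receiving it over the wire.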