If you don't need to write data back using that library, I'd say go for #2: convert to a Scala class and standard lists; that should be easier down the line. That being said, you may end up writing custom code if you stick with Kryo anyway...
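
If you do stay on the Kryo route, my guess is that the custom code would boil down to a small serializer for ImmutableList plus a registrator, roughly along these lines (untested sketch, class names are mine):

import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import com.google.common.collect.ImmutableList
import org.apache.spark.serializer.KryoRegistrator
import scala.collection.JavaConverters._

// Sketch: write an ImmutableList as (size, elements) and rebuild it with
// ImmutableList.copyOf on read, so Kryo never calls add() on the immutable
// instance.
class ImmutableListKryoSerializer extends Serializer[ImmutableList[AnyRef]] {

  override def write(kryo: Kryo, out: Output, list: ImmutableList[AnyRef]): Unit = {
    out.writeInt(list.size(), true)
    list.asScala.foreach(e => kryo.writeClassAndObject(out, e))
  }

  override def read(kryo: Kryo, in: Input,
                    cls: Class[ImmutableList[AnyRef]]): ImmutableList[AnyRef] = {
    val size = in.readInt(true)
    val buf = new java.util.ArrayList[AnyRef](size)
    var i = 0
    while (i < size) { buf.add(kryo.readClassAndObject(in)); i += 1 }
    ImmutableList.copyOf(buf)
  }
}

// ImmutableList is abstract and Kryo picks serializers by the runtime class,
// so the concrete subclasses (empty, singleton, regular) are registered too.
class GuavaRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    val ser = new ImmutableListKryoSerializer
    kryo.register(classOf[ImmutableList[_]], ser)
    kryo.register(ImmutableList.of().getClass, ser)
    kryo.register(ImmutableList.of(1).getClass, ser)
    kryo.register(ImmutableList.of(1, 2).getClass, ser)
  }
}

You would then point spark.kryo.registrator at the registrator class.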

Sent from my iPhone

On 05 Oct 2015, at 21:42, Jakub Dubovsky <spark.dubovsky.ja...@seznam.cz> wrote:

Thank you for quick reaction.

I have to say this is very surprising to me. I have never received advice to stop using an immutable approach before. The whole RDD is designed to be immutable (which is sort of sabotaged by not being able to (de)serialize immutable classes properly). I will ask on the dev list whether this is going to be changed or not.

OK, I have let go of my initial feelings, so now let's be pragmatic. And this is still for everyone, not just Igor:

I use a class from a library which is immutable. I want to use this class to represent my data in an RDD because that saves me a huge amount of work. The class uses ImmutableList as one of its fields, and that is why it fails. But isn't there a way to work around this? I ask because I have exactly zero knowledge about Kryo and how it works. So, for example, would either of these two approaches work?

1) Change the external class so that it implements the writeObject and readObject methods (it's Java). Would these methods be used by Kryo? (I can ask the maintainers of the library to change the class if the change is reasonable. Adding these methods would be reasonable, while dropping immutability certainly wouldn't be.)

2) Wrap the class in a Scala class which would translate the data during (de)serialization?
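
To illustrate what I mean by 2), taking the plain ImmutableList from my shell example below as a stand-in for the library class (just an untested sketch, names are mine):

import com.google.common.collect.ImmutableList
import scala.collection.JavaConverters._

// The data lives in the RDD as a standard Scala List, which (de)serializes
// fine; the ImmutableList is only rebuilt when the library needs it.
case class ImmutableListWrapper[T](values: List[T]) {
  def toImmutable: ImmutableList[T] = ImmutableList.copyOf(values.asJava)
}

object ImmutableListWrapper {
  def apply[T](list: ImmutableList[T]): ImmutableListWrapper[T] =
    ImmutableListWrapper(list.asScala.toList)
}

// usage with the array from the example below:
// val rdd = sc.parallelize(arr.map(ImmutableListWrapper(_)))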

  Thanks!
  Jakub Dubovsky


---------- Original message ----------
From: Igor Berman <igor.ber...@gmail.com>
To: Jakub Dubovsky <spark.dubovsky.ja...@seznam.cz>
Date: 5. 10. 2015 20:11:35
Subject: Re: RDD of ImmutableList

Kryo doesn't support Guava's collections by default. I remember coming across a project on GitHub that fixes this (not sure though). I ended up not using Guava collections at all as far as Spark RDDs are concerned.
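
If it is the project I half-remember, it may be the kryo-serializers library (the de.javakaffee artifact), which I believe ships serializers for the Guava collections; wiring it into Spark would look roughly like this (untested, package and class names are mine, and the registerSerializers helper is what I recall its API to be):

import com.esotericsoftware.kryo.Kryo
import de.javakaffee.kryoserializers.guava.ImmutableListSerializer
import org.apache.spark.serializer.KryoRegistrator

class GuavaKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // registers the serializer for ImmutableList and its concrete subclasses
    ImmutableListSerializer.registerSerializers(kryo)
  }
}

// spark-shell / spark-submit configuration (my.pkg is a placeholder):
//   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
//   --conf spark.kryo.registrator=my.pkg.GuavaKryoRegistrator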

On 5 October 2015 at 21:04, Jakub Dubovsky <spark.dubovsky.ja...@seznam.cz> wrote:
Hi all,

  I would like some advice on how to use ImmutableList with RDDs. Here is a small demonstration of the essence of my problem in spark-shell, with the Guava jar added:

scala> import com.google.common.collect.ImmutableList
import com.google.common.collect.ImmutableList

scala> val arr = Array(ImmutableList.of(1,2), ImmutableList.of(2,4), 
ImmutableList.of(3,6))
arr: Array[com.google.common.collect.ImmutableList[Int]] = Array([1, 2], [2, 
4], [3, 6])

scala> val rdd = sc.parallelize(arr)
rdd: org.apache.spark.rdd.RDD[com.google.common.collect.ImmutableList[Int]] = 
ParallelCollectionRDD[0] at parallelize at <console>:24

scala> rdd.count

 This results in a Kryo exception saying that it cannot add a new element to the list instance during deserialization:

java.io.IOException: java.lang.UnsupportedOperationException
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
        at 
org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
        ...
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsupportedOperationException
        at 
com.google.common.collect.ImmutableCollection.add(ImmutableCollection.java:91)
        at 
com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
        at 
com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
        ...

  It somehow makes sense. But I cannot think of a workaround, and I do not believe that using ImmutableList with RDDs is impossible. How is this solved?

  Thank you in advance!

   Jakub Dubovsky

