If you don't need to write data back using that library, I'd say go for #2. Convert to a Scala class and standard lists; that should be easier down the line. That said, you may end up writing custom serialization code if you stick with Kryo anyway...
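A minimal sketch of what #2 could look like, assuming a hypothetical third-party class (here called `LibRecord`, a made-up stand-in for the real library type) that exposes an ImmutableList field:

```scala
import scala.jdk.CollectionConverters._ // scala.collection.JavaConverters._ on older Scala
import com.google.common.collect.ImmutableList

// Hypothetical immutable third-party class; stands in for the real library type.
final class LibRecord(val values: ImmutableList[Integer])

// Plain Scala case class: (de)serializes without trouble and keeps immutability.
case class Record(values: List[Int])

object Record {
  // Convert when data enters the RDD...
  def fromLib(r: LibRecord): Record =
    Record(r.values.asScala.map(_.toInt).toList)

  // ...and convert back only at the boundary where the library is actually needed.
  def toLib(r: Record): LibRecord =
    new LibRecord(ImmutableList.copyOf(r.values.map(Int.box).asJava))
}
```

The RDD then holds only `Record` values, so neither Kryo nor Java serialization ever sees the Guava collection.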
Sent from my iPhone

On 05 Oct 2015, at 21:42, Jakub Dubovsky <spark.dubovsky.ja...@seznam.cz> wrote:

Thank you for the quick reaction. I have to say this is very surprising to me. I have never before received advice to stop using an immutable approach. The whole RDD is designed to be immutable (which is sort of sabotaged by not being able to (de)serialize immutable classes properly). I will ask on the dev list whether this is going to change.

OK, I have let go of my initial feelings, so now let's be pragmatic. And this is still for everyone, not just Igor: I use a class from a library which is immutable. I want to use this class to represent my data in an RDD because that saves me a huge amount of work. The class uses ImmutableList as one of its fields, which is why it fails. But isn't there a way to work around this? I ask because I have exactly zero knowledge about Kryo and how it works. So, for example, would either of these two work?

1) Change the external class so that it implements writeObject/readObject methods (it's Java). Will these methods be used by Kryo? (I can ask the maintainers of the library to change the class if the change is reasonable. Adding these methods would be; dropping immutability certainly wouldn't.)

2) Wrap the class in a Scala class which would translate the data during (de)serialization?

Thanks!

Jakub Dubovsky

---------- Original message ----------
From: Igor Berman <igor.ber...@gmail.com>
To: Jakub Dubovsky <spark.dubovsky.ja...@seznam.cz>
Date: 5. 10. 2015 20:11:35
Subject: Re: RDD of ImmutableList

Kryo doesn't support Guava's collections by default. I remember encountering a project on GitHub that fixes this (not sure though). I've ended up avoiding Guava collections wherever Spark RDDs are concerned.
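The GitHub project mentioned above is likely the kryo-serializers library, which ships serializers for Guava collections; a sketch of wiring it into Spark, assuming the `de.javakaffee:kryo-serializers` artifact is on the classpath (the artifact name and setup are an assumption, not something confirmed in this thread):

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Register the Guava ImmutableList serializer from kryo-serializers,
// so Kryo builds the list via ImmutableList.copyOf instead of calling add().
class GuavaRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit =
    de.javakaffee.kryoserializers.guava.ImmutableListSerializer.registerSerializers(kryo)
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "GuavaRegistrator")
```

With a registrator like this, an `RDD[ImmutableList[Int]]` could in principle round-trip through Kryo unchanged, avoiding the conversion to Scala classes entirely.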
On 5 October 2015 at 21:04, Jakub Dubovsky <spark.dubovsky.ja...@seznam.cz> wrote:

Hi all,

I would like some advice on how to use ImmutableList with an RDD. A small demonstration of the essence of my problem in spark-shell, with the Guava jar added:

scala> import com.google.common.collect.ImmutableList
import com.google.common.collect.ImmutableList

scala> val arr = Array(ImmutableList.of(1,2), ImmutableList.of(2,4), ImmutableList.of(3,6))
arr: Array[com.google.common.collect.ImmutableList[Int]] = Array([1, 2], [2, 4], [3, 6])

scala> val rdd = sc.parallelize(arr)
rdd: org.apache.spark.rdd.RDD[com.google.common.collect.ImmutableList[Int]] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> rdd.count

This results in a Kryo exception saying that it cannot add a new element to the list instance during deserialization:

java.io.IOException: java.lang.UnsupportedOperationException
  at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
  at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
  ...
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsupportedOperationException
  at com.google.common.collect.ImmutableCollection.add(ImmutableCollection.java:91)
  at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
  at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
  ...

It somehow makes sense, but I cannot think of a workaround, and I cannot believe that using ImmutableList with an RDD is impossible. How is this solved?

Thank you in advance!

Jakub Dubovsky
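One workaround for the session above is to keep plain Scala lists inside the RDD and rebuild the ImmutableList view only where it is actually needed. A sketch (the commented-out lines assume a running spark-shell with its `sc` context):

```scala
import scala.jdk.CollectionConverters._ // scala.collection.JavaConverters._ on older Scala
import com.google.common.collect.ImmutableList

val arr = Array(ImmutableList.of(1, 2), ImmutableList.of(2, 4), ImmutableList.of(3, 6))

// Convert to plain Scala lists before handing the data to Spark...
val scalaArr: Array[List[Int]] = arr.map(_.asScala.toList)
// val rdd = sc.parallelize(scalaArr)          // serializes without the Kryo error

// ...and rebuild the immutable view only where the library needs it:
// rdd.map(xs => ImmutableList.copyOf(xs.asJava))
```

Since Scala's own `List` is already immutable, this keeps the immutable style of the original code; only the Java-side Guava type is confined to the boundaries.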