On Fri, Dec 5, 2014 at 7:12 AM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Rahul,
>
> On Fri, Dec 5, 2014 at 2:50 PM, Rahul Bindlish <
> rahul.bindl...@nectechnologies.in> wrote:
>>
>> I have done so, that's why Spark is able to load the object file (e.g.,
>> person_obj) and Spark has maintained its serialVersionUID (person_obj).
>>
>> Next time, when I try to load another object file (e.g., office_obj),
>> I think Spark is matching its serialVersionUID against the previous
>> serialVersionUID (person_obj) and giving a mismatch error.
>>
>> In my first post, I gave statements which can be executed easily to
>> replicate this issue.
>>
>
> Can you post the Scala source for your case classes? I have tried the
> following in spark-shell:
>
> case class Dog(name: String)
> case class Cat(age: Int)
> val dogs = sc.parallelize(Dog("foo") :: Dog("bar") :: Nil)
> val cats = sc.parallelize(Cat(1) :: Cat(2) :: Nil)
> dogs.saveAsObjectFile("test_dogs")
> cats.saveAsObjectFile("test_cats")
>
> This gives two directories "test_dogs/" and "test_cats/". Then I restarted
> spark-shell and entered:
>
> case class Dog(name: String)
> case class Cat(age: Int)
> val dogs = sc.objectFile("test_dogs")
> val cats = sc.objectFile("test_cats")
>
> I don't get an exception, but:
>
> dogs: org.apache.spark.rdd.RDD[Nothing] = FlatMappedRDD[1] at objectFile
> at <console>:12
>

You need to specify the type of the RDD; the compiler cannot know what is
stored in "test_dogs":

val dogs = sc.objectFile[Dog]("test_dogs")
val cats = sc.objectFile[Cat]("test_cats")

It's an easy mistake to make... I wonder if an assertion could be
implemented that makes sure the type parameter is present.
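On that last point: one way such an assertion could be done at compile time
is the ambiguous-implicit trick, which makes the call fail to compile when
the type parameter is left to be inferred as Nothing. A sketch, not Spark's
actual API; `loadTyped` here is a hypothetical stand-in for `sc.objectFile`:

```scala
import scala.annotation.implicitNotFound
import scala.reflect.ClassTag

// Evidence that T was supplied explicitly (i.e., is not Nothing).
@implicitNotFound("Please specify the element type, e.g. loadTyped[Dog](path)")
sealed trait NotNothing[T]

object NotNothing {
  private val evidence = new NotNothing[Any] {}
  // The general instance, available for every concrete T.
  implicit def good[T]: NotNothing[T] =
    evidence.asInstanceOf[NotNothing[T]]
  // Two extra instances for Nothing: if T is inferred as Nothing,
  // implicit resolution becomes ambiguous and compilation fails.
  implicit def badA: NotNothing[Nothing] =
    evidence.asInstanceOf[NotNothing[Nothing]]
  implicit def badB: NotNothing[Nothing] =
    evidence.asInstanceOf[NotNothing[Nothing]]
}

// Hypothetical loader with the same shape as sc.objectFile's signature.
def loadTyped[T: ClassTag: NotNothing](path: String): List[T] = Nil

val dogs = loadTyped[String]("test_dogs") // compiles
// val oops = loadTyped("test_dogs")      // does not compile: ambiguous
                                          // NotNothing[Nothing] instances
```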