I see. Thank you!
On Wed, Sep 11, 2013 at 7:12 PM, Matei Zaharia <[email protected]> wrote:

> Hi Wenlei,
>
> This was actually semi-intentional because we wanted a forward-compatible
> format across Spark versions. I'm not sure whether that was a good idea
> (and we didn't promise it will be compatible), so later we can change it.
> But for now, if you'd like to use Kryo, I recommend implementing the same
> thing that saveAsObjectFile does by hand, on top of Hadoop SequenceFiles.
> This is all it does:
>
>   /**
>    * Save this RDD as a SequenceFile of serialized objects.
>    */
>   def saveAsObjectFile(path: String) {
>     this.mapPartitions(iter => iter.grouped(10).map(_.toArray))
>       .map(x => (NullWritable.get(), new BytesWritable(Utils.serialize(x))))
>       .saveAsSequenceFile(path)
>   }
>
> With Kryo you might want to use mapPartitions instead of that second map
> in order to reuse the serializer across records. The code above uses
> arrays to reduce the cost of writing the Java class information for each
> record.
>
> Matei
>
> On Sep 10, 2013, at 10:09 PM, Wenlei Xie <[email protected]> wrote:
>
> > Hi,
> >
> > It looks like RDD.saveAsObjectFile will always use Java serialization,
> > even if I set the property spark.serializer to spark.KryoSerializer?
> >
> > Thank you!
> >
> > Best,
> > Wenlei

--
Wenlei Xie (谢文磊)
Department of Computer Science
5132 Upson Hall, Cornell University
Ithaca, NY 14853, USA
Phone: (607) 255-5577
Email: [email protected]
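For anyone finding this thread later: Matei's suggestion of using mapPartitions to reuse the serializer across records might look like the sketch below. This is not Spark's actual API, just one possible implementation under stated assumptions: the helper name saveAsKryoObjectFile is made up, the batch size of 10 mirrors saveAsObjectFile, and the import paths assume the org.apache.spark package layout (older releases used the spark.* packages).

```scala
import java.io.ByteArrayOutputStream

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.Output
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// A minimal sketch of a Kryo-based saveAsObjectFile. The helper name and
// the batch size are illustrative, not part of Spark's API.
def saveAsKryoObjectFile[T](rdd: RDD[T], path: String) {
  rdd.mapPartitions { iter =>
    // One Kryo instance per partition, reused for every batch, rather
    // than constructing a serializer per record.
    val kryo = new Kryo()
    iter.grouped(10).map { batch =>
      val bos = new ByteArrayOutputStream()
      val output = new Output(bos)
      // Serializing an Array per batch (as saveAsObjectFile does) pays
      // the class-information cost once per batch, not once per record.
      kryo.writeClassAndObject(output, batch.toArray)
      output.close()
      (NullWritable.get(), new BytesWritable(bos.toByteArray))
    }
  }.saveAsSequenceFile(path)
}
```

Reading it back would be the mirror image: open the SequenceFile, and for each BytesWritable value deserialize an array of records with a Kryo instance reused per partition.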
