Can you share what is done inside the map()? Which Spark release are you using?
Cheers

On Fri, Feb 26, 2016 at 7:41 PM, Yan Yang <y...@wealthfront.com> wrote:

> Hi
>
> I am pretty new to Spark, and after experimenting with our pipelines I
> ran into this weird issue.
>
> The Scala code is as below:
>
>     val input = sc.newAPIHadoopRDD(...)
>     val rdd = input.map(...)
>     rdd.cache()
>     rdd.saveAsTextFile(...)
>
> I found rdd to consist of 80K+ identical rows. To be more precise, the
> number of rows is right, but all of them are identical.
>
> The truly weird part is that if I remove rdd.cache(), everything works
> just fine. I have encountered this issue on a few occasions.
>
> Thanks
> Yan
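[For context on why the contents of map() matter here: the Spark API docs for hadoopRDD/newAPIHadoopRDD warn that Hadoop RecordReaders reuse a single Writable object for every record, so caching the records without copying them stores many references to one mutable object, which produces exactly this "right row count, all rows identical" symptom. A minimal sketch of the documented workaround, assuming a hypothetical input format with Text keys and values (sc, conf, SomeInputFormat, and outputPath are placeholders, not from the original thread):]

    import org.apache.hadoop.io.Text

    // Hypothetical setup; the real InputFormat and key/value types
    // were elided in the original mail.
    val input = sc.newAPIHadoopRDD(conf, classOf[SomeInputFormat],
      classOf[Text], classOf[Text])

    // Materialize immutable copies of each record before caching;
    // without this, every cached row can point at the same reused
    // Writable instance.
    val rdd = input.map { case (k, v) => (k.toString, v.toString) }

    rdd.cache()                  // safe now that rows are independent objects
    rdd.saveAsTextFile(outputPath)

[If the original map() already converts the Writables to immutable values, this explanation would not apply.]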