Looks like a Spark bug. I can reproduce it with a sequence file, but it works with a text file.
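One possible explanation, not confirmed in this thread: the Spark API docs for `sequenceFile` warn that Hadoop's RecordReader reuses the same Writable object for each record, so caching the returned RDD directly can store many references to one object. That would fit the difference seen here, assuming the text file path converts each record to a new string before caching. The sketch below is plain Python, not Spark code; `Record` and `reusing_reader` are hypothetical stand-ins used only to illustrate the aliasing effect.

```python
import copy


class Record:
    """Hypothetical stand-in for a mutable Hadoop Writable."""

    def __init__(self):
        self.value = None


def reusing_reader(values):
    """Yield the SAME Record object for every value, mutating it in place."""
    record = Record()
    for v in values:
        record.value = v
        yield record


values = ["a", "b", "c"]

# Reading each record as it streams by sees the expected values.
streamed = [r.value for r in reusing_reader(values)]

# "Caching" the references stores one object three times, so every
# cached entry shows whatever was written last.
cached = [r for r in reusing_reader(values)]
cached_values = [r.value for r in cached]

# Copying each record before storing it keeps the entries distinct.
copied = [copy.copy(r) for r in reusing_reader(values)]
copied_values = [r.value for r in copied]

print(streamed)
print(cached_values)
print(copied_values)
```

If this is what is happening, copying each record in a map step before calling cache (as the Spark docs suggest) should make the duplicates go away; if they persist after that, it is more likely a real bug.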
On Wed, Mar 23, 2016 at 10:56 AM, Thamme Gowda N. <tgow...@gmail.com> wrote:

> Hi Spark experts,
>
> I am facing issues with cached RDDs. I noticed that a few entries
> get duplicated n times when the RDD is cached.
>
> I asked a question on Stack Overflow with my code snippet to reproduce it.
>
> I would really appreciate it if you could visit
> http://stackoverflow.com/q/36168827/1506477
> and answer my question / give your comments.
>
> Or at the least confirm that it is a bug.
>
> Thanks in advance for your help!
>
> --
> Thamme
>

--
Best Regards

Jeff Zhang