Could you give more details of your code?
On Wed, Mar 22, 2017 at 2:40 AM, Shashank Mandil <mandil.shash...@gmail.com> wrote:
> Hi All,
>
> I have a Spark DataFrame that has 992 rows in it.
> When I run a map on this DataFrame, I expect the map to run for all
> 992 rows.
>
> Since the mapper runs on executors across the cluster, I did a
> distributed count of the number of rows the mapper processes:
>
> dataframe.map(r => {
>   // distributed count in here, using ZooKeeper
> })
>
> I have found that this distributed count inside the mapper is not
> exactly 992, and the number varies between runs.
>
> Does anybody have any idea what might be happening? By the way, I am
> using Spark 1.6.1.
>
> Thanks,
> Shashank
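For what it's worth, a count like this can also be done with a Spark accumulator instead of an external ZooKeeper counter, and the Spark docs note that accumulator updates made inside a *transformation* such as map() can be applied more than once if a task is retried or speculatively executed, which would produce exactly this kind of run-to-run variation. A minimal sketch (local-mode setup and names like rowsSeen are illustrative, not from your code):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local-mode context for illustration; you would use your cluster's context.
val sc = new SparkContext(
  new SparkConf().setAppName("count-sketch").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Stand-in for your 992-row DataFrame.
val dataframe = sc.parallelize(1 to 992).toDF("id")

// Spark 1.6 accumulator API; aggregated on the driver across executors.
val rowsSeen = sc.accumulator(0L, "rowsSeen")

val mapped = dataframe.map { r =>
  rowsSeen += 1L // inside a transformation: may over-count on task retries
  r
}
mapped.count()           // an action is needed to force the map to run
println(rowsSeen.value)  // usually 992, but not guaranteed to be exact
```

If the ZooKeeper-based count shows the same drift, that points at task re-execution rather than at the counter itself.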