Hi Saif,

Could you post your code here? It would help others reproduce the problem and give you a correct answer.

Thanks,
Xiao Li

2015-10-22 8:27 GMT-07:00 <saif.a.ell...@wellsfargo.com>:

> Hello everyone,
>
> I am running some analytics experiments in a Spark shell on a 4-server
> standalone cluster, mostly involving a huge database with groupBy and
> aggregations.
>
> I am picking 6 groupBy columns and returning various aggregated results in
> a DataFrame. The groupBy fields are of two types: most of them are
> StringType and the rest are LongType.
>
> The data source is a DataFrame built from a split JSON file. Once the data
> is persisted, the result is consistent. But if I unload the memory and
> reload the data, the groupBy action returns different results, with some
> data missing.
>
> Could I be missing something? This is rather serious for my analytics, and
> I am not sure how to properly diagnose this situation.
>
> Thanks,
> Saif