Re: Spark groupby and agg inconsistent and missing data

Xiao Li Thu, 22 Oct 2015 08:46:22 -0700

Hi, Saif,

Could you post your code here? It might help others reproduce the errors
and give you a correct answer.


Thanks,

Xiao Li

2015-10-22 8:27 GMT-07:00 <saif.a.ell...@wellsfargo.com>:

> Hello everyone,
>
> I am running some analytics experiments in a Spark shell on a 4-server
> standalone cluster, mostly groupBy and aggregations over a huge
> database.
>
> I am picking 6 groupBy columns and returning various aggregated results in
> a DataFrame. The groupBy fields are of two types: most of them are
> StringType and the rest are LongType.
>
> The data source is a DataFrame loaded from a split JSON file. Once the
> data is persisted, the result is consistent. But if I unload it from
> memory and reload the data, the groupBy action returns different
> results, with data missing.
>
> Could I be missing something? This is rather serious for my analytics, and
> I am not sure how to properly diagnose this situation.
>
> Thanks,
> Saif
>
>
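
For reference, a reproduction along the following lines is the kind of thing
that would make the problem easier to diagnose. It is only a sketch for the
Spark 1.5 shell, not Saif's actual job: the path and all column names are
hypothetical placeholders, and it assumes the summed column is LongType so
that the totals do not depend on floating-point summation order.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{count, sum}

// Hypothetical path and column names; substitute the real ones.
val path = "/data/events.json"
val keys = Seq("k1", "k2", "k3", "k4", "k5", "k6")

// Load, group, and aggregate from scratch, so every call re-reads the source.
// `sqlContext` is the SQLContext that the Spark shell predefines.
def aggregate(): DataFrame =
  sqlContext.read.json(path)
    .groupBy(keys.head, keys.tail: _*)
    .agg(count("k1").as("n"), sum("amount").as("total"))

// First run: cache the aggregated result and force it to materialize.
val first = aggregate().persist()
first.count()

// Second run: a fresh load of the same source.
val second = aggregate()

// Rows present in one run but not in the other. If the job is deterministic,
// both of these should print an empty table.
first.except(second).show()
second.except(first).show()
```

If either difference is non-empty, posting a few of the offending rows together
with the real schema would narrow things down considerably.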
