Re: Spark groupby and agg inconsistent and missing data

2015-12-10 Thread Kapil Raaj
es of “vine” (which is StringType) both from > data and res2, and res2 is missing a lot of values: > > > > val t1 = res2.select("vine").distinct.collect > > scala> t1.size > > res10: Int = 617 > > > > val t_real = data.select("vine").di

RE: Spark groupby and agg inconsistent and missing data

2015-10-22 Thread Saif.A.Ellafi
Never mind my last email. res2 is filtered so my test does not make sense. The issue is not reproduced there. I have the problem somewhere else. From: Ellafi, Saif A. Sent: Thursday, October 22, 2015 12:57 PM To: 'Xiao Li' Cc: user Subject: RE: Spark groupby and agg inconsistent and mi

RE: Spark groupby and agg inconsistent and missing data

2015-10-22 Thread Saif.A.Ellafi
lect scala> t_real.size res9: Int = 639 From: Xiao Li [mailto:gatorsm...@gmail.com] Sent: Thursday, October 22, 2015 12:45 PM To: Ellafi, Saif A. Cc: user Subject: Re: Spark groupby and agg inconsistent and missing data Hi, Saif, Could you post your code here? It might help others reproduce the

Re: Spark groupby and agg inconsistent and missing data

2015-10-22 Thread Xiao Li
Hi, Saif, Could you post your code here? It might help others reproduce the errors and give you a correct answer. Thanks, Xiao Li 2015-10-22 8:27 GMT-07:00 : > Hello everyone, > > I am doing some analytics experiments under a 4 server stand-alone cluster > in a spark shell, mostly involving a

Spark groupby and agg inconsistent and missing data

2015-10-22 Thread Saif.A.Ellafi
Hello everyone, I am doing some analytics experiments on a 4-server standalone cluster in a Spark shell, mostly involving a huge database with groupBy and aggregations. I am picking 6 groupBy columns and returning various aggregated results in a DataFrame. GroupBy fields are of two types, m