Re: DataFrame --- join / groupBy-agg question...

2017-07-19 Thread qihuagao
I'm also interested in this. Does the partition count of the DataFrame depend on the groupBy fields? Also, is the performance of groupBy-agg comparable to reduceByKey/aggregateByKey? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DataFrame-join-groupBy-agg-question-tp28849p2887

about aggregateByKey of pairrdd.

2017-07-19 Thread qihuagao
The Java pair RDD has aggregateByKey, which can avoid shuffling every record because values are combined within each partition first, so it has impressive performance. The aggregateByKey function requires 3 parameters:
# An initial ‘zero’ value that will not affect the total values to be collected
# A combining function accepting two parameters.
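
To make the mechanics concrete, here is a minimal plain-Java sketch of what aggregateByKey does conceptually. It is not Spark code and does not reflect Spark's internals: the helper, the class name AggregateByKeySketch and the in-memory "partitions" are all hypothetical, made up for illustration. It assumes the usual three arguments of Spark's aggregateByKey: a zero value, a function that folds one value into an accumulator within a partition, and a function that merges two accumulators across partitions. The zero value here is an immutable Integer; a mutable zero value would need to be copied per key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

class AggregateByKeySketch {

    // Hypothetical helper (NOT a Spark API): mimics aggregateByKey semantics
    // over a list of in-memory "partitions" of key/value pairs.
    static <K, V, U> Map<K, U> aggregateByKey(
            List<List<Map.Entry<K, V>>> partitions,
            U zeroValue,
            BiFunction<U, V, U> seqOp,
            BinaryOperator<U> combOp) {
        // TreeMap only to get a stable key order in the printed result.
        Map<K, U> merged = new TreeMap<>();
        for (List<Map.Entry<K, V>> partition : partitions) {
            // Step 1: fold values per key inside one partition, starting from zeroValue.
            Map<K, U> local = new HashMap<>();
            for (Map.Entry<K, V> e : partition) {
                U acc = local.getOrDefault(e.getKey(), zeroValue);
                local.put(e.getKey(), seqOp.apply(acc, e.getValue()));
            }
            // Step 2: only one partial result per key per partition is merged,
            // which stands in for the reduced amount of data that is shuffled.
            for (Map.Entry<K, U> e : local.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), combOp);
            }
        }
        return merged;
    }

    // Sums values per key over two made-up partitions.
    static Map<String, Integer> demo() {
        List<List<Map.Entry<String, Integer>>> partitions = List.of(
                List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3)),
                List.of(Map.entry("a", 4), Map.entry("b", 5)));
        return aggregateByKey(partitions, 0, Integer::sum, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // {a=8, b=7}
    }
}
```

In this sketch the first partition contributes one partial sum for "a" (1 + 3) instead of two separate records, which is the saving being described: the shuffle still happens, but it carries pre-combined values rather than every input record.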