mapPartitions or one of the other combineByKey APIs?
From: Jianguo Li
Date: Tuesday, June 23, 2015 at 9:46 AM
To: Silvio Fiorito
Cc: user@spark.apache.org
Subject: Re: workaround for groupByKey
Thanks. Yes, unfortunately, they all need to be grouped. I guess I can …
From: Jianguo Li
Date: Monday, June 22, 2015 at 6:21 PM
To: Silvio Fiorito
Cc: user@spark.apache.org
Subject: Re: workaround for groupByKey
Thanks for your suggestion. I guess aggregateByKey is similar to
combineByKey. I read in the Learning Spark book:
"We can disable map-side aggregation …"
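(For reference, combineByKey exposes the map-side aggregation switch directly. A hedged sketch, assuming input is an RDD[(String, String)] of (user_id, url) pairs; the partitioner choice here is only illustrative:)

```scala
import scala.collection.mutable.ListBuffer
import org.apache.spark.HashPartitioner

val grouped = input.combineByKey(
  (url: String) => ListBuffer(url),                              // createCombiner
  (buf: ListBuffer[String], url: String) => buf += url,          // mergeValue
  (b1: ListBuffer[String], b2: ListBuffer[String]) => b1 ++= b2, // mergeCombiners
  new HashPartitioner(input.partitions.length),
  mapSideCombine = false  // skip map-side aggregation, as the book describes
)
```

Disabling map-side combine avoids building per-partition buffers that would be shipped whole anyway when every value must be kept.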
Hi,
I am processing an RDD of key-value pairs. The key is a user_id, and the
value is a website URL the user has visited.
Since I need to know all the URLs each user has visited, I am tempted to
call groupByKey on this RDD. However, since there could be millions of
users and URLs,
There is reduceByKey, which works on (K, V) pairs. You need to accumulate
partial results and proceed. Does your computation allow that?
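(A minimal sketch of the reduceByKey route, assuming input is an RDD[(String, String)] of (user_id, url) pairs. Note that if the full URL list per user is truly required, this still builds it; reduceByKey only helps when values can be folded into a smaller partial result:)

```scala
val urlsPerUser = input
  .mapValues(url => List(url))  // lift each value into a one-element list
  .reduceByKey(_ ++ _)          // merge partial lists, map-side first
```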
On Mon, Jun 22, 2015 at 2:12 PM, Jianguo Li flyingfromch...@gmail.com
wrote:
Hi,
I am processing an RDD of key-value pairs. The key is a user_id, and the
value is
test = input.aggregateByKey(ListBuffer.empty[String])((a, b) => a += b, (a, b) => a ++ b)
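(The same call, spelled out with explicit arrows and the import it needs; a sketch, assuming input is an RDD[(String, String)] of (user_id, url) pairs. ListBuffer gives cheap appends:)

```scala
import scala.collection.mutable.ListBuffer

val test = input.aggregateByKey(ListBuffer.empty[String])(
  (buf, url) => buf += url,  // seqOp: append within a partition
  (b1, b2)  => b1 ++= b2     // combOp: merge buffers across partitions
)
```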
From: Jianguo Li
Date: Monday, June 22, 2015 at 5:12 PM
To: user@spark.apache.org
Subject: workaround for groupByKey
Hi,
I am processing an RDD of key-value pairs. The key is a user_id
perhaps?