Hi, I am trying to find the best way to reduce an RDD of (key, value) pairs into (key, listOfValues) pairs. I know various ways of achieving this, but I am looking for an efficient, elegant one-liner if there is one.
Example:
Input RDD: (USA, California), (UK, Yorkshire), (USA, Colorado)
Output RDD: (USA, [California, Colorado]), (UK, [Yorkshire])

Is it possible to use reduceByKey or foldByKey to achieve this, instead of groupByKey? Something equivalent to the cons operator from LISP, so that I could just say reduceByKey(lambda x, y: (cons x y))? Maybe it is more a Python question than a Spark question: how do you create a list from 2 elements without a starting empty list?

Thanks,
Kannappan
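For context, the workaround I have seen most often is to first map each value into a singleton list and then reduce with list concatenation, i.e. rdd.map(lambda kv: (kv[0], [kv[1]])).reduceByKey(lambda a, b: a + b). A minimal sketch in plain Python (standing in for the RDD API, so no Spark cluster is needed to try it):

```python
# Simulate: rdd.map(lambda kv: (kv[0], [kv[1]])).reduceByKey(lambda a, b: a + b)
pairs = [("USA", "California"), ("UK", "Yorkshire"), ("USA", "Colorado")]

# Step 1: wrap every value in a singleton list (the "map" side).
wrapped = [(k, [v]) for k, v in pairs]

# Step 2: merge lists per key with concatenation (the "reduceByKey" side).
result = {}
for k, vs in wrapped:
    result[k] = result.get(k, []) + vs

print(result)  # {'USA': ['California', 'Colorado'], 'UK': ['Yorkshire']}
```

Note that wrapping into singleton lists sidesteps the "empty starting list" problem: the reduce function only ever sees two lists, so plain list concatenation works as the cons-like combiner.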