Hi,
  I am trying to see what is the best way to reduce an RDD of 
(key, value) pairs into (key, listOfValues) pairs. I know various ways of 
achieving this, but I am looking for an efficient, elegant one-liner if there 
is one.

Example:
Input RDD: (USA, California), (UK, Yorkshire), (USA, Colorado)
Output RDD: (USA, [California, Colorado]), (UK, [Yorkshire])

Is it possible to use reduceByKey or foldByKey to achieve this, instead of 
groupByKey?
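
For reference, the groupByKey version I would like to replace looks roughly 
like this (a minimal sketch, assuming a SparkContext named sc; the variable 
names are just for illustration):

    # Build the example RDD of (country, region) pairs.
    pairs = sc.parallelize([("USA", "California"), ("UK", "Yorkshire"), ("USA", "Colorado")])

    # groupByKey yields (key, iterable); mapValues(list) turns each group into a plain list.
    grouped = pairs.groupByKey().mapValues(list)
    print(grouped.collect())   # e.g. [('USA', ['California', 'Colorado']), ('UK', ['Yorkshire'])]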

Is there something equivalent to the cons operator from Lisp, so that I could 
just write reduceByKey(lambda x, y: cons(x, y))? Maybe it is more a Python 
question than a Spark question: how do I build a list from two elements 
without starting from an empty list?
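
The closest I have come to that cons idea is to wrap each value in a 
single-element list first and then concatenate lists in reduceByKey (again 
just a sketch, reusing the pairs RDD from above; I am not sure this is the 
most efficient form):

    # Wrap each value in a one-element list, then concatenate lists per key.
    lists = pairs.mapValues(lambda v: [v]).reduceByKey(lambda a, b: a + b)
    print(lists.collect())   # e.g. [('USA', ['California', 'Colorado']), ('UK', ['Yorkshire'])]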

Thanks,
Kannappan