Hi, I am trying to find the best way to reduce an RDD of (key, value) pairs into (key, listOfValues) pairs. I know various ways of achieving this, but I am looking for an efficient, elegant one-liner if there is one.
Example:
Input RDD: (USA, California), (UK, Yorkshire), (USA, Colorado)
Output RDD: (USA, [California, Colorado]), (UK, [Yorkshire])

Is it possible to use reduceByKey or foldByKey to achieve this, instead of groupByKey? Something equivalent to the cons operator from LISP, so that I could just say reduceByKey(lambda x, y: (cons x y))? Maybe it is more a Python question than a Spark question: how do you create a list from 2 elements without a starting empty list?

Thanks,
Kannappan
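For context, the workaround I have seen most often is to first map each value into a singleton list and then reduce with list concatenation, i.e. rdd.map(lambda kv: (kv[0], [kv[1]])).reduceByKey(lambda a, b: a + b). A minimal sketch in plain Python (standing in for the RDD API, so no Spark cluster is needed to try it):

```python
# Simulate: rdd.map(lambda kv: (kv[0], [kv[1]])).reduceByKey(lambda a, b: a + b)
pairs = [("USA", "California"), ("UK", "Yorkshire"), ("USA", "Colorado")]

# Step 1: wrap every value in a singleton list (the "map" side).
wrapped = [(k, [v]) for k, v in pairs]

# Step 2: merge lists per key with concatenation (the "reduceByKey" side).
result = {}
for k, vs in wrapped:
    result[k] = result.get(k, []) + vs

print(result)  # {'USA': ['California', 'Colorado'], 'UK': ['Yorkshire']}
```

Note that wrapping into singleton lists sidesteps the "empty starting list" problem: the reduce function only ever sees two lists, so plain list concatenation works as the cons-like combiner.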