I ended up with the following:
def firstValue(items: Iterable[String]) =
  for (i <- items) yield (i, items.head)

data.groupByKey().map { case (a, b) => (a, firstValue(b)) }.collect()
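For checking the logic without a cluster, here is a self-contained plain-Scala sketch of the same idea; the sample data is made up, and `groupBy` on a local collection stands in for the RDD's `groupByKey`:

```scala
// Made-up sample data standing in for the RDD (key, value) pairs.
val data = Seq(("A", "A1"), ("A", "A2"), ("A", "A3"),
               ("B", "B1"), ("B", "B2"), ("C", "C1"))

// Pair every value in a group with the group's first value.
def firstValue(items: Iterable[String]): Iterable[(String, String)] =
  for (i <- items) yield (i, items.head)

// groupBy preserves the encounter order of values within each group,
// so items.head is the first value seen for that key.
val result = data.groupBy(_._1).toSeq.flatMap { case (k, pairs) =>
  firstValue(pairs.map(_._2)).map { case (v, first) => (k, v, first) }
}.sorted
// result: Seq((A,A1,A1), (A,A2,A1), (A,A3,A1), (B,B1,B1), (B,B2,B1), (C,C1,C1))
```

Note that on a real RDD the values produced by `groupByKey` have no guaranteed order, so the data would need a secondary sort (or a sort inside each group) before taking `head`.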
More details:
http://dmtolpeko.com/2015/02/17/first_value-last_value-lead-and-lag-in-spark/
I would appreciate any feedback.
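Since the question also asks about LEAD and LAG, the same grouped-iteration idea extends to them. A minimal local sketch with hypothetical helper names (my own, not from the linked post), assuming each group's values are already in the desired order and the input is non-empty:

```scala
// Sketch of SQL-style LAG(v, 1): pair each value with the previous one,
// using None where SQL would return NULL. Assumes a non-empty sequence.
def lag[A](items: Seq[A]): Seq[(A, Option[A])] =
  items.zip(None +: items.init.map(Some(_)))

// Sketch of SQL-style LEAD(v, 1): pair each value with the next one.
def lead[A](items: Seq[A]): Seq[(A, Option[A])] =
  items.zip(items.tail.map(Some(_)) :+ None)

// On an RDD these would run per key inside the same groupByKey pattern, e.g.
// data.groupByKey().flatMap { case (k, vs) =>
//   lag(vs.toSeq).map { case (v, prev) => (k, v, prev) } }
```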
Hello,
To convert existing MapReduce jobs to Spark, I need to implement window
functions such as FIRST_VALUE, LEAD, LAG and so on. For example, the
FIRST_VALUE function:
Source (the 1st column is the key):
A, A1
A, A2
A, A3
B, B1
B, B2
C, C1
and the result should be:
A, A1, A1
A, A2, A1
A, A3, A1
B, B1, B