Hi!

I'm new to Spark and trying to write my first spark job on some data I have.
The data is in this (parquet) format:

code, timestamp,  value
A,    2017-01-01, 123
A,    2017-01-02, 124
A,    2017-01-03, 126
B,    2017-01-01, 127
B,    2017-01-02, 126
B,    2017-01-03, 123

I want to write a little map-reduce application that must be run per
'code'. So I would need to group the data on the 'code' column and then
execute the map and reduce steps for each code; twice in this example,
once for A and once for B.
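To make the intended computation concrete, here is a minimal sketch in plain Python (not Spark) of what I want to happen per group. The map and reduce bodies are placeholders for my real logic; summing the value column is just an example:

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

# Toy version of the parquet data: (code, timestamp, value) rows.
rows = [
    ("A", "2017-01-01", 123),
    ("A", "2017-01-02", 124),
    ("A", "2017-01-03", 126),
    ("B", "2017-01-01", 127),
    ("B", "2017-01-02", 126),
    ("B", "2017-01-03", 123),
]

# Group on 'code', then run a map step and a reduce step per group.
results = {}
for code, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
    values = [v for _, _, v in group]                   # map: pick the value column
    results[code] = reduce(lambda a, b: a + b, values)  # reduce: e.g. sum per code

print(results)  # {'A': 373, 'B': 376}
```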

But when I group the data with the groupBy function, it returns a
RelationalGroupedDataset, on which I cannot apply the map and reduce
functions.

I have the feeling that I am heading in the wrong direction. Does anyone
know how to approach this? (I hope I explained it clearly enough to be
understood :))

Regards,
Marco
