Hi! I'm new to Spark and trying to write my first Spark job on some data I have. The data is in this Parquet format:
Code, timestamp, value
A, 2017-01-01, 123
A, 2017-01-02, 124
A, 2017-01-03, 126
B, 2017-01-01, 127
B, 2017-01-02, 126
B, 2017-01-03, 123

I want to write a little map-reduce application that must be run per 'code'. So I would need to group the data on the 'code' column and then execute the map and reduce steps for each code; twice in this example, once for A and once for B. But when I group the data (with the groupBy function), it returns a RelationalGroupedDataset, and I cannot apply the map and reduce functions to that. I have the feeling that I am heading in the wrong direction. Does anyone know how to approach this? (I hope I explained it well enough to be understood :))

Regards,
Marco
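To make the intended per-code map/reduce concrete, here is a minimal plain-Python sketch (no Spark) of the logic I'm after; the sample rows mirror the data above, and the sum is just a placeholder reduce:

```python
from itertools import groupby
from operator import itemgetter

# Sample rows mirroring the Parquet data: (code, timestamp, value).
rows = [
    ("A", "2017-01-01", 123),
    ("A", "2017-01-02", 124),
    ("A", "2017-01-03", 126),
    ("B", "2017-01-01", 127),
    ("B", "2017-01-02", 126),
    ("B", "2017-01-03", 123),
]

# Group on the 'code' column, then run a map step and a reduce step per group.
results = {}
for code, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
    values = [value for _, _, value in group]  # map step: extract the value column
    results[code] = sum(values)                # reduce step: placeholder aggregation

print(results)  # {'A': 373, 'B': 376}
```

This is the behaviour I want to reproduce in Spark, with the map and reduce applied independently within each group.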