Given the following time series data:
name, time, value
x,2,9
x,1,3
x,3,6
y,2,5
y,1,7
y,3,1
z,3,7
z,4,0
z,1,4
z,2,8
we want to generate the following output, where each group's values are sorted
by time:
x => [(1,3), (2,9), (3,6)]
y => [(1,7), (2,5), (3,1)]
z => [(1,4), (2,8), (3,7), (4,0)]
One obvious way to sort the values by time is to use Java's Collections.sort
(that is, to sort each group in memory).
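For reference, the in-memory approach might look like the minimal sketch below. It is plain Java with no Spark involved, and the class and method names (InMemorySortExample, groupAndSort, format) are made up for illustration. Each group is collected first and then sorted explicitly by time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InMemorySortExample {
    // Groups "name,time,value" records by name, then sorts each group by time.
    static Map<String, List<int[]>> groupAndSort(List<String> records) {
        Map<String, List<int[]>> grouped = new LinkedHashMap<>();
        for (String record : records) {
            String[] fields = record.split(",");
            int[] pair = {
                Integer.parseInt(fields[1].trim()),  // time
                Integer.parseInt(fields[2].trim())   // value
            };
            grouped.computeIfAbsent(fields[0].trim(), k -> new ArrayList<>()).add(pair);
        }
        // The explicit, in-memory sort: every group is sorted on its own.
        for (List<int[]> pairs : grouped.values()) {
            pairs.sort(Comparator.comparingInt((int[] p) -> p[0]));
        }
        return grouped;
    }

    // Renders a group as [(time,value), ...], the notation used in this post.
    static String format(List<int[]> pairs) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < pairs.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append("(").append(pairs.get(i)[0]).append(",").append(pairs.get(i)[1]).append(")");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "x,2,9", "x,1,3", "x,3,6",
            "y,2,5", "y,1,7", "y,3,1",
            "z,3,7", "z,4,0", "z,1,4", "z,2,8");
        groupAndSort(records).forEach(
            (name, pairs) -> System.out.println(name + " => " + format(pairs)));
    }
}
```

The drawback is that every group has to fit in memory before it can be sorted, which is what I would like to avoid.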
How can we get the values sorted by time in Spark WITHOUT explicit sorting
(that is, by letting the Spark framework do the sorting)?
In Java/MapReduce/Hadoop, we can sort reducer values without explicit sorting:
job.setPartitionerClass(MyPartitioner.class);
job.setGroupingComparatorClass(MyGroupingComparator.class);
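The idea behind these two settings (often called secondary sort) can be sketched in plain Java. This is only a stand-in under stated assumptions: it uses no Hadoop or Spark APIs, and the names SecondarySortSketch, Rec, and run are hypothetical. A single sort on the composite key (name, time) plays the role of the framework's shuffle sort, and grouping on name alone plays the role of the grouping comparator, so the values of each group arrive already ordered by time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortSketch {
    // Hypothetical record type: composite key is (name, time), payload is value.
    static final class Rec {
        final String name;
        final int time;
        final int value;

        Rec(String name, int time, int value) {
            this.name = name;
            this.time = time;
            this.value = value;
        }
    }

    static Map<String, List<String>> run(List<Rec> records) {
        // Stand-in for the shuffle: one sort on the composite key (name, then time).
        List<Rec> shuffled = new ArrayList<>(records);
        shuffled.sort(Comparator.comparing((Rec r) -> r.name).thenComparingInt(r -> r.time));

        // Stand-in for the grouping comparator: group on name only.
        // Values are appended in arrival order, which is already time order,
        // so no group is ever sorted on its own.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Rec r : shuffled) {
            grouped.computeIfAbsent(r.name, k -> new ArrayList<>())
                   .add("(" + r.time + "," + r.value + ")");
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Rec> records = new ArrayList<>();
        records.add(new Rec("x", 2, 9));
        records.add(new Rec("x", 1, 3));
        records.add(new Rec("x", 3, 6));
        records.add(new Rec("z", 3, 7));
        records.add(new Rec("z", 4, 0));
        records.add(new Rec("z", 1, 4));
        records.add(new Rec("z", 2, 8));
        run(records).forEach((name, values) -> System.out.println(name + " => " + values));
    }
}
```

In real MapReduce the composite key's ordering is applied by the framework during the shuffle, and the partitioner makes sure all records with the same name reach the same reducer.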
So the question is: how can grouped/reduced values be sorted in Spark without
explicit sorting?
Thanks,
best,
Mahmoud