I have a use case for our data in HDFS that involves grouping records by a specific characteristic, sorting each group into a time series, and running computations on the resulting series. At large scale, what is the most efficient way to do this? Having the data sharded by that characteristic would obviously improve performance significantly, but are there good tools in Spark that can help us?
- Gary Malouf (thread: "Dealing with Time Series Data")
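The workload described above has the shape of the "secondary sort" pattern. Every record is partitioned by the characteristic only, each partition is sorted by the composite key (characteristic, timestamp), and then each partition is streamed through one contiguous, time-ordered series at a time. The sketch below is a local, single-process Python simulation of that pattern, not Spark code. The record layout `(series_key, timestamp, value)`, the sample data, the function names, and the "delta between consecutive values" computation are all hypothetical placeholders. In Spark, the analogous steps would be roughly these: key a pair RDD by `(characteristic, timestamp)`, supply a custom `Partitioner` that hashes only the characteristic, call `repartitionAndSortWithinPartitions` (available on pair RDDs with an ordered key since Spark 1.2), and then process each series inside `mapPartitions`.

```python
import zlib
from itertools import groupby
from operator import itemgetter

# Hypothetical record layout: (series_key, timestamp, value).


def partition_by_key(records, num_partitions):
    """Route each record by its series key only, so every point of one
    series lands in the same partition (the job a custom Partitioner
    that ignores the timestamp would do in Spark)."""
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        # crc32 gives a hash that is stable across runs, unlike Python's str hash.
        idx = zlib.crc32(rec[0].encode("utf-8")) % num_partitions
        partitions[idx].append(rec)
    return partitions


def sort_within_partitions(partitions):
    """Sort each partition by the composite key (series_key, timestamp),
    which makes every series contiguous and time-ordered."""
    return [sorted(part, key=itemgetter(0, 1)) for part in partitions]


def series_deltas(sorted_partition):
    """Stream through one sorted partition and yield, per series, the
    change between consecutive values as (timestamp, delta) pairs."""
    for key, points in groupby(sorted_partition, key=itemgetter(0)):
        deltas = []
        prev = None
        for _, ts, value in points:
            if prev is not None:
                deltas.append((ts, value - prev))
            prev = value
        yield key, deltas


def run(records, num_partitions=2):
    """Partition by key, sort within partitions, then compute per series."""
    result = {}
    for part in sort_within_partitions(partition_by_key(records, num_partitions)):
        for key, deltas in series_deltas(part):
            result[key] = deltas
    return result


if __name__ == "__main__":
    sample = [
        ("sensor-b", 3, 7.0),
        ("sensor-a", 2, 1.5),
        ("sensor-a", 1, 1.0),
        ("sensor-b", 1, 4.0),
        ("sensor-a", 3, 3.0),
    ]
    # Sorted so the printed order does not depend on partition placement.
    print(sorted(run(sample).items()))
    # [('sensor-a', [(2, 0.5), (3, 1.5)]), ('sensor-b', [(3, 3.0)])]
```

The point of partitioning on the characteristic alone while sorting on the composite key is that no single series ever has to be collected into memory as a whole group, as a `groupByKey` followed by an in-memory sort would require. The shuffle's sort does the ordering, and the per-series computation only needs to look at the records as they stream past.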