I have a use case for our data in HDFS that involves sorting chunks of data
into time series format by a specific characteristic and doing computations
from that.  At large scale, what is the most efficient way to do this?
Obviously, having the data sharded by that characteristic would improve
performance significantly, but does Spark offer good tools to help with
this?
