Hi all, I am wondering what people use as their on-disk storage format. Almost all the examples I have seen use CSV files to store and load data, but that seems too simplistic for obvious reasons (compressibility, to name one). I am interested in what people use to store computation results. For example, suppose you have run some computation over log files and want to store all sorts of metrics for each and every user, so that you can later query them interactively with Shark. What is the preferred, or at least a good, format for storing all that data? Parquet? RCFile? CSV? JSON?
-- Ankur