Hi all,

I am wondering what people use as their on-disk storage format. Almost all the
examples I have seen use CSV files to store and load data, but that seems too
simplistic for obvious reasons (compressibility, to name one). I am interested
in what people use to store computation results. For example, suppose you ran
some computation over log files and want to store all sorts of metrics for
each and every user, so that you can later query them interactively with
Shark. What is the preferred (or at least a good) format for storing all that
data? Parquet? RCFile? CSV? JSON?

-- Ankur