With Spark caching, which file format is best to use: Parquet or ORC?
Obviously ORC can be used with Hive.
My question is whether Spark can use the file-, stripe-, and row-group-level
statistics stored in an ORC file.
Otherwise, to me both Parquet and ORC are simply files kept on HDFS; the
formats themselves do not offer any caching to make reads faster.
So if Spark ignores the underlying statistics in ORC files, does it matter
which file format is used with Spark?
Thanks