When caching data with Spark, which file format is better to use: Parquet or ORC? ORC obviously works well with Hive. My question is whether Spark can actually use the file-, stripe-, and row-group-level statistics stored in an ORC file. Otherwise, to me both Parquet and ORC are simply files kept on HDFS; neither offers any caching by itself to make reads faster. So if Spark ignores the underlying statistics in ORC files, does it matter which file format is used with Spark? Thanks
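For context, these are the Spark SQL configuration keys I found that appear to control whether such statistics are used for predicate (filter) pushdown; a minimal sketch in `spark-defaults.conf` style (I'm assuming the defaults vary by Spark version, so setting them explicitly):

```properties
# Allow Spark SQL to push filters down into Parquet readers,
# skipping row groups using the min/max statistics in the footer
spark.sql.parquet.filterPushdown  true

# The equivalent flag for ORC readers, using stripe/row-index statistics
spark.sql.orc.filterPushdown      true
```

If ORC pushdown is disabled (as it was by default in some older Spark releases), Spark would indeed read ORC files without consulting their statistics, which is the scenario the question is asking about.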