The Parquet/ORC thing has to be the biggest detractor. You're forced to choose between a format that's good for Impala and one that's good for Hive.
On May 4, 2017 3:57 PM, "Gopal Vijayaraghavan" <gop...@apache.org> wrote:
> Hi,
>
> > Does Hive LLAP work with Parquet format as well?
>
> LLAP does work with the Parquet format, but it does not work very fast,
> because the Java Parquet reader is slow.
>
> https://issues.apache.org/jira/browse/PARQUET-131
> +
> https://issues.apache.org/jira/browse/HIVE-14826
>
> In particular to your question, Parquet's columnar data reads haven't been
> optimized for Azure/S3/GCS.
>
> There was a comparison of ORC vs Parquet for NYC taxi data and it found
> that for simple queries Parquet read ~4x more data over the network - your
> problem might be bandwidth related.
>
> You might want to convert a small amount to ORC and see whether the
> BYTES_READ drops or not.
>
> In my tests with a recent LLAP, Text data was faster on LLAP on S3 & Azure
> than Parquet, because Text has a vectorized reader & cache support.
>
> Cheers,
>
> Gopal