Re: Hive LLAP with Parquet format

Gopal Vijayaraghavan Thu, 04 May 2017 12:58:03 -0700

Hi,


 > Does Hive LLAP work with Parquet format as well?

 

LLAP does work with the Parquet format, but it is not very fast, because 
the Java Parquet reader is slow.

https://issues.apache.org/jira/browse/PARQUET-131
+

https://issues.apache.org/jira/browse/HIVE-14826

As for your question in particular, Parquet's columnar data reads haven't been 
optimized for Azure/S3/GCS.

There was a comparison of ORC vs Parquet for NYC taxi data and it found that 
for simple queries Parquet read ~4x more data over the network - your problem 
might be bandwidth related.

You might want to convert a small amount to ORC and see whether the BYTES_READ 
drops or not.
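
A rough sketch of that experiment in HiveQL, in case it helps. The table name 
trips_parquet, the column fare_amount and the sample size are made up for 
illustration; hive.tez.exec.print.summary is assumed to be available in your 
build and to print the filesystem counters (including BYTES_READ) after each 
query in the CLI/Beeline session.

```sql
-- Print the per-query execution summary, which is assumed here to
-- include the filesystem counters such as BYTES_READ.
SET hive.tez.exec.print.summary=true;

-- Copy a small slice of the Parquet table (hypothetical name) into ORC.
CREATE TABLE trips_orc_sample STORED AS ORC AS
SELECT * FROM trips_parquet LIMIT 1000000;

-- Keep an identical slice in Parquet so the two reads are comparable.
CREATE TABLE trips_parquet_sample STORED AS PARQUET AS
SELECT * FROM trips_orc_sample;

-- Run the same single-column query against both copies and compare
-- the BYTES_READ reported for each run.
SELECT SUM(fare_amount) FROM trips_parquet_sample;
SELECT SUM(fare_amount) FROM trips_orc_sample;
```

A single-column aggregate is used on purpose: it is the kind of simple query 
where a columnar reader should only need to fetch a fraction of the file.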

In my tests with a recent LLAP, Text data was faster on LLAP on S3 & Azure than 
Parquet, because Text has a vectorized reader & cache support.


Cheers,

Gopal
