The Parquet/ORC split has to be the biggest detractor. You're forced to choose
between a format that's good for Impala and one that's good for Hive.

On May 4, 2017 3:57 PM, "Gopal Vijayaraghavan" <gop...@apache.org> wrote:

> Hi,
>
> > Does Hive LLAP work with Parquet format as well?
>
> LLAP does work with the Parquet format, but not very fast, because the
> Java Parquet reader is slow.
>
> https://issues.apache.org/jira/browse/PARQUET-131
> and
> https://issues.apache.org/jira/browse/HIVE-14826
>
> More specific to your question: Parquet's columnar data reads haven't been
> optimized for Azure/S3/GCS.
>
> There was a comparison of ORC vs Parquet on the NYC taxi data, and it found
> that for simple queries Parquet read ~4x more data over the network, so your
> problem might be bandwidth-related.
>
> You might want to convert a small sample of the data to ORC and see whether
> the BYTES_READ counter drops.
>
> In my tests with a recent LLAP, Text data was faster than Parquet on S3 and
> Azure, because Text has a vectorized reader and cache support.
>
> Cheers,
>
> Gopal
>
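A rough way to size the bandwidth effect Gopal describes: if a scan is network-bound, time on the wire scales linearly with bytes read, so ~4x more bytes means roughly ~4x more transfer time. The Python sketch below assumes you've collected the BYTES_READ counter for the same query against the Parquet table and against an ORC copy of it. The function names and example numbers are hypothetical, and the estimate ignores latency, request overhead and decode cost, so treat it as a lower bound.

```python
# Back-of-envelope comparison of network transfer cost for two file formats.
# Assumption: the scan is bandwidth-bound; latency and CPU decode time are ignored.


def transfer_seconds(bytes_read: int, bandwidth_mbit_s: float) -> float:
    """Lower bound on time to move bytes_read over a link of the given
    bandwidth in megabits per second."""
    return bytes_read * 8 / (bandwidth_mbit_s * 1_000_000)


def compare_formats(parquet_bytes: int, orc_bytes: int, bandwidth_mbit_s: float) -> dict:
    """Summarise how much extra network transfer the Parquet scan costs
    relative to the ORC scan of the same data (hypothetical helper)."""
    return {
        "ratio": parquet_bytes / orc_bytes,
        "parquet_s": transfer_seconds(parquet_bytes, bandwidth_mbit_s),
        "orc_s": transfer_seconds(orc_bytes, bandwidth_mbit_s),
    }


if __name__ == "__main__":
    # Illustrative values only, not measurements: substitute real BYTES_READ counters.
    summary = compare_formats(
        parquet_bytes=4_000_000_000, orc_bytes=1_000_000_000, bandwidth_mbit_s=1000
    )
    print(summary)  # {'ratio': 4.0, 'parquet_s': 32.0, 'orc_s': 8.0}
```

If the measured ratio is close to the observed slowdown, bandwidth is the likely bottleneck; if the query is much slower than the transfer estimate, the reader itself is the more likely cost.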
