Re: Avro vs Parquet performance on Pig

Russell Jurney Thu, 07 Feb 2019 14:14:13 -0800

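Well, the obvious thing is to load only the columns you need, just in case
you’re not already doing this. Parquet is a columnar format, so most of its
advantage over Avro comes from skipping columns the job never reads.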
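
For illustration, here is a minimal sketch of what that could look like in
Pig. It assumes the parquet-pig jar is already on the classpath, that the
loader class is org.apache.parquet.pig.ParquetLoader (older releases used
the parquet.pig package), and that the loader accepts a requested schema
string; the path and field names are hypothetical.

```pig
-- Hypothetical input path and field names; adjust to your data.
-- Passing a schema string asks the loader for only these two columns,
-- so the remaining columns in the file should not be read.
events = LOAD '/data/events'
    USING org.apache.parquet.pig.ParquetLoader('user_id:long, event_type:chararray');

-- Downstream work touches only the projected columns.
by_type = GROUP events BY event_type;
counts  = FOREACH by_type GENERATE group AS event_type, COUNT(events) AS n;

DUMP counts;
```

Pig can also prune unused columns on its own when a FOREACH ... GENERATE
right after the LOAD keeps only the fields you use, provided the loader
supports projection pushdown, so it is worth checking which of the two
your scripts rely on.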


On Thu, Feb 7, 2019 at 2:04 PM Michael Doo <[email protected]> wrote:

> Hey all,
> I’ve been migrating some processes over from ingesting Avro to ingesting
> Parquet. In Spark, we’re seeing 2x-8x performance gains when using Parquet
> over Avro. In Pig, similar processes take about the same time with either
> format (and sometimes longer with Parquet). We’ve enabled
> dictionary filtering as well as predicate filter/pushdown. Wondering if
> there are other settings / strategies we might be missing to take advantage
> of Parquet.
>
> Thanks,
> Michael
>
-- 
Russell Jurney @rjurney <http://twitter.com/rjurney>
[email protected] LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com
