Avro vs Parquet performance on Pig

Michael Doo Thu, 07 Feb 2019 14:05:09 -0800

Hey all,
I’ve been migrating some processes from ingesting Avro to ingesting 
Parquet. In Spark, we’re seeing 2x-8x performance gains with Parquet over 
Avro. In Pig, similar processes take about the same time with either 
format (and sometimes even longer with Parquet). We’ve enabled dictionary 
filtering as well as predicate filter pushdown. Are there other 
settings or strategies we might be missing to take advantage of Parquet?


Thanks,
Michael
