Hey all, I’ve been migrating some processes from ingesting Avro to ingesting Parquet. In Spark, we’re seeing 2x-8x performance gains with Parquet over Avro. In Pig, similar processes take about the same time with either format (and sometimes the Parquet runtime is even higher). We’ve enabled dictionary filtering as well as predicate filtering/pushdown. Are there other settings or strategies we might be missing to take full advantage of Parquet?
Thanks, Michael