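Well, the obvious thing is to load only the columns you need, in case you’re not already doing that. Parquet is a columnar format, so columns you don’t ask for don’t have to be read at all, whereas Avro is row-oriented and reads whole records either way.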
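Roughly what I mean, as a sketch rather than something tested against your setup: the path, the column names, and the timestamp cutoff below are all made up for illustration. I'm assuming the parquet-pig ParquetLoader, which as far as I know accepts a requested schema string in its constructor (older releases use the package parquet.pig rather than org.apache.parquet.pig).

```pig
-- Hypothetical path and column names, for illustration only.
-- The schema string asks ParquetLoader for just these two columns,
-- so the remaining columns in the files should not be read.
events = LOAD '/data/events.parquet'
    USING org.apache.parquet.pig.ParquetLoader('user_id:chararray, ts:long');

-- Everything downstream touches only the projected columns.
recent  = FILTER events BY ts > 1546300800L;
by_user = GROUP recent BY user_id;
counts  = FOREACH by_user GENERATE group AS user_id, COUNT(recent) AS n;

DUMP counts;
```

If you load without a schema string, Pig's column pruning may still push the projection down to the loader, but spelling out the columns makes it explicit and easy to verify.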
On Thu, Feb 7, 2019 at 2:04 PM Michael Doo <michael....@verve.com> wrote:

> Hey all,
>
> I’ve been migrating some processes over from ingesting Avro to ingesting
> Parquet. In Spark, we’re seeing 2x-8x performance gains when using Parquet
> over Avro. In Pig, similar processes are about the same runtime between the
> two formats (and sometimes even higher using Parquet). We’ve enabled
> dictionary filtering as well as predicate filter/pushdown. Wondering if
> there are other settings / strategies we might be missing to take advantage
> of Parquet.
>
> Thanks,
> Michael

--
Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB <http://facebook.com/jurney> datasyndrome.com