You might need https://issues.apache.org/jira/browse/PIG-4092

On Thu, Feb 7, 2019 at 3:54 PM Russell Jurney <russell.jur...@gmail.com> wrote:

> Sorry if this isn't helpful, but the other obvious thing is to store
> intermediate data in Parquet whenever you repeat code/data that can be
> shared between jobs, if tests indicate it is faster. Before Parquet this
> wasn't necessarily advantageous, as IO from disk is slower than IO through
> RAM, which is where the computation might run. Parquet opens opportunities
> here by competing better with repeat computation. You could compare the two
> to figure out how to optimize your scripts. Again, you're probably doing
> this :)
>
> Russell Jurney @rjurney <http://twitter.com/rjurney>
> russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
> <http://facebook.com/jurney> datasyndrome.com
>
>
> On Thu, Feb 7, 2019 at 3:29 PM Michael Doo <michael....@verve.com> wrote:
>
> > Indeed. When loading Parquet using
> > org.apache.parquet.pig.ParquetLoader(), we're specifying the schema
> > for which columns we want to load.
> >
> > On 2/7/19, 5:14 PM, "Russell Jurney" <russell.jur...@gmail.com> wrote:
> >
> > Well, the obvious thing is to load only those columns you need. Just
> > in case you’re not doing this.
> >
> > On Thu, Feb 7, 2019 at 2:04 PM Michael Doo <michael....@verve.com>
> > wrote:
> >
> > > Hey all,
> > > I’ve been migrating some processes over from ingesting Avro to
> > > ingesting Parquet. In Spark, we’re seeing 2x-8x performance gains
> > > when using Parquet over Avro. In Pig, similar processes are about
> > > the same runtime between the two formats (and sometimes even higher
> > > using Parquet). We’ve enabled dictionary filtering as well as
> > > predicate filter/pushdown. Wondering if there are other settings /
> > > strategies we might be missing to take advantage of Parquet.
> > >
> > > Thanks,
> > > Michael