Hi Eyal,

For just loading Parquet files the Parquet Pig loader is okay, although I
don't think it lets you use the partition values in the dataset later.
I know plain old PigStorage has a trick with the -tagFile/-tagPath options,
but I'm not sure that would be enough in Michael's case, or whether the
Parquet loader supports anything similar.
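
For the text-format parts of the data, the PigStorage trick looks roughly
like this (paths, delimiter, and schema below are made up for illustration;
this does not apply to the Parquet files themselves):

```pig
-- Sketch: '-tagPath' makes PigStorage prepend the full input path as the
-- first column ('-tagFile' would prepend just the file name). The partition
-- value can then be parsed out of the path with REGEX_EXTRACT.
raw = LOAD 's3://path/to/files/*/*' USING PigStorage(',', '-tagPath')
      AS (file_path:chararray, col1:chararray, col2:int);
with_flag = FOREACH raw GENERATE
    REGEX_EXTRACT(file_path, 'some_flag=([^/]+)', 1) AS some_flag,
    col1, col2;
```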

Thanks

On Thu, 30 Aug 2018 at 16:10, Eyal Allweil <eyal_allw...@yahoo.com.invalid>
wrote:

> Hi Michael,
> You can also use the Parquet Pig loader (especially if you're not working
> with Hive). Here's a link to the Maven repository for it.
>
> https://mvnrepository.com/artifact/org.apache.parquet/parquet-pig/1.10.0
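>
> Basic usage is something like the sketch below (the jar name and paths are
> assumptions for illustration; in practice you'd also need to register the
> loader's transitive dependencies or use a bundled jar):
>
> ```pig
> -- ParquetLoader reads the schema from the Parquet file footers, so no
> -- AS clause is required.
> REGISTER parquet-pig-1.10.0.jar;
> data = LOAD 's3://path/to/files' USING org.apache.parquet.pig.ParquetLoader();
> DESCRIBE data;
> ```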
> Regards,
> Eyal
>
>    On Tuesday, August 28, 2018, 2:40:36 PM GMT+3, Adam Szita
> <sz...@cloudera.com.INVALID> wrote:
>
>  Hi Michael,
>
> Yes, you can use HCatLoader to do this.
> The requirement is that you have a Hive table defined on top of your data
> (probably pointing to s3://path/to/files), so that the Hive Metastore has
> all the relevant metadata and schema information.
> If you do not have a Hive table yet, you can define one in Hive by
> specifying the schema manually; after that, the partitions can be
> registered automatically with Hive's MSCK REPAIR TABLE command.
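>
> A rough sketch of those steps (the table name, columns, and database are
> made up for illustration):
>
> ```pig
> -- First, in Hive, define an external partitioned table over the data and
> -- let MSCK discover the partition directories:
> --   CREATE EXTERNAL TABLE events (id STRING, value DOUBLE)
> --   PARTITIONED BY (some_flag BOOLEAN)
> --   STORED AS PARQUET
> --   LOCATION 's3://path/to/files';
> --   MSCK REPAIR TABLE events;
> -- Then in Pig, HCatLoader exposes the partition column like any other:
> events = LOAD 'default.events'
>          USING org.apache.hive.hcatalog.pig.HCatLoader();
> flagged = FILTER events BY some_flag == true;
> ```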
>
> Hope this helps,
> Adam
>
>
> On Mon, 27 Aug 2018 at 19:18, Michael Doo <michael....@verve.com> wrote:
>
> > Hello,
> >
> > I’m trying to read partitioned Parquet data into Pig (it’s stored in S3
> > like
> >
> s3://path/to/files/some_flag=true/part-00095-a2a6230b-9750-48e4-9cd0-b553ffc220de.c000.gz.parquet).
> > I’d like to load it into Pig and add the partitions as columns. I’ve read
> > some resources suggesting using the HCatLoader, but so far haven’t had
> > success.
> >
> > Any advice would be welcome.
> >
> > ~ Michael
> >
