Hi Michael,
You can also use the Parquet Pig loader (especially if you're not working with 
Hive). Here's a link to the Maven repository for it.

https://mvnrepository.com/artifact/org.apache.parquet/parquet-pig/1.10.0
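For example, something along these lines in a Pig script (the jar path and the
S3 location are placeholders based on Michael's example; you may want the
parquet-pig-bundle jar instead, so the transitive dependencies are on the
classpath too):

  -- register the loader jar (placeholder path)
  REGISTER /path/to/jars/parquet-pig-bundle-1.10.0.jar;

  -- ParquetLoader reads the schema from the Parquet files themselves
  data = LOAD 's3://path/to/files' USING org.apache.parquet.pig.ParquetLoader();
  DESCRIBE data;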
Regards,
Eyal

On Tuesday, August 28, 2018, 2:40:36 PM GMT+3, Adam Szita
<sz...@cloudera.com.INVALID> wrote:

Hi Michael,

Yes, you can use HCatLoader to do this.
The requirement is that you have a Hive table defined on top of your data
(probably pointing to s3://path/to/files) and that the Hive Metastore holds
all the relevant metadata/schema information.
If you do not have a Hive table yet, you can go ahead and define one in Hive
by manually specifying the schema; after that, the partitions can be added
automatically with Hive's MSCK REPAIR TABLE command.
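A rough sketch of the two steps (table name, column names, and types here are
made up for the example; the partition column is typed STRING to match the
some_flag=true directory in your path):

  -- In Hive first:
  --   CREATE EXTERNAL TABLE my_table (id STRING, value DOUBLE)
  --     PARTITIONED BY (some_flag STRING)
  --     STORED AS PARQUET
  --     LOCATION 's3://path/to/files';
  --   MSCK REPAIR TABLE my_table;

  -- Then in Pig, started as: pig -useHCatalog
  -- The partition column some_flag shows up as a regular column:
  data = LOAD 'default.my_table' USING org.apache.hive.hcatalog.pig.HCatLoader();
  DESCRIBE data;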

Hope this helps,
Adam


On Mon, 27 Aug 2018 at 19:18, Michael Doo <michael....@verve.com> wrote:

> Hello,
>
> I’m trying to read partitioned Parquet data into Pig (so it’s stored in
> S3 like
> s3://path/to/files/some_flag=true/part-00095-a2a6230b-9750-48e4-9cd0-b553ffc220de.c000.gz.parquet).
> I’d like to load it into Pig and add the partitions as columns. I’ve read
> some resources suggesting using HCatLoader, but so far haven’t had
> success.
>
> Any advice would be welcome.
>
> ~ Michael
>  
