Hi Michael,

Yes, you can use HCatLoader to do this.
The requirement is that you have a Hive table defined on top of your data
(probably pointing to s3://path/to/files), with the Hive Metastore holding
all the relevant schema and partition metadata.
If you do not have a Hive table yet, you can define one in Hive by manually
specifying the schema; after that, the partitions can be added automatically
with Hive's 'MSCK REPAIR TABLE' command.
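
As a rough sketch of the steps above (not tested against your setup): the
table name "events" and the data columns "id" and "payload" are made up for
illustration, so replace them with your actual Parquet schema. I am also
assuming a Hive release recent enough to use the org.apache.hive.hcatalog
package name, and that declaring the partition key as STRING is acceptable.
HCatLoader exposes partition keys as ordinary columns, so some_flag should
show up in the relation without extra work.

```pig
/*
 * Hive side, run once in Hive (not in Pig). Column names are hypothetical.
 *
 *   CREATE EXTERNAL TABLE events (id BIGINT, payload STRING)
 *   PARTITIONED BY (some_flag STRING)
 *   STORED AS PARQUET
 *   LOCATION 's3://path/to/files';
 *
 *   -- Scan the table location and register the some_flag=... directories
 *   MSCK REPAIR TABLE events;
 */

-- Pig side. Start Pig with the HCatalog jars on the classpath:
--   pig -useHCatalog load_events.pig
events = LOAD 'default.events' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- The partition key is a regular field; filtering on it right after LOAD
-- lets HCatLoader read only the matching partitions.
flagged = FILTER events BY some_flag == 'true';

-- Print the schema to confirm that some_flag is listed as a column.
DESCRIBE events;
```

If DESCRIBE does not list some_flag, the first thing I would check is
whether 'SHOW PARTITIONS events' returns anything in Hive.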

Hope this helps,

On Mon, 27 Aug 2018 at 19:18, Michael Doo <michael....@verve.com> wrote:

> Hello,
> I’m trying to read in Parquet data into Pig that is partitioned (so it’s
> stored in S3 like
> s3://path/to/files/some_flag=true/part-00095-a2a6230b-9750-48e4-9cd0-b553ffc220de.c000.gz.parquet).
> I’d like to load it into Pig and add the partitions as columns. I’ve read
> some resources suggesting using the HCatLoader, but so far haven’t had
> success.
> Any advice would be welcome.
> ~ Michael
