Here's a simple Pig script that reads from one Hive table and writes to another
(same data, same schema):

sigs_in = load 'signals' using org.apache.hcatalog.pig.HCatLoader();
sigs = filter sigs_in by datetime_partition == '2013-10-07_0000';
STORE sigs INTO 'signals_orc' USING org.apache.hcatalog.pig.HCatStorer();

The signals_orc table is partitioned by datetime_partition, as in:

create external table signals_orc (
 signal_id string,
 ...
) partitioned by (datetime_partition string)
stored as ORC
location '/user/hive/external/signals_orc'
tblproperties ("orc.compress"="snappy");

After running the job, I end up with the following directory in HDFS:

/user/hive/external/signals_orc/datetime_partition=__HIVE_DEFAULT_PARTITION__

That's surprising, since the filter in the Pig script shows that the
datetime_partition field in my data holds a valid value. If I change the
STORE statement in my Pig script to:

STORE sigs INTO 'signals_orc' USING
org.apache.hcatalog.pig.HCatStorer('datetime_partition=2013-10-07_0000');

then I get the correct output:

/user/hive/external/signals_orc/datetime_partition=2013-10-07_0000

So it appears to me that something is broken in the dynamic partitioning
code in HCatalog 0.11.0.
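For context, my understanding is that Hive writes a dynamic-partition row whose
partition value is null or empty into a directory named by
hive.exec.default.partition.name, which defaults to __HIVE_DEFAULT_PARTITION__.
If HCatStorer follows the same rule, the directory above suggests the partition
value is being lost somewhere between Pig and the writer. The Python below is
only a rough model of that naming rule under those assumptions. The function
partition_dir is hypothetical, and real Hive also escapes special characters in
partition values, which this sketch skips.

```python
from typing import Optional

# Mirrors the default value of hive.exec.default.partition.name (assumption:
# HCatalog 0.11.0 uses the same fallback name as Hive).
DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__"


def partition_dir(table_location: str, column: str, value: Optional[str]) -> str:
    """Hypothetical model: directory a row lands in for one partition column.

    A null (None) or empty value falls back to the default partition name.
    Escaping of special characters in the value is deliberately omitted.
    """
    name = value if value else DEFAULT_PARTITION_NAME
    return f"{table_location}/{column}={name}"


loc = "/user/hive/external/signals_orc"
print(partition_dir(loc, "datetime_partition", "2013-10-07_0000"))
print(partition_dir(loc, "datetime_partition", None))
```

In this model, the only way to end up in the default directory is for the
storer to see a null or empty datetime_partition, which my filter rules out.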

Thanks.
Tim
