Hello,
I'm having trouble loading data into a table that has a dynamic
partition and custom field delimiters. When the fields are
tab-delimited it works; when they are delimited by '\001' it does not.
I'm using Hive 0.10 on top of Hadoop 2.0.0-cdh4.6.0.
Here is some example code I was trying:
create external table test_table (
  siteid STRING,
  otherfield STRING
) PARTITIONED BY (dt STRING, type STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\003'
  MAP KEYS TERMINATED BY '\002'
STORED AS TEXTFILE;
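(For reference, the table's serde parameters, including the field
delimiter, can be confirmed with:

describe extended test_table;

so I don't think the table definition itself is in question.)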
And then using a transform to fill it in:
add file /home/flex/transform.pl;
insert overwrite table test_table partition(dt = '2014-03-17', type)
select transform (*)
using 'transform.pl'
AS (
  siteID,
  otherfield,
  type
)
from auditlogs where dt = '2014-03-17' limit 1;
with a transform function that looks like:
transform.pl:
#!/usr/bin/perl
use strict;
use warnings;

# Ignore the input and emit one fixed row, with fields separated by
# "\001" (Perl's octal escape for the Ctrl-A byte).
while (my $line = <STDIN>) {
    chomp $line;
    warn "Received line $line";
    print "123" . "\001" . "456" . "\001" . "789\n";
}
The result I'm getting is siteID = '123', otherfield = '456',
dt = '2014-03-17', and type = __HIVE_DEFAULT_PARTITION__, but I
expected "789" to be stored as the type.
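In case it helps anyone reproduce this: the partition value shows up
with

show partitions test_table;

and the raw script output can be eyeballed by running the transform as a
bare select with a single output column (since Hive splits the script's
stdout on tabs by default, the whole \001-delimited line should land in
that one STRING):

select transform (*)
using 'transform.pl'
AS (raw_line)
from auditlogs where dt = '2014-03-17' limit 1;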
If instead I change the transform function to look like:
#!/usr/bin/perl
use strict;
use warnings;

# Same script as above, but with tab field delimiters instead of \001.
while (my $line = <STDIN>) {
    chomp $line;
    warn "Received line $line";
    print "123" . "\t" . "456" . "\t" . "789\n";
}
Then I get siteID = '123', otherfield = '456', dt = '2014-03-17', and
type = '789', which is what I expected from the first scenario.
I'd prefer not to change my script's delimiters, as switching from \001
to \t causes other problems I haven't yet looked into.
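If I'm reading the transform docs right, the script's stdout is treated
as tab-separated columns by default, so one thing I still plan to try
(sketched below and untested; I'm not sure 0.10 honors a row format
clause on the transform output) is declaring the delimiter there:

insert overwrite table test_table partition(dt = '2014-03-17', type)
select transform (*)
using 'transform.pl'
AS (siteID, otherfield, type)
row format delimited fields terminated by '\001'
from auditlogs where dt = '2014-03-17' limit 1;

Even if that works, it wouldn't explain why the regular columns parse
correctly with \001 while the dynamic partition column does not.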
I searched JIRA for "transform dynamic partitions" and did not find a
bug that seemed similar, so I'm asking on the mailing list before filing
one, and also to check that I haven't misinterpreted how transform is
supposed to work.
So does this look like a bug?
Thank you,
Nurdin.