I figured out my problem.

I wasn't using the outRowFormat section when calling the transform. I had tried adding a row format, but I put it in the inRowFormat position (before USING), which changed the way the data was passed to my transform function; I had assumed there was only one place to specify a row format.

So for anyone else with this issue, the key is to change the insert overwrite table query to:

select transform (*)
         using 'transform.pl'
         AS(
                 siteID,
                 otherfield,
                 type
         )
         ROW FORMAT DELIMITED
         FIELDS TERMINATED BY '\001'
         COLLECTION ITEMS TERMINATED BY '\003'
         MAP KEYS TERMINATED BY '\002'

 from auditlogs where dt = '2014-03-17' limit 1

with the ROW FORMAT DELIMITED / FIELDS TERMINATED BY section AFTER the AS() section, not before (putting it before USING makes it the inRowFormat, which changes how rows are passed into the transform script).
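For reference, the general shape of the transform clause, as I understand it from the Hive transform syntax (both row format clauses are optional, and the angle-bracket placeholders are just illustrative), is roughly:

select transform ( <columns> )
         [ ROW FORMAT ... ]   -- inRowFormat: how rows are fed TO the script
         using '<script>'
         AS ( <output columns> )
         [ ROW FORMAT ... ]   -- outRowFormat: how the script's output is read back by Hive
from <table> ...

As far as I can tell, when no outRowFormat is given, Hive falls back to splitting the script's output on tabs, which is why the \t version of my script (quoted below) worked.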

Thank you,
Nurdin.

On 14-03-18 02:45 PM, Nurdin Premji wrote:
Hello,

I'm having trouble loading data into a table that has a dynamic
partition when using custom field delimiters. When I use tabs it works;
when the fields are delimited by '\001' it does not.

I'm using Hive 0.10 on top of Hadoop 2.0.0-cdh4.6.0.

Here is some example code I was trying:

create external table test_table (
          siteid STRING,
          otherfield STRING
) PARTITIONED BY (dt STRING, type STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\003'
MAP KEYS TERMINATED BY '\002'
STORED AS TEXTFILE;

And then using a transform to fill it in:

add file /home/flex/transform.pl;
insert overwrite table test_table partition(dt = '2014-03-17', type)
select transform (*)
          using 'transform.pl'
          AS(
                  siteID,
                  otherfield,
                  type
          )
          from auditlogs where dt = '2014-03-17' limit 1

with a transform function that looks like:

transform.pl:
#!/usr/bin/perl
use strict;

LOOP: while (my $line = <STDIN>) {
          chomp $line;
          warn "Received line $line";
          print "123"."\001" . "456"."\001"."789\n";
}

The result I'm getting has siteID = '123', otherfield = '456',
dt = '2014-03-17', and type equal to __HIVE_DEFAULT_PARTITION__, but I
expected "789" to be stored as the type.

If instead I change the transform function to look like:

#!/usr/bin/perl
use strict;

LOOP: while (my $line = <STDIN>) {
          chomp $line;
          warn "Received line $line";
          print "123"."\t" . "456"."\t"."789\n";
}

Then I get siteID = '123', otherfield = '456', dt = '2014-03-17', and
type = '789', which is what I expected from the first scenario.

I'd prefer not to have to change my script delimiters, as switching from
\001 to \t causes other problems I haven't yet looked into.


I looked through JIRA for "transform dynamic partitions" and did not
find a bug that seemed similar to this, so I'm asking on the mailing list
before I file one, and also to check that I haven't misinterpreted
anything about how transform works.

So does this look like a bug?

Thank you,
Nurdin.

