But why go through all this and make it so long-winded, verbose, and
non-standard? That's a pain to maintain!

Just use tabs as your transform in/out separator and go easy on the next
guy who has to maintain your code. :)


On Tue, Mar 18, 2014 at 4:59 PM, Nurdin Premji <
nurdin.pre...@casalemedia.com> wrote:

> I figured out my problem.
>
> I wasn't using the outRowFormat section when calling the transform. I had
> tried a row format, but I must have put it in the inRowFormat section,
> which changed the way the data was passed to my transform script; I had
> assumed there was only one place to specify a row format.
>
> So for anyone else with this issue, the key is to change the insert
> overwrite table query to:
>
>
> select transform (*)
>          using 'transform.pl'
>          AS(
>                  siteID,
>                  otherfield,
>                  type
>          )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\001'
> COLLECTION ITEMS TERMINATED BY '\003'
> MAP KEYS TERMINATED BY '\002'
>
>  from auditlogs where dt = '2014-03-17' limit 1
>
> with the FIELDS... section AFTER the AS() section, not before USING.
> (Placed before, it changes the way the data is passed into the transform
> script.)
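>
> For reference, here is a minimal sketch of the complete statement with
> both row-format slots marked. It assumes the table, script, and column
> names from my original message below, and it spells out the ROW FORMAT
> DELIMITED keywords in front of the delimiter clauses, which I believe
> Hive's grammar requires. Treat it as an illustration rather than a
> tested query.
>
> ```sql
> -- Sketch only; all names are taken from the original message in this thread.
> -- inRowFormat slot:  between TRANSFORM (...) and USING. It controls how rows
> --                    are written to the script's STDIN. Left at the default here.
> -- outRowFormat slot: after the AS (...) list. It controls how Hive splits
> --                    the script's STDOUT into columns.
> ADD FILE /home/flex/transform.pl;
>
> INSERT OVERWRITE TABLE test_table PARTITION (dt = '2014-03-17', type)
> SELECT TRANSFORM (*)
>   USING 'transform.pl'
>   AS (siteID, otherfield, type)
>   ROW FORMAT DELIMITED
>     FIELDS TERMINATED BY '\001'
>     COLLECTION ITEMS TERMINATED BY '\003'
>     MAP KEYS TERMINATED BY '\002'
> FROM auditlogs
> WHERE dt = '2014-03-17'
> LIMIT 1;
> ```
>
> With this form the script should still receive tab-separated input (the
> default), while its \001-separated output is split into the three
> columns.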
>
> Thank you,
> Nurdin.
>
>
> On 14-03-18 02:45 PM, Nurdin Premji wrote:
>
>> Hello,
>>
>> I'm having trouble loading data into a table with a dynamic
>> partition and custom field delimiters. When I use tabs it works; when
>> the fields are delimited by '\001' it does not.
>>
>> I'm using hive 0.10 on top of hadoop 2.0.0-cdh4.6.0
>>
>> Here is some example code I was trying:
>>
>> create external table test_table (
>>           siteid STRING,
>>           otherfield STRING
>> ) PARTITIONED BY (dt STRING, type STRING)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '\001'
>> COLLECTION ITEMS TERMINATED BY '\003'
>> MAP KEYS TERMINATED BY '\002'
>> STORED AS TEXTFILE;
>>
>> And then using a transform to fill it in:
>>
>> add file /home/flex/transform.pl;
>> insert overwrite table test_table partition(dt = '2014-03-17', type)
>> select transform (*)
>>           using 'transform.pl'
>>           AS(
>>                   siteID,
>>                   otherfield,
>>                   type
>>           )
>>           from auditlogs where dt = '2014-03-17' limit 1
>>
>> with a transform function that looks like:
>>
>> transform.pl:
>> #!/usr/bin/perl
>> use strict;
>>
>> LOOP: while (my $line = <STDIN>) {
>>           chomp $line;
>>           warn "Received line $line";
>>           print "123"."\001" . "456"."\001"."789\n";
>> }
>>
>> The result I get has siteID = '123', otherfield = '456',
>> dt = '2014-03-17', and type equal to __HIVE_DEFAULT_PARTITION__, but I
>> expected "789" to be stored as the type.
>>
>> If instead I change the transform function to look like:
>>
>> #!/usr/bin/perl
>> use strict;
>>
>> LOOP: while (my $line = <STDIN>) {
>>           chomp $line;
>>           warn "Received line $line";
>>           print "123"."\t" . "456"."\t"."789\n";
>> }
>>
>> Then I get siteID = '123', otherfield = '456', dt = '2014-03-17', and
>> type = '789', which is what I expected from the first scenario.
>>
>> I'd prefer not to have to change my script delimiters as switching from
>> \001 to \t causes other problems I haven't yet looked into.
>>
>>
>> I looked through JIRA for "transform dynamic partitions" and did not
>> find a bug that seemed similar to this so I'm asking on the mailing list
>> before I create a bug and also to check that I haven't misinterpreted
>> anything with transform functionality.
>>
>> So does this look like a bug?
>>
>> Thank you,
>> Nurdin.
>>
>>
>
