but why go through all this and make it so long-winded, verbose and non-standard? That's a pain to maintain!
just use tabs as your transform in/out separator and go easy on the next guy
who has to maintain your code. :)

On Tue, Mar 18, 2014 at 4:59 PM, Nurdin Premji <nurdin.pre...@casalemedia.com> wrote:

> I figured out my problem.
>
> I wasn't using the outRowFormat section when calling the transform. I had
> tried with a row format, but I guess I put it in the inRowFormat section,
> which changed the way the data was passed to my transform function; I had
> thought there was only one place to specify a row format.
>
> So for anyone else with this issue, the key is to change the insert
> overwrite table query to:
>
>     select transform (*)
>     using 'transform.pl'
>     AS (
>       siteID,
>       otherfield,
>       type
>     )
>     FIELDS TERMINATED BY '\001'
>     COLLECTION ITEMS TERMINATED BY '\003'
>     MAP KEYS TERMINATED BY '\002'
>     from auditlogs where dt = '2014-03-17' limit 1
>
> with the FIELDS ... section AFTER the AS () section, not before (placing
> it before changes the way rows are passed into the transform script).
>
> Thank you,
> Nurdin.
>
>
> On 14-03-18 02:45 PM, Nurdin Premji wrote:
>
>> Hello,
>>
>> I'm having trouble loading data into a table with a dynamic partition
>> with custom field delimiters. When I use tabs, it works; when I have
>> fields delimited by '\001', it does not.
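The placement matters because Hive's TRANSFORM clause accepts two separate ROW FORMAT specifications: one before USING (how input rows are serialized to the script) and one after the AS list (how the script's output is deserialized), with tab as the default output delimiter. The effect of the fix can be sketched in Python; this is an illustrative simulation of Hive's output-row splitting, not Hive's actual code, and the column count matches the three-field query above:

```python
def parse_output_row(line, delimiter="\t", num_cols=3):
    """Split one transform-script output line into num_cols fields.

    Missing trailing fields become None, which is roughly what Hive does
    (a NULL dynamic-partition value becomes __HIVE_DEFAULT_PARTITION__).
    """
    fields = line.rstrip("\n").split(delimiter)[:num_cols]
    fields += [None] * (num_cols - len(fields))
    return fields


line = "123\001456\001789\n"  # what transform.pl prints

# Without an outRowFormat clause, Hive splits on the default tab: the whole
# line lands in the first column and `type` stays None.
print(parse_output_row(line))

# With FIELDS TERMINATED BY '\001' after AS (...), the split matches the
# script's delimiter and all three columns separate correctly.
print(parse_output_row(line, delimiter="\001"))
```

This is why the original query produced __HIVE_DEFAULT_PARTITION__: the \001-delimited output parsed as a single field, leaving the `type` column NULL.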
>> I'm using Hive 0.10 on top of Hadoop 2.0.0-cdh4.6.0.
>>
>> Here is some example code I was trying:
>>
>>     create external table test_table (
>>       siteid STRING,
>>       otherfield STRING
>>     ) PARTITIONED BY (dt STRING, type STRING)
>>     ROW FORMAT DELIMITED
>>       FIELDS TERMINATED BY '\001'
>>       COLLECTION ITEMS TERMINATED BY '\003'
>>       MAP KEYS TERMINATED BY '\002'
>>     STORED AS TEXTFILE;
>>
>> And then using a transform to fill it in:
>>
>>     add file /home/flex/transform.pl;
>>     insert overwrite table test_table partition(dt = '2014-03-17', type)
>>     select transform (*)
>>     using 'transform.pl'
>>     AS (
>>       siteID,
>>       otherfield,
>>       type
>>     )
>>     from auditlogs where dt = '2014-03-17' limit 1
>>
>> with a transform function that looks like:
>>
>>     transform.pl:
>>     #!/usr/bin/perl
>>     use strict;
>>
>>     LOOP: while (my $line = <STDIN>) {
>>       chomp $line;
>>       warn "Received line $line";
>>       print "123" . "\001" . "456" . "\001" . "789\n";
>>     }
>>
>> The result I'm getting has siteID = '123', otherfield = '456',
>> dt = '2014-03-17', and type equal to __HIVE_DEFAULT_PARTITION__, but I
>> expected "789" to be stored as the type.
>>
>> If instead I change the transform function to look like:
>>
>>     #!/usr/bin/perl
>>     use strict;
>>
>>     LOOP: while (my $line = <STDIN>) {
>>       chomp $line;
>>       warn "Received line $line";
>>       print "123" . "\t" . "456" . "\t" . "789\n";
>>     }
>>
>> then I get siteID = '123', otherfield = '456', dt = '2014-03-17', and
>> type = '789', which is what I expected from the first scenario.
>>
>> I'd prefer not to change my script delimiters, as switching from \001 to
>> \t causes other problems I haven't yet looked into.
>>
>> I looked through JIRA for "transform dynamic partitions" and did not
>> find a bug that seemed similar to this, so I'm asking on the mailing
>> list before I create a bug, and also to check that I haven't
>> misinterpreted anything about transform functionality.
>>
>> So does this look like a bug?
>>
>> Thank you,
>> Nurdin.
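If rewriting the script is ever an option, the advice about keeping the separator easy to change can be followed by factoring the delimiter into a single constant. Below is a hypothetical Python equivalent of the thread's transform.pl (not taken from the thread itself); it emits the same fixed triple per input line, with the delimiter as a one-line switch between '\001' and '\t':

```python
import sys

# Hypothetical stand-in for transform.pl: one constant controls the output
# delimiter, so switching between '\001' and '\t' is a one-line change.
DELIMITER = "\001"


def transform(lines, delimiter=DELIMITER):
    """For each input line, emit the fixed (siteID, otherfield, type)
    triple used in the thread's test script, joined by `delimiter`."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        sys.stderr.write("Received line %s\n" % line)  # like Perl's warn
        out.append(delimiter.join(("123", "456", "789")))
    return out


# Example: feed one fake input line through the transform and show the
# result with the control character made visible.
rows = transform(["some audit log line\n"])
print(rows[0].replace("\001", "<SOH>"))  # 123<SOH>456<SOH>789
```

In a real Hive transform the script would read from sys.stdin and print each row to stdout; the list-based function here just makes the behavior easy to exercise outside Hive.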