Hi, I need to write a Pig UDF that takes a string and a personId as its input tuple. The personId is the key used to query HBase from within the UDF. I create the connection to HBase when the UDF class loads.
The problem is that PigStorage treats each row as a tuple, so I have to query each personId independently and cannot do a bulk query against HBase. For example, I want to query 1000 personIds at a time. The goal is to reduce round-trip overhead; I have already built a prototype of this and seen a drastic improvement.

I tried extending PigStorage by creating an NLineStorage class and overriding the getNext() method. However, getNext() returns a tuple, whereas what I want is a Bag of tuples. Even if I implement this as a tuple, I can't implement the tuple's methods such as getType, because those methods operate on individual columns, not on an entire tuple (and here I would have a list of tuples).

I am stuck on this and unable to proceed. Can someone please help? Am I doing something wrong here? By the way, I cannot use the Accumulator interface, because I have no way of using a GROUP BY.

Any help will be deeply appreciated.

Thanks,
Sandeep
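
To make the batching idea concrete, here is a minimal sketch that is independent of Pig and HBase. It only shows the buffering pattern: collect personIds until a batch is full, then make one bulk call. The class name BatchingLookup and the bulkFetch function are hypothetical and exist only for this illustration; in a real UDF, bulkFetch would presumably wrap an HBase multi-get, and the open question of how to hand the batched results back to Pig is not addressed here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: buffers ids and issues one bulk lookup per full batch.
public class BatchingLookup {
    private final int batchSize;
    // Stand-in for a bulk HBase query (an assumption, not a real HBase call).
    private final Function<List<String>, Map<String, String>> bulkFetch;
    private final List<String> pending = new ArrayList<>();
    private int roundTrips = 0;

    public BatchingLookup(int batchSize,
                          Function<List<String>, Map<String, String>> bulkFetch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.bulkFetch = bulkFetch;
    }

    // Buffers one id. Returns results only when this id completes a batch;
    // otherwise returns an empty map.
    public Map<String, String> add(String personId) {
        pending.add(personId);
        return pending.size() >= batchSize ? flush() : new LinkedHashMap<>();
    }

    // Sends whatever is buffered in a single bulk call. Must also be called
    // once at the end of input so a partial last batch is not lost.
    public Map<String, String> flush() {
        if (pending.isEmpty()) {
            return new LinkedHashMap<>();
        }
        Map<String, String> result = bulkFetch.apply(new ArrayList<>(pending));
        pending.clear();
        roundTrips++;
        return result;
    }

    // Number of bulk calls made so far.
    public int roundTrips() {
        return roundTrips;
    }
}
```

With a batch size of 1000, a caller would invoke add() once per input row and flush() once after the last row, so 2500 rows would cost three bulk calls instead of 2500 single lookups.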