Hi,

I need to write a Pig UDF which takes a string and a personId as an input 
tuple. The personId is the key used to query HBase within this UDF. I create 
the connection to HBase when the UDF class loads. 

The problem is that PigStorage treats each row as a tuple, so I have to query 
each personId independently and cannot do a bulk query on HBase. For example, 
I would like to query 1000 personIds at a time. The reason is to cut down on 
round trips; I have already built a prototype of this and seen a drastic 
improvement.
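
The batching idea above can be sketched in plain Java roughly as follows. 
This is only an illustration and not the actual UDF: the class name 
BatchLookup, the method fetchInBatches and the bulkFetch callback are 
hypothetical names made up for this example. In the real code the callback 
would presumably wrap an HBase multi-get, which is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Pig or HBase: it only shows the
// "collect N keys, then do one bulk call" pattern.
final class BatchLookup {

    // Splits ids into consecutive chunks of at most batchSize elements and
    // calls bulkFetch once per chunk. Results are concatenated in input order.
    // bulkFetch stands in for whatever bulk query the store offers (assumed).
    static <K, V> List<V> fetchInBatches(List<K> ids, int batchSize,
                                         Function<List<K>, List<V>> bulkFetch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<V> results = new ArrayList<>(ids.size());
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // One round trip per chunk instead of one per id.
            results.addAll(bulkFetch.apply(ids.subList(start, end)));
        }
        return results;
    }
}
```

With a batch size of 1000, this makes one call per 1000 personIds instead of 
one call per personId, which is the behaviour I cannot get when the UDF sees 
a single tuple at a time.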

I tried to extend PigStorage by creating an NLineStorage class and overriding 
the getNext() method. However, getNext() returns a single tuple, whereas what 
I want is a bag of tuples. Even if I implement this as a tuple, I can't 
implement the tuple's methods such as getType, because those methods are for 
individual columns, not for an entire tuple (and here the value is a list of 
tuples). 

I am stuck on this and cannot proceed. Can someone please help me with this? 
Am I doing something wrong here? By the way, I have no way of using the 
Accumulator interface, because I cannot use a group-by in this case.

Any help with this would be deeply appreciated.


Thanks,
Sandeep

