Re: LOAD function vs. UDF eval

2012-05-29 Thread Norbert Burger
Thanks, Raghu. Maybe another benefit of the UDF route is that it could support the accumulator interface. Since both approaches would use the HBase client API directly, there's no Pig-specific benefit to using a loader, right?

Norbert

On Tue, May 29, 2012 at 8:37 PM, Raghu Angadi wrote:
> I w

Re: LOAD function vs. UDF eval

2012-05-29 Thread Raghu Angadi
I would still use a UDF; it is a lot more flexible. Passing a large number of ids to the loader is part of the problem. Your UDF would take a bag of ids and return bag{(session, events:bag{})}. You can pass the bag of ids in various ways:
- load ids as a relation, group all to put all of them in
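
To make the shape of that suggestion concrete, here is a minimal sketch in plain Java, not an actual Pig UDF and not HBase client code. It assumes the tall-table layout from the original post (row key is a sessionid-eventid combo) and stands in a sorted in-memory map for the HBase table. The class name SessionEventLookup, the method eventsFor, and the "-" separator are all hypothetical, chosen only for illustration. For each session id it does a prefix range lookup, which is roughly what a per-session scan with start and stop rows would do, and returns one (session, events) entry per id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

public class SessionEventLookup {
    // Hypothetical separator between sessionid and eventid in the row key.
    static final char SEP = '-';

    /**
     * For each session id, collect the values of all rows whose key starts
     * with "<sessionid>-". The sorted map is a stand-in for the tall table.
     */
    static Map<String, List<String>> eventsFor(List<String> sessionIds,
                                               NavigableMap<String, String> table) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String id : sessionIds) {
            String start = id + SEP;             // first possible key for this session
            String stop = id + (char) (SEP + 1); // first key past the prefix range
            // Half-open range [start, stop) covers exactly the keys with this prefix.
            result.put(id, new ArrayList<>(
                    table.subMap(start, true, stop, false).values()));
        }
        return result;
    }
}
```

Because the lookup is a range over sorted keys, a session id such as s1 does not pick up rows belonging to s10, and an id with no rows simply yields an empty event list. A real UDF would wrap the same idea in Pig's bag and tuple types and read from HBase instead of a map.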

LOAD function vs. UDF eval

2012-05-29 Thread Norbert Burger
We're analyzing session(s) using Pig and HBase, and this session data is currently stored in a single HBase table, where rowkey is a sessionid-eventid combo (tall table). I'm trying to optimize the "extract-all-events-for-a-given-session" step of our workflow. This could be a simple JOIN. But th