Thanks, Raghu. Maybe another benefit of the UDF route is that it could
support the accumulator interface.
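To illustrate the accumulator idea: Pig's Accumulator interface passes the UDF its input bag in chunks through repeated accumulate() calls, asks for the result once with getValue(), and then calls cleanup(). The sketch below is a minimal plain-Java stand-in that assumes no Pig dependency. ChunkAccumulator and SessionEventCollector are hypothetical names, and an in-memory map takes the place of the HBase lookup.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the Accumulator contract, without Pig types:
// accumulate() is called once per chunk of the input bag, getValue() once
// after the last chunk, and cleanup() resets state before the next group.
interface ChunkAccumulator<T> {
    void accumulate(List<String> idChunk);
    T getValue();
    void cleanup();
}

// Hypothetical UDF body: looks up the events of each session id one chunk
// at a time, so the UDF never needs the whole input bag at once.
class SessionEventCollector implements ChunkAccumulator<Map<String, List<String>>> {
    // In-memory stand-in for the HBase lookup: session id -> event ids.
    private final Map<String, List<String>> table;
    private Map<String, List<String>> results = new TreeMap<>();

    SessionEventCollector(Map<String, List<String>> table) {
        this.table = table;
    }

    @Override
    public void accumulate(List<String> idChunk) {
        for (String id : idChunk) {
            // Unknown sessions map to an empty event list.
            results.put(id, table.getOrDefault(id, List.of()));
        }
    }

    @Override
    public Map<String, List<String>> getValue() {
        return results;
    }

    @Override
    public void cleanup() {
        // Start a fresh map so a value already returned is left untouched.
        results = new TreeMap<>();
    }
}
```

cleanup() replaces the map instead of clearing it, so a result the caller may still hold is not modified.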
Since both approaches would use the HBase client API directly, there's no
Pig-specific benefit to using a loader, right?
Norbert
On Tue, May 29, 2012 at 8:37 PM, Raghu Angadi wrote:
I would still use a UDF; it is a lot more flexible.
Passing a large number of ids to the loader is part of the problem.
Your UDF would take a bag of ids and return bag{(session, events:bag{})}
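Here is a minimal sketch of that UDF's logic in plain Java. It assumes rowkeys of the form sessionid-eventid and session ids that never contain '-'. A TreeMap stands in for the HBase table, since HBase keeps rows sorted by rowkey. SessionEventsUdfSketch, exec and SessionEvents are hypothetical names. A real UDF would extend Pig's EvalFunc and issue HBase scans instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of the UDF outside Pig: bag of session ids in,
// bag{(session, events:bag{})} out. A sorted map stands in for the tall
// HBase table (hypothetical rowkey layout "sessionid-eventid").
class SessionEventsUdfSketch {
    // One output tuple: (session, events).
    record SessionEvents(String session, List<String> events) {}

    static List<SessionEvents> exec(List<String> sessionIds, NavigableMap<String, String> table) {
        List<SessionEvents> out = new ArrayList<>();
        // Drop duplicate ids but keep their input order.
        for (String session : new LinkedHashSet<>(sessionIds)) {
            // Every row "session-*" sorts between "session-" (inclusive) and
            // "session." (exclusive), because '.' is the character after '-'.
            // Assumes session ids themselves never contain '-'.
            NavigableMap<String, String> rows =
                table.subMap(session + "-", true, session + ".", false);
            out.add(new SessionEvents(session, new ArrayList<>(rows.values())));
        }
        return out;
    }
}
```

Each session then costs one bounded read over a contiguous key range, and sessions with no rows come back with an empty events bag.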
You can pass the bag of ids in various ways:
- load ids as a relation, group all to put all of them in
We're analyzing sessions using Pig and HBase, and this session data is
currently stored in a single HBase table, where rowkey is a
sessionid-eventid combo (tall table). I'm trying to optimize the
"extract-all-events-for-a-given-session" step of our workflow.
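Because the table is tall and rows are sorted by rowkey, all events of one session are contiguous. Extracting them is therefore a single range scan, starting at "sessionid-" and stopping just before the smallest key that is greater than every key with that prefix. The sketch below computes that range on raw bytes, comparing them as unsigned values the way HBase orders rowkeys. The "sessionid-" layout and the PrefixScanRange helper are assumptions for illustration. The resulting byte arrays would serve as the start and stop rows of a scan.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Computes the [startRow, stopRow) range that covers every rowkey beginning
// with a given prefix, comparing bytes as unsigned values.
class PrefixScanRange {
    // Hypothetical key layout: "<sessionid>-<eventid>".
    static byte[] startRow(String sessionId) {
        return (sessionId + "-").getBytes(StandardCharsets.UTF_8);
    }

    // Smallest key greater than every key starting with `prefix`:
    // drop trailing 0xFF bytes, then increment the last remaining byte.
    // An empty array means "no upper bound" (scan to the end of the table).
    static byte[] stopRow(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if ((prefix[i] & 0xFF) != 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return new byte[0];
    }

    // True if `row` lies in [start, stop); an empty stop means unbounded.
    static boolean inRange(byte[] row, byte[] start, byte[] stop) {
        return Arrays.compareUnsigned(row, start) >= 0
            && (stop.length == 0 || Arrays.compareUnsigned(row, stop) < 0);
    }
}
```

For session "s1", the range is "s1-" (inclusive) to "s1." (exclusive), so "s10-..." rows of another session are not picked up.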
This could be a simple JOIN. But th