Re: Best practice for DB connection

2012-03-06 Thread Mark Kerzner
I see, Bill, thank you. But I think I need something different. I am processing input line by line, and for some elements I extract from each line I do HBase lookups. So I need a connection that stays open for the life of a mapper. Thank you, Mark On Tue, Mar 6, 2012 at 7:14 PM, Bill G

Re: Best practice for DB connection

2012-03-06 Thread Bill Graham
Have you checked out HBaseStorage? http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html On Tue, Mar 6, 2012 at 5:02 PM, Mark Kerzner wrote: > Hi, > > I need to initialize the HBase connection, which I normally do in > configure() in the Mapper, and then my

Best practice for DB connection

2012-03-06 Thread Mark Kerzner
Hi, I need to initialize the HBase connection, which I normally do in configure() in the Mapper, and then my mapper uses it. How do I do this in Pig? I am ready to define a UDF that will return a handle, but is that best practice? Thank you, Mark
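The usual way to get "one connection per mapper" behaviour inside a UDF is lazy initialization: open the connection on the first call, keep it in an instance field, and release it when the task is done. Below is a minimal plain-Java sketch of that pattern. It assumes a hypothetical `LookupClient` interface standing in for the real HBase client and takes a `String` line instead of a Pig `Tuple`; the class and method names are illustrative only, not Pig or HBase API. In an actual Pig UDF the same logic would sit in an `EvalFunc` subclass, with the cleanup in an overridden `finish()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class LazyLookupUdf {

    /** Hypothetical stand-in for an HBase table handle; not a real HBase API. */
    public interface LookupClient extends AutoCloseable {
        String get(String rowKey);

        @Override
        void close();
    }

    private final Supplier<LookupClient> factory; // how to open the connection (assumption)
    private LookupClient client;                  // opened on first use, then reused
    private int opens = 0;                         // number of times a connection was opened

    public LazyLookupUdf(Supplier<LookupClient> factory) {
        this.factory = factory;
    }

    /** Called once per input line; opens the connection only if none is open yet. */
    public String exec(String line) {
        if (client == null) {
            client = factory.get();
            opens++;
        }
        // Hypothetical: the first tab-separated field is the lookup key.
        String key = line.split("\t", 2)[0];
        return client.get(key);
    }

    /** Called once after the task has processed all its input; releases the connection. */
    public void finish() {
        if (client != null) {
            client.close();
            client = null;
        }
    }

    public int openCount() {
        return opens;
    }

    public static void main(String[] args) {
        // An in-memory map plays the role of the HBase table in this sketch.
        Map<String, String> fakeTable = new HashMap<>();
        fakeTable.put("alice", "42");
        fakeTable.put("bob", "7");

        LazyLookupUdf udf = new LazyLookupUdf(() -> new LookupClient() {
            public String get(String rowKey) { return fakeTable.get(rowKey); }
            public void close() { System.out.println("connection closed"); }
        });

        System.out.println(udf.exec("alice\tq1"));
        System.out.println(udf.exec("bob\tq2"));
        System.out.println(udf.exec("carol\tq3"));
        udf.finish();
        System.out.println("opens=" + udf.openCount());
    }
}
```

Running `main` prints `42`, `7`, `null`, `connection closed` and `opens=1`: three lookups share a single connection, which is closed once at the end.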

HBaseStorage STORE method comparison

2012-03-06 Thread Norbert Burger
Hi folks -- For a very sparse HBase table (2 column families, 1000s of columns) what's the expected performance difference in using HBaseStorage with the following two STORE methods? Note that in our use case, there are only a handful of unique rowkeys (approx 10). 1) GROUP BY the 1000s of columns b

Re: View Map-Reduce payload

2012-03-06 Thread Yongzhi Wang
Hi, Sorry to bother you. I tried using "explain", but the MapReduce plan it displays is sometimes still confusing. I tried the following: my_raw = LOAD './houred-small' USING PigStorage('\t') AS (user, hour, query); part1 = filter my_raw by hour>11; part2 = filter my_raw by h

Re: View Map-Reduce payload

2012-03-06 Thread Aniket Mokashi
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#EXPLAIN On Tue, Mar 6, 2012 at 5:28 AM, shan shan wrote: > Hi > Can I see the user-payload for the MapReduce job that is created by Pig. > How? > i.e. the Map and Reduce function code that is generated by Pig script.. > > Thanks, > -- "..

Re: Using merge join from a HBaseStorage

2012-03-06 Thread Kevin Lion
Hello, I've made a patch for this issue. You can find all the details here: https://issues.apache.org/jira/browse/PIG-2495 Kevin 2012/1/24 Kevin Lion > Hi, > > To increase performance of my computation, I would like to use a merge > join between two ta