I see, Bill, thank you.
But I think I need something different. I am processing the input line by
line, and for some elements I extract from each line, I do HBase lookups.
So I need a connection that stays open for the life of a mapper.
Thank you,
Mark
On Tue, Mar 6, 2012 at 7:14 PM, Bill G
Have you checked out HBaseStorage?
http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
On Tue, Mar 6, 2012 at 5:02 PM, Mark Kerzner wrote:
> Hi,
>
> I need to initialize the HBase connection, which I normally do in
> configure() in the Mapper, and then my
Hi,
I need to initialize the HBase connection, which I normally do in
configure() in the Mapper, and then my mapper uses it. How do I do it in
Pig?
I am ready to define a UDF that will return a handle, but is it a best
practice?
Thank you,
Mark
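
A note on the question above: one common pattern, sketched here in plain
Java under stated assumptions, is to open the connection lazily on the first
call to the UDF and reuse it for every later record. LookupClient,
LazyLookupFunc and timesOpened() are hypothetical names used only for
illustration, and LookupClient merely stands in for an HBase table handle; it
is not an HBase API. In a real Pig UDF the class would extend
org.apache.pig.EvalFunc and exec() would take a Tuple; that wiring is left
out so the sketch runs without Pig or HBase on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for an HBase table handle (NOT a real HBase API).
interface LookupClient {
    String get(String rowKey);
}

// Shaped like a Pig UDF: one instance is assumed to serve many records.
class LazyLookupFunc {
    private final Supplier<LookupClient> factory; // how to open the connection
    private LookupClient client;                  // null until the first exec()
    private int opens = 0;                        // counts how often we connected

    LazyLookupFunc(Supplier<LookupClient> factory) {
        this.factory = factory;
    }

    // Called once per record; connects only on the first call.
    String exec(String rowKey) {
        if (client == null) {
            client = factory.get();
            opens++;
        }
        return client.get(rowKey);
    }

    int timesOpened() {
        return opens;
    }
}

class LazyLookupDemo {
    public static void main(String[] args) {
        // In-memory map playing the role of the HBase table.
        Map<String, String> table = new HashMap<>();
        table.put("row1", "value1");
        table.put("row2", "value2");

        LazyLookupFunc func = new LazyLookupFunc(() -> key -> table.get(key));
        System.out.println(func.exec("row1"));
        System.out.println(func.exec("row2"));
        System.out.println("connections opened: " + func.timesOpened());
    }
}
```

Because the handle lives in an instance field, it stays open for as long as
the UDF instance is reused, which is roughly what configure() gives you in a
plain Mapper. This assumes Pig reuses the same UDF instance across the
records of a task. EvalFunc also has a finish() hook, which may be a
reasonable place to close the handle.
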
Hi folks --
For a very sparse HBase table (2 column families, 1000s of columns) what's
the expected performance difference in using HBaseStorage with the
following two STORE methods? Note that in our use case, there are only a
handful of unique rowkeys (approx 10).
1) GROUP BY the 1000s of columns b
Hi,
Sorry to bother you.
I tried using EXPLAIN, but the MapReduce plan it displays still
confuses me sometimes.
I tried the script below:
my_raw = LOAD './houred-small' USING PigStorage('\t') AS (user, hour, query);
part1 = filter my_raw by hour>11;
part2 = filter my_raw by h
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#EXPLAIN
On Tue, Mar 6, 2012 at 5:28 AM, shan shan wrote:
> Hi
> Can I see the user payload for the MapReduce job that Pig creates,
> i.e. the Map and Reduce function code generated from the Pig script?
> How?
>
> Thanks,
>
--
"..
Hello,
I've made a patch for this issue. You can find all the details here:
https://issues.apache.org/jira/browse/PIG-2495
Kevin
Capptain.com - Pilot your apps!
2012/1/24 Kevin Lion
> Hi,
>
> To increase performance of my computation, I would like to use a merge
> join between two ta