I'm having a possible issue with a simple Pig load that writes to an HBase
table. The issue is that when I run the test pig script it does not invoke
the region observer coprocessor on the table. I have verified that my
coprocessor executes when I use the HBase client API to do a simple put().
Hmm. The data in my tables is not important. So, I dropped the table and
recreated it. This doesn't seem to have resolved the issue, though.
Is there perhaps a Pig query I can run that would use a built-in HBase table,
like the .META. table, to see if it works? I don't know if that'd help or
Actually on second glance, this seems like an issue not with the HBase
config, but with the server:port info inside your .META. table. Have you
tried LOADing from a different table besides "events"? From the HBase
shell, you can use the following command to extract server hostnames for
each of yo
I was thinking that maybe it was because I did not have HBase config path
on PIG_CLASSPATH, so I added it. This did not help, though.
Ryan
On Thu, Mar 22, 2012 at 9:07 PM, Ryan Cole wrote:
> Norbert,
>
> I have confirmed that this is indeed an issue connecting to HBase. I tried
> just running a
Rohini,
Here is the JIRA. https://issues.apache.org/jira/browse/PIG-2610
Can you please post the stacktrace as a comment to it?
Thanks,
Prashant
On Thu, Mar 22, 2012 at 2:37 PM, Jonathan Coveney wrote:
> Rohini,
>
> In the meantime, something like the following should work:
>
> raw = LOAD 'inpu
Norbert,
I have confirmed that this is indeed an issue connecting to HBase. I tried
just running a Pig script that did not use HBaseStorage, and it works. Here
is my hbase-site.xml config file, as well as my query that I'm running:
https://gist.github.com/2166187
Also, for ease of reference, her
You're encountering problems connecting to HBase (presumably your Pig
script uses HBaseStorage). How does your hbase/conf/hbase-site.xml look?
Norbert
On Thu, Mar 22, 2012 at 9:16 PM, Ryan Cole wrote:
> Hello,
>
> I'm new to these lists. I'm trying to get Pig working, for my first time. I
> ha
Hello,
I'm new to these lists. I'm trying to get Pig working, for my first time. I
have set up Hadoop and HBase (on HDFS) using the pseudo-distributed setup,
all on one machine. I am able to run MapReduce jobs, using the example.jar
file included with the Hadoop release.
Whenever I try to run even
you write to populate the date for you!)
>>>
>>> 2012/3/22 Mohit Anchlia
>>>
>>> On Thu, Mar 22, 2012 at 2:34 PM, Thejas Nair
>>>> wrote:
>>>>
>>>> Is this what you are looking for ? -
>>>>>
>>>
se you write to populate the date for you!)
2012/3/22 Mohit Anchlia
On Thu, Mar 22, 2012 at 2:34 PM, Thejas Nair
wrote:
Is this what you are looking for ? -
A = LOAD '$in' USING PigStorage('\t') AS (...
B = foreach A generate *, '20120322' as date;
STORE B i
> wrote:
> >
> > > Is this what you are looking for ? -
> > >
> > >
> > > A = LOAD '$in' USING PigStorage('\t') AS (...
> > >
> > > B = foreach A generate *, '20120322' as date;
> > >
You are over-allocating memory for each Java process in Hadoop. Memory
allocation = (mappers + reducers) * child.java.opts memory setting.
This would only happen when your node is fully utilized.
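As a concrete illustration of that formula (the slot counts and heap size below are hypothetical; the property names are from the Hadoop 0.20.x era configuration):

```xml
<!-- hypothetical mapred-site.xml fragment: 4 map slots + 2 reduce slots,
     each task JVM given a 1 GB heap -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
<!-- worst case, node fully utilized: (4 + 2) * 1024 MB = 6 GB of task heap -->
```

If that total exceeds the node's physical RAM (minus what the TaskTracker, DataNode, and OS need), tasks start failing once all slots fill up.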
Alex Rovner
Sent from my iPhone
On Mar 21, 2012, at 10:41 PM, rakesh sharma wrote:
>
> Hi Al
te to populate the date for you!)
2012/3/22 Mohit Anchlia
> On Thu, Mar 22, 2012 at 2:34 PM, Thejas Nair
> wrote:
>
> > Is this what you are looking for ? -
> >
> >
> > A = LOAD '$in' USING PigStorage('\t') AS (...
> >
>
On Thu, Mar 22, 2012 at 2:34 PM, Thejas Nair wrote:
> Is this what you are looking for ? -
>
>
> A = LOAD '$in' USING PigStorage('\t') AS (...
>
> B = foreach A generate *, '20120322' as date;
>
> STORE B into ...
>
> Thanks,
> Thej
Jonathan, Prashant, you guys are awesome! Thanks for the explanation! It's much
clearer now!
On Mar 22, 2012, at 4:40 PM, Prashant Kommireddi wrote:
> Aggregation functions (COUNT, SUM, AVG..) work on bags. Since you are
> counting on the entire relation in this case you did a GROUP ALL, in whi
Mohit,
Is date a field in your dataset, the current date, or something else? A few
options:
1. You could let the database implicitly create a date field if you need the
INSERT date
2. As Thejas suggested, simply insert it as '20120322' as date. I don't
think DB has an
Aggregation functions (COUNT, SUM, AVG..) work on bags. Since you are
counting over the entire relation, you did a GROUP ALL, which, as you said,
forms a bag out of all the tuples.
grunt> A = load 'data' as (a:int, b:int);
grunt> describe A;
A: {a: int,b: int}
Now, once the GROUP op
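The grunt session above is cut off; a minimal sketch of the full pattern it describes, using the same schema (the 'data' input and the dump are assumed for illustration):

```pig
A = load 'data' as (a:int, b:int);
B = group A all;                   -- one row, group 'all', holding a bag of every tuple of A
C = foreach B generate COUNT(A);   -- COUNT aggregates over that bag
dump C;
```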
Rohini,
In the meantime, something like the following should work:
raw = LOAD 'input' using MyCustomLoader();
searches = FOREACH raw GENERATE
day, searchType,
FLATTEN(impBag) AS (adType, clickCount)
;
searches_2 = foreach searches generate *, ( adType ==
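The workaround above is cut off mid-expression, but the idea appears to be an indicator column, so one grouping can produce both a per-type sum and an overall sum. A hedged sketch of how that line might continue (the 'EMPLOYER' condition and the field names are assumptions drawn from elsewhere in this thread):

```pig
-- emit clickCount into a type-specific column, then SUM both columns
searches_2 = FOREACH searches GENERATE
    day, searchType, adType,
    (adType == 'EMPLOYER' ? clickCount : 0) AS employerClicks,
    clickCount;
grouped = GROUP searches_2 BY (day, searchType);
counts  = FOREACH grouped GENERATE
    FLATTEN(group) AS (day, searchType),
    SUM(searches_2.employerClicks) AS employerTotal,
    SUM(searches_2.clickCount)     AS overallTotal;
```

Because SUM is algebraic, this avoids holding the raw records in memory at the reducer.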
Is this what you are looking for ? -
A = LOAD '$in' USING PigStorage('\t') AS (...
B = foreach A generate *, '20120322' as date;
STORE B into ...
Thanks,
Thejas
On 3/22/12 1:13 PM, Mohit Anchlia wrote:
Yes that's exactly what I am asking. Reading from f
Whoops, fat fingered it. Part two:
grunt> d = foreach c generate SUM($0);
Wait a second...this doesn't make much sense. Foreaches work on columns in
rows, not on relations (nothing works on relations). So how do we count
things? We need to put everything in one row.
grunt> d = group c all;
grunt
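Pieced together, the session sketched above would look something like this (the 'data' input and column layout are assumptions; relation names follow the snippet):

```pig
c = load 'data' as (x:int);
-- d = foreach c generate SUM($0);   -- doesn't work: SUM wants a bag, not a column
d = group c all;                     -- put the whole relation into one row, as a bag
e = foreach d generate SUM(c.x);     -- now SUM has a bag to aggregate over
```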
The reason can be a little hard to grok at first, but it's core to
Pig...perhaps we need a tutorial explaining the model a bit more clearly.
The foundation of Pig is the relation, i.e., scans. What does this mean? It
means that you have a bunch of rows, and these rows have things. I'm going
to diverg
Very nice, worked like a champ, Prashant.
Any chance you could explain why? I'd love to be taught to fish, not just given
the fish to eat. ;-)
GROUP ALL, as I read it, pulls the tuples into a single group. But, FOREACH'ing
on each group, and counting against productscans is where my brain start
Thanks Prashant,
I am using Pig 0.9.1 and hadoop 0.20.205
Thanks,
Rohini
On Thu, Mar 22, 2012 at 1:27 PM, Prashant Kommireddi wrote:
> This makes more sense, grouping and filter are on different columns. I will
> open a JIRA soon.
>
> What version of Pig and Hadoop are you using?
>
> Thanks,
> P
Hi Jason,
Are you trying to count the number of records in the relation
'productscans'? In that case you would have to use GROUP
http://pig.apache.org/docs/r0.9.1/basic.html#GROUP
grpd = GROUP productscans ALL;
scancount = FOREACH grpd GENERATE COUNT(productscans);
DUMP scancount;
Thanks,
Prash
Hey all,
I'm trying to write a script to pull the count of a dataset that I've filtered.
Here's the script so far:
/* scans by title */
scans = LOAD '/hive/scans/*' USING PigStorage(',') AS
(thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
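The script above is cut off after the LOAD; based on the replies in this thread, the filtered-count version would look roughly like this (the FILTER condition is a placeholder):

```pig
scans = LOAD '/hive/scans/*' USING PigStorage(',') AS
    (thetime:long, product_id:long, lat:double, lon:double,
     user:chararray, category:chararray, title:chararray);
productscans = FILTER scans BY title == 'some title';  -- placeholder condition
grpd = GROUP productscans ALL;
scancount = FOREACH grpd GENERATE COUNT(productscans);
DUMP scancount;
```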
This makes more sense, grouping and filter are on different columns. I will
open a JIRA soon.
What version of Pig and Hadoop are you using?
Thanks,
Prashant
On Thu, Mar 22, 2012 at 1:12 PM, Rohini U wrote:
> Hi Prashant,
>
> Here is my script in full.
>
>
> raw = LOAD 'input' using MyCustomLoa
Yes that's exactly what I am asking. Reading from flat file and then
inserting it into the database. And I want to insert date before storing.
for eg I want to add date before A gets stored:
A = LOAD '$in' USING PigStorage('\t') AS (...
STORE A into ...
On Thu, Mar 22, 2012 at 12:54 PM, Jonath
Hi Prashant,
Here is my script in full.
raw = LOAD 'input' using MyCustomLoader();
searches = FOREACH raw GENERATE
day, searchType,
FLATTEN(impBag) AS (adType, clickCount)
;
groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
counts =
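The script breaks off at the counts line; a plausible completion of that aggregation, for context (the SUM target is an assumption):

```pig
counts = FOREACH groupedSearches GENERATE
    FLATTEN(group) AS (day, searchType),
    SUM(searches.clickCount) AS clickCount;
```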
Do you mean you're reading a relation from Hadoop, and want to append the
date to the row before inserting it? I'm not quite sure what you're asking
for.
2012/3/22 Mohit Anchlia
> Sorry I mean to ask if there is any way to insert date into the ALIAS so
> that I can use it before storing it into
Sorry I mean to ask if there is any way to insert date into the ALIAS so
that I can use it before storing it into DB.
On Thu, Mar 22, 2012 at 12:47 PM, Mohit Anchlia wrote:
> I am reading bunch of columns from a flat file and inserting it into the
> database. Is there a way to also insert date?
>
Hi Rohini,
From your query it looks like you are already grouping by TYPE, so I'm not
sure why you would want the SUM of, say, the "EMPLOYER" type in "LOCATION" and
vice-versa. Your output is already broken down by TYPE.
Thanks,
Prashant
On Thu, Mar 22, 2012 at 9:03 AM, Rohini U wrote:
> Thanks for
It's done for some cases, but this one is different since the group key
needs to change.
D
On Wed, Mar 21, 2012 at 11:41 PM, Prashant Kommireddi
wrote:
> Sure I can do that. Isn't this something that should be done already? Or
> does it not work if the filter is working on a field that is part o
So, as explained earlier, the reason you are running out of memory is that
you are loading all records into memory when you want to do non-algebraic
things to results of grouping.
Can you come up with ways to achieve what you need without having to have
the raw records at the reducer?
One way has
Has a Jira been filed for this? I can send my example I am trying if that
helps.
Thanks,
Rohini
On Wed, Mar 21, 2012 at 11:41 PM, Prashant Kommireddi
wrote:
> Sure I can do that. Isn't this something that should be done already? Or
> does it not work if the filter is working on a field that is p
Thanks for the suggestion Prashant. However, that will not work in my case.
If I filter before the group and include the new field in the group as you
suggested, I get the individual counts broken down by the select field
criteria. However, I want the totals also without taking the select fields
into acco