having to group everything.
Cheers,
Rodrigo
2015-01-08 7:27 GMT-02:00 Marco Cadetg ma...@zattoo.com:
Hi there,
I've a big pig script which first generates some expensive intermediate
result on which I run multiple group by statements and multiple stores.
Something like this.
Register UDFs etc
A = LOAD
B = LOAD
C = LOAD
-- do lots of transformations with A, B, and C using a combination of
Piggybank and custom UDFs.
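The shape of such a script (aliases, paths, and fields here are illustrative, not from the original message) is one expensive shared intermediate feeding several GROUP BYs and STOREs; Pig's multi-query execution lets the upstream work be shared across the STOREs in a single script:

```pig
-- expensive shared intermediate (everything below is a sketch)
A = LOAD '/data/a' USING PigStorage(',') AS (userid:chararray, region:chararray, value:long);
B = LOAD '/data/b' USING PigStorage(',') AS (userid:chararray, flag:int);
joined = JOIN A BY userid, B BY userid;
intermediate = FOREACH joined GENERATE A::userid AS userid, A::region AS region, A::value AS value;

-- several group-bys and stores over the same intermediate;
-- with multi-query execution Pig shares the upstream work
by_region = GROUP intermediate BY region;
region_sums = FOREACH by_region GENERATE group AS region, SUM(intermediate.value) AS total;
STORE region_sums INTO '/out/by_region';

by_user = GROUP intermediate BY userid;
user_sums = FOREACH by_user GENERATE group AS userid, SUM(intermediate.value) AS total;
STORE user_sums INTO '/out/by_user';
```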
On Mon, Mar 18, 2013 at 5:13 PM, Marco Cadetg ma...@zattoo.com wrote:
Thanks a lot Mike. This seems to be what I'm looking for ;)
I'm a bit disappointed that what I wanted to achieve isn't possible without using a UDF.
Cheers,
-Marco
Hi there,
I would like to do something very similar to a nested FOREACH using
ORDER BY and then LIMIT, but I would like the limit to be relative to the
total number of records in the relation.
users = load 'users' as (userid:chararray, money:long, region:chararray);
grouped_region = group users by region;
inputs.
You can use this to receive the top x% for any given field and then you can
use that within a filter
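One way this "top x% of all records" idea could be expressed (a sketch, not from the thread; the rank field name and the use of RANK, available from Pig 0.11, are assumptions):

```pig
users = LOAD 'users' AS (userid:chararray, money:long, region:chararray);

-- total number of records
all_grp = GROUP users ALL;
total = FOREACH all_grp GENERATE COUNT(users) AS n;

-- rank everyone by money (RANK is available from Pig 0.11 onward)
ranked = RANK users BY money DESC;

-- attach the total to every row, then keep the top 10%
-- (the rank field is assumed to be named rank_users)
crossed = CROSS ranked, total;
top_10pct = FILTER crossed BY ranked::rank_users <= total::n / 10;
```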
On Mon, Mar 18, 2013 at 6:23 AM, Marco Cadetg ma...@zattoo.com wrote:
Hi there,
I have some user sessions which look something like the following:
id:chararray, start:long (unix timestamp), end:long (unix timestamp)
xxx,1,3
xxx,4,7
yyy,1,2
yyy,5,7
zzz,6,7
zzz,7,10
I would like to combine the rows which belong to a contiguous session,
e.g. in my example the
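Merging adjacent intervals per id needs sequential logic within each group, which in Pig usually means ordering inside a nested FOREACH and handing the sorted bag to a UDF. A sketch (MergeAdjacent is a hypothetical UDF, not a real Piggybank function):

```pig
sessions = LOAD 'sessions' USING PigStorage(',') AS (id:chararray, start:long, end:long);
by_id = GROUP sessions BY id;
merged = FOREACH by_id {
    -- sort each user's sessions by start time, then let a UDF
    -- collapse rows whose start touches the previous row's end
    ordered = ORDER sessions BY start;
    GENERATE group AS id, FLATTEN(MergeAdjacent(ordered));  -- MergeAdjacent: hypothetical UDF
}
```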
Hi there,
What is the best way to retrieve duplicates from a bag. I basically would
like to do something like the opposite of DISTINCT.
A: {userid: long,foo: long,bar: long}
dump A
(1,2,3)
(1,2,3)
(1,3,2)
(2,3,1)
Now I would like to have a bag which contains
(1,2,3)
(1,2,3)
Thanks,
-Marco
= foreach D generate group;
Disclaimer: untested code.
Cheers,
--
Gianmarco
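A fuller version of the untested sketch above (grouping on all fields, keeping groups that occur more than once, then flattening to preserve multiplicity) might look like this; the load path is assumed:

```pig
-- find duplicate rows, i.e. the "opposite of DISTINCT"
A = LOAD 'data' USING PigStorage(',') AS (userid:long, foo:long, bar:long);
B = GROUP A BY (userid, foo, bar);
C = FILTER B BY COUNT(A) > 1;       -- keep only tuples that occur more than once
D = FOREACH C GENERATE FLATTEN(A);  -- emit each duplicate row, preserving multiplicity
```

For the sample input (1,2,3) (1,2,3) (1,3,2) (2,3,1), this should yield the two (1,2,3) rows.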
On Fri, Aug 24, 2012 at 11:35 AM, Marco Cadetg ma...@zattoo.com wrote:
Hi there,
I'm doing a join like this:
A = LOAD '/data/sessions' USING PigStorage(',') AS
(userid:chararray, client_type:chararray, flag:long);
A1 = GROUP A ALL;
A1 = FOREACH A1 GENERATE COUNT(A);
DUMP A1
(543872)
B = LOAD '/data/userdb' USING PigStorage(',') AS (uid:chararray,
Hi there,
I am trying to retrieve the group of 'rich' userids which are not 'happy',
i.e. retrieve all ids which are not in the other bag's ids.
Is there a better way to exclude some rows from a group?
Example code:
A: {userid: chararray,user_type: chararray}
A:
(1,rich)
(1,happy)
;
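One way to express "rich but not happy" without fighting the grouped bags is an anti-join via LEFT OUTER JOIN plus a null filter (a sketch; the load path is assumed):

```pig
A = LOAD 'users' USING PigStorage(',') AS (userid:chararray, user_type:chararray);
rich  = FILTER A BY user_type == 'rich';
happy = FILTER A BY user_type == 'happy';

-- left outer join + null filter acts as an anti-join
j = JOIN rich BY userid LEFT OUTER, happy BY userid;
only_rich = FILTER j BY happy::userid IS NULL;
rich_not_happy = FOREACH only_rich GENERATE rich::userid;
```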
--jacob
@thedatachef
On Tue, 2012-02-28 at 16:49 +0100, Marco Cadetg wrote:
Hi there,
AFAICT the STORE function doesn't provide a way to overwrite the output. I
guess you could use your own storage UDF to accomplish that but is there
also another way of doing that?
Thanks
-Marco
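Without a custom storage UDF, the usual workaround is to delete the output path from within the script before the STORE; `rmf` (unlike `rm`) does not fail if the path is missing. The alias and path below are placeholders:

```pig
-- remove any previous output first; rmf ignores a missing path
rmf /data/output
STORE result INTO '/data/output' USING PigStorage(',');
```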
/visualization tool of choice...
Guy
On Mon, Oct 31, 2011 at 8:55 AM, Marco Cadetg ma...@zattoo.com wrote:
The data is not about students but about television ;) Regarding the size:
the raw input data size is about 150m, although when I 'explode' the
timeseries it will be around
On Thu, Oct 27, 2011 at 4:05 PM, Guy Bayes fatal.er...@gmail.com
wrote:
how big is your dataset?
On Thu, Oct 27, 2011 at 9:23 AM, Marco Cadetg ma...@zattoo.com
wrote:
Thanks Bill and Norbert, that seems like what I was looking for. I'm a bit
worried about how much data/io
I have a problem where I don't know how or if pig is even suitable to solve
it.
I have a schema like this:
student-id,student-name,start-time,duration,course
1,marco,1319708213,500,math
2,ralf,1319708111,112,english
3,greg,1319708321,333,french
4,diva,1319708444,80,english
On Thu, Oct 27, 2011, Marco Cadetg ma...@zattoo.com wrote:
;
}
total_iq_per_gender = GROUP A BY gender;
total_iq_per_gender = FOREACH total_iq_per_gender GENERATE
    FLATTEN(group) AS gender,
    SUM(A.iq) AS iq_per_gender;
Now I guess I could use JOIN to combine both bags(?) by gender but somehow I
don't get it.
Thanks
-Marco
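Following Dmitriy's suggestion below, joining the per-(gender, region) partials with the per-gender totals gives the percentage; a sketch with assumed aliases, where A has fields (name, gender, region, iq):

```pig
by_gender = GROUP A BY gender;
total_per_gender = FOREACH by_gender GENERATE group AS gender, SUM(A.iq) AS total_iq;

by_gender_region = GROUP A BY (gender, region);
per_gender_region = FOREACH by_gender_region GENERATE
    FLATTEN(group) AS (gender, region), SUM(A.iq) AS iq;

-- join partials with totals on gender, then divide (cast forces double division)
j = JOIN per_gender_region BY gender, total_per_gender BY gender;
pct = FOREACH j GENERATE
    per_gender_region::gender,
    per_gender_region::region,
    (double)per_gender_region::iq / total_per_gender::total_iq;
```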
On Tue, Oct 11, 2011 at 6:02 PM, Marco Cadetg ma
FOREACH GENERATE needs to look like.
On Wed, Oct 12, 2011 at 10:34 AM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
Sure, just join your total counts with your partials on gender.
D
On Tue, Oct 11, 2011 at 11:58 PM, Marco Cadetg ma...@zattoo.com wrote:
D'oh I just see that unfortunately my
:
(Male,Here,0.89285713)
(Male,There,0.10714286)
(Female,Here,0.13793103)
(Female,There,0.86206895)
Norbert
On Wed, Oct 12, 2011 at 5:38 AM, Marco Cadetg ma...@zattoo.com wrote:
Yes, but I'm still not able to compute the percentage. I've joined the bags
as below.
A = LOAD '/data/marco
Hi there,
I would like to replace the value of a field based on its value. E.g.:
A = LOAD 'student' USING PigStorage() AS (name:chararray);
DUMP A;
(John)
(Mary)
(Bill)
(Joe)
(John)
Now I would like to replace all John's with Marco. Is there a way to do this
in PIG? I thought about using sth
B = FOREACH A GENERATE (name == 'John' ? 'Marco' :
(name == 'Sally' ? 'Anne' : name)) AS name;
Maybe there's a better way if you have to do lots of these at once..?
See also REPLACE for replacing a substring.
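For the single replacement in the original question, the bincond form is just (alias B assumed):

```pig
A = LOAD 'student' USING PigStorage() AS (name:chararray);
-- replace every 'John' with 'Marco', leave other names untouched
B = FOREACH A GENERATE (name == 'John' ? 'Marco' : name) AS name;
```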
On 11 October 2011 08:51, Marco Cadetg ma...@zattoo.com wrote:
Hi there,
I would need to do something like this:
A = LOAD 'student' USING PigStorage() AS (name:chararray, region:chararray,
iq:int);
DUMP A;
(John, There, 10)
(Alf, There, 10)
(ET, There, 10)
(Mary, Here, 80)
(Bill, Here, 100)
(Joe, Here, 150)
total_iq_per_region = GROUP A BY (region);
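The snippet is cut off here, but completing the aggregation (a sketch) would be:

```pig
total_iq_per_region = GROUP A BY region;
region_sums = FOREACH total_iq_per_region GENERATE
    group AS region, SUM(A.iq) AS total_iq;
-- for the sample rows above this should give (There,30) and (Here,330)
```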
Hi there,
I have a problem with Pig 0.9.0 and Hadoop 0.20.204:
http://hadoop.apache.org/common/releases.html#5+Sep%2C+2011%3A+release+0.20.204.0+available
I tried several things but I am unable to use Pig with that version of
Hadoop. When using the pig-withouthadoop build via ant
)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 26 more
On Tue, Sep 13, 2011 at 11:09 AM, Marco Cadetg ma
Sorry for spamming the list, it looks like there were some classpath jars
missing.
Cheers
-Marco
On Tue, Sep 13, 2011 at 3:13 PM, Marco Cadetg ma...@zattoo.com wrote:
I'm wondering if this is rather a hadoop configuration problem or a problem
between pig and hadoop?
Here is the complete