Re: Counting elements for each group

2014-07-29 Thread Arian Pasquali
Thanks Gianmarco!! Here is the final version:

by_clusters = GROUP sample_data by (cluster_id, terms);
by_clusters_terms_count = FOREACH by_clusters GENERATE FLATTEN(group) as (cluster_id, terms), COUNT($1);

Cheers, Arian Pasquali http://about.me/arianpasquali 2014-07-29 13:23 GMT+01
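
A self-contained sketch of the same group-and-count pattern, assuming a hypothetical tab-separated input file `sample_data.tsv` with one (cluster_id, term) pair per line. The file name and schema are illustrative assumptions; the GROUP/FLATTEN/COUNT steps follow the final version in the post. `COUNT(sample_data)` is equivalent to `COUNT($1)`, because the bag produced by GROUP sits in position 1.

```pig
-- Hypothetical input: tab-separated (cluster_id, term) pairs.
sample_data = LOAD 'sample_data.tsv' USING PigStorage('\t')
    AS (cluster_id:int, terms:chararray);

-- One group per distinct (cluster_id, terms) pair.
by_clusters = GROUP sample_data BY (cluster_id, terms);

-- FLATTEN(group) turns the grouping tuple back into two columns;
-- COUNT(sample_data) counts the tuples in each group's bag.
by_clusters_terms_count = FOREACH by_clusters GENERATE
    FLATTEN(group) AS (cluster_id, terms),
    COUNT(sample_data) AS term_count;

-- Most frequent terms first within each cluster.
sorted_counts = ORDER by_clusters_terms_count BY cluster_id ASC, term_count DESC;

DUMP sorted_counts;
```

Note that COUNT skips tuples whose first field is null. If rows with a null cluster_id should be counted too, COUNT_STAR counts every tuple in the bag.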

Counting elements for each group

2014-07-29 Thread Arian Pasquali
Hi, I'm having trouble with a simple task that I believe someone out there must have solved at some point. I'm trying to group and count the frequency of terms for each group in Pig Latin, but I'm having trouble figuring out how to do it. I have a collection of objects with the follo

Re: How do I load JSON in Pig?

2012-11-18 Thread Arian Pasquali
I don't think you really need to build it. You can find it in any Maven repository. Arian Rodrigo Pasquali FEUP, SAPO Labs http://www.arianpasquali.com twitter @arianpasquali 2012/11/18 Arian Pasquali > You don't need to build it either. > Just download the two jars I used in my example. >

Re: How do I load JSON in Pig?

2012-11-18 Thread Arian Pasquali
> Russell Jurney http://datasyndrome.com > > On Nov 17, 2012, at 9:30 PM, Arian Pasquali > > > wrote: > > > keep calm > > and use elephant-bird > > https://github.com/kevinweil/elephant-bird< > https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/tw

Re: How do I load JSON in Pig?

2012-11-17 Thread Arian Pasquali
Keep calm and use elephant-bird (https://github.com/kevinweil/elephant-bird). I posted an example here yesterday of how to load tweets in JSON; here it is again. I hope it helps.
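
The example referred to above is not included in this snippet. As a rough sketch only, loading one-JSON-object-per-line tweets with elephant-bird's `com.twitter.elephantbird.pig.load.JsonLoader` might look like the script below. The jar paths and the input path `/twitter_data/tweets.json` are hypothetical placeholders; the exact set of jars depends on the elephant-bird version (some releases also need `elephant-bird-hadoop-compat`), and the `-nestedLoad` option is only available in newer releases.

```pig
-- Placeholder jar paths: use the elephant-bird and json-simple
-- versions that match your Pig/Hadoop installation.
REGISTER /path/to/elephant-bird-core.jar;
REGISTER /path/to/elephant-bird-pig.jar;
REGISTER /path/to/json-simple.jar;

-- Hypothetical input: one JSON object (one tweet) per line.
-- JsonLoader returns a single map per record; '-nestedLoad' also
-- loads nested JSON objects as maps instead of plain strings.
tweets = LOAD '/twitter_data/tweets.json'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (json:map[]);

-- Map values are read with the # operator and cast to a concrete type.
-- 'id_str' and 'text' assume Twitter's tweet JSON field names.
tweet_text = FOREACH tweets GENERATE
    (chararray)json#'id_str' AS id,
    (chararray)json#'text'   AS text;

DUMP tweet_text;
```

Loading everything into a single untyped map avoids declaring a full tweet schema up front; fields are picked out and cast only where they are used.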

Re: problem filtering null values with pig

2012-11-16 Thread Arian Pasquali
eets_grp { first_tweet = LIMIT inpt 1; GENERATE FLATTEN(first_tweet); };
only_not_nulls = FILTER geo_tweets BY geoLocation is not null;
store only_not_nulls into '/twitter_data/results/geo_tweets';

Cheers, and thanks again for your support. Arian P 2012/1
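
The script in this snippet is truncated at the start. A self-contained sketch of the same steps (keep one record per tweet with a nested LIMIT, then drop null locations) might look like this. The input path `/twitter_data/tweets_geo`, its tab-separated layout and the `id` field are assumptions for illustration; only `geoLocation`, the nested LIMIT/FLATTEN pattern and the output path come from the post.

```pig
-- Hypothetical input: tab-separated (id, geoLocation) rows, where
-- geoLocation is stored as Pig map text such as [longitude#..,latitude#..].
tweets = LOAD '/twitter_data/tweets_geo' USING PigStorage('\t')
    AS (id:chararray, geoLocation:map[]);

-- Keep one record per tweet id using a nested LIMIT.
tweets_grp = GROUP tweets BY id;
geo_tweets = FOREACH tweets_grp {
    first_tweet = LIMIT tweets 1;
    GENERATE FLATTEN(first_tweet);
};

-- IS NOT NULL removes rows whose geoLocation is a real Pig null.
-- The short name geoLocation still resolves after FLATTEN because it is unique.
only_not_nulls = FILTER geo_tweets BY geoLocation IS NOT NULL;

STORE only_not_nulls INTO '/twitter_data/results/geo_tweets';
```

Moving the FILTER before the GROUP would shrink the data that has to be grouped. If a DUMP still shows `(null)` rows after this filter, the field probably holds a non-null value (the thread suggests a chararray), which IS NOT NULL will not remove.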

Re: problem filtering null values with pig

2012-11-01 Thread Arian Pasquali
lly, you should be able to write a UDF that checks type. But I am > more interested in knowing why you're running into this problem. Can you > please share your script and sample data? I'd like to reproduce it. > > Thanks, > Cheolsoo > > On Wed, Oct 31, 2012 at 2

Re: problem filtering null values with pig

2012-10-31 Thread Arian Pasquali
can create an expression to compare data types? Is it possible? ArianP 2012/10/31 Arian Pasquali > You are right, it doesn't seem like a null value. > It looks like a chararray. But the expression causes an error when comparing > a string with ([longitude#-9.15199849,latitu

Re: problem filtering null values with pig

2012-10-31 Thread Arian Pasquali
's the problem because I can't reproduce it. To me, null > values are printed as an empty "( )", not "(null)", so it doesn't seem like > null. > > I am wondering whether OpenJDK is the problem. Can you try Oracle HotSpot > JDK 1.6 and see if that fixes

problem filtering null values with pig

2012-10-31 Thread Arian Pasquali
Hey people, I'm having trouble with a silly task: I can't find a way to filter null values from my rows. This is the result when I dump the relation geoinfo:

DUMP geoinfo;
([longitude#70.95853,latitude#30.9773])
([longitude#-9.37944507,latitude#38.91780853])
(null)
(null)
(null)
([longitude#-92