Frequency count in pig

2014-05-16 Thread jamal sasha
Hi, My data is in format:
user_id,movie_id,timestamp
123, abc,unix_timestamp
123, def, ...
123, abc, ...
234, sda, ...
Now, I want to compute the number of times each movie is played in pig.. So the output I am expecting is:
123,abc,2
123,def,1
234,sda,1
and
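To pin down the expected output before writing the Pig script, the counting logic can be sketched in plain Python. This is not Pig Latin and not from the thread: the function name `count_plays` and the placeholder timestamps are assumptions made for illustration. In Pig the same shape would be a GROUP on (user_id, movie_id) followed by COUNT.

```python
from collections import Counter


def count_plays(lines):
    """Count how often each (user_id, movie_id) pair occurs."""
    counts = Counter()
    for line in lines:
        # Split on commas and strip the stray spaces seen in the sample data.
        user_id, movie_id, _timestamp = [f.strip() for f in line.split(",")]
        counts[(user_id, movie_id)] += 1
    return counts


# Sample rows modelled on the post; the timestamps are placeholders (assumption).
rows = ["123, abc,1", "123, def,2", "123, abc,3", "234, sda,4"]
for (user_id, movie_id), n in sorted(count_plays(rows).items()):
    print(f"{user_id},{movie_id},{n}")
```

On the sample rows this prints the three lines the post asks for: 123,abc,2 then 123,def,1 then 234,sda,1.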

Simple word count in pig..

2013-11-19 Thread jamal sasha
Hi, I have data already processed in following form: ( id ,{ bag of words}) So for example: (foobar, {(foo), (foo),(foobar),(bar)}) (foo,{(bar),(bar)}) and so on.. describe processed gives me: processed: {id: chararray,tokens: {tuple_of_tokens: (token: chararray)}} Now what I want is.. also

simple pig logic

2013-10-31 Thread jamal sasha
Hi, I have two datasets..
main_data.txt
{id:foo, some_field:12354, score:0}
{id:foobar, some_field:12354, score:0}
score_data.txt
{id:foo, score:1}
{id:foobar,score:20}
So in main_data.. score is initialized to 0.. Also.. main_data and score_data have some ids in common.. For the ids

Parsing flexing json in pig

2013-10-07 Thread jamal sasha
Hi, I have a semi-structured json: For example:
{id:1,name:foo}
{id:1,name:foo,address:foobar}
{id:1,name:foo,address:foobar,phone:[123,133}
{id:2,name:foobar,address:foobar}
And so on. So, what I want to do is read this file and select id and count address for each id. If address field is

Accessing particular folder

2013-10-02 Thread jamal sasha
Hi, I have data in this one folder like following:
data
|_shard1
|   |_d1_1
|   |_d2_1
|_shard2
|   |_d1_1
|   |_d2_2
|_shard3
|   |_d1_1
|   |_d2_3
|_shard4
    |_d1_1
    |_d2_4
Now, I want to

Reading simple json file

2013-09-23 Thread jamal sasha
Hi, I am trying to read simple json data as: d =LOAD 'json_output' USING JSONLOADER(('ip:chararray,_id:chararray,cats:[chararray]'); But I am getting this error: 2013-09-23 14:33:17,127 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve JSONLOADER using imports: [,

Re: Reading simple json file

2013-09-23 Thread jamal sasha
never mind :D On Mon, Sep 23, 2013 at 2:37 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, I am trying to read simple json data as: d =LOAD 'json_output' USING JSONLOADER(('ip:chararray,_id:chararray,cats:[chararray]'); But I am getting this error: 2013-09-23 14:33:17,127 [main] ERROR

Re: Converting xml to csv

2013-09-12 Thread jamal sasha
:35 PM, ajay kumar ajaysanagap...@gmail.com wrote: use org.apache.pig.piggybank.storage.XMLLoader and then extract them using regex_all On Thu, Sep 12, 2013 at 11:18 AM, jamal sasha jamalsha...@gmail.com wrote: Umm.. yess.. but how do i generalize it.. so what I am looking for is.. just

Re: Converting xml to csv

2013-09-11 Thread jamal sasha
flatten the xml.. so for example convert aux foobar1/foobar fushbarfoo/fushbar /aux to auxfoobar1/foobarfushbarfoo/fushbar/aux ??? On Wed, Sep 11, 2013 at 10:32 PM, Jagat Singh jagatsi...@gmail.com wrote: Use piggybank xmlloader On 12/09/2013 10:14 AM, jamal sasha jamalsha...@gmail.com wrote

Reading json file.

2013-08-29 Thread jamal sasha
Hi, I have json file in following format: { _id : foo.com, categories : [], h1 : { bar== : { first : 1281916800, last : 1316995200 }, foo== : { first : 1281916800, last : 1316995200 } }, name2 : [ foobarl.com, foobar2.com ], rep : null } So, how do i parse this json in pig.. also, the categories

Re: Reading json file.

2013-08-29 Thread jamal sasha
/JsonStorage.html http://hortonworks.com/blog/jsonize-anything-in-pig-with-tojson/ Regards, Shahab On Thu, Aug 29, 2013 at 6:19 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, I have json file in following format: { _id : foo.com, categories : [], h1 : { bar== : { first : 1281916800

pig question

2013-04-27 Thread jamal sasha
Hi, I have data of format
id1,id2, value
1 , abc, 2993
1, dhu, 9284
1,dus,2389
2, acs,29392
and so on. For each id1, I want to find the maximum value and then divide value by max_value, so in example above:
1,abc, 2993/9284
1,dhu ,9284/9284
1,dus, 2389/9284
2,acs, 29392/max_value_for_this id
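The per-group normalisation asked about here can be sketched in plain Python to check the arithmetic. This is an illustration, not Pig Latin and not from the thread: the name `normalize_by_group_max` is hypothetical, and the sketch assumes every group maximum is non-zero. In Pig the usual shape is a GROUP BY id1, MAX over the grouped values, then a division per row.

```python
def normalize_by_group_max(records):
    """Divide each value by the maximum value seen for its id1."""
    max_by_id = {}
    # First pass: find the largest value per id1.
    for id1, _id2, value in records:
        max_by_id[id1] = max(value, max_by_id.get(id1, value))
    # Second pass: scale every row by its group's maximum (assumed non-zero).
    return [(id1, id2, value / max_by_id[id1]) for id1, id2, value in records]


# Sample rows taken from the post.
records = [(1, "abc", 2993), (1, "dhu", 9284), (1, "dus", 2389), (2, "acs", 29392)]
for row in normalize_by_group_max(records):
    print(row)
```

The row holding each group's maximum comes out as 1.0, which is a quick sanity check on the result.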

count duplicate entries

2013-04-02 Thread jamal sasha
Hi, I have data in hdfs like:
id1,field1,field2
1,2,3
1,2,3
1,2,4
1,2,5
I want to find the number of unique entries using pig.. So here, the number of unique entries is 3 (as 1,2,3 is repeated twice). How do I find this? Thanks
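As a quick check of the expected answer, the same count can be sketched in plain Python. This is an illustration only, not Pig Latin and not from the thread; `count_unique_rows` is a hypothetical name. In Pig the equivalent is typically DISTINCT followed by GROUP ... ALL and COUNT.

```python
def count_unique_rows(lines):
    """Return the number of distinct rows, ignoring spaces around fields."""
    unique = {tuple(f.strip() for f in line.split(",")) for line in lines}
    return len(unique)


# Sample rows taken from the post (header line omitted).
rows = ["1,2,3", "1,2,3", "1,2,4", "1,2,5"]
print(count_unique_rows(rows))  # the post expects 3 for this sample
```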

Join question

2013-04-01 Thread jamal sasha
Hi, I have a simple join question. base = load 'input1' USING PigStorage( ',' ) as (id1, field1, field2); stats = load 'input2' USING PigStorage(',') as (id1, mean, median); joined = JOIN base BY id1, stats BY id1; final = FOREACH joined GENERATE base::id1, base::field1,base::field2,

Re: Join question

2013-04-01 Thread jamal sasha
i achieve this. Thanks On Mon, Apr 1, 2013 at 2:24 PM, Mehmet Tepedelenlioglu mehmets...@yahoo.com wrote: Are your ids unique? On 4/1/13 2:06 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, I have a simple join question. base = load 'input1' USING PigStorage( ',' ) as (id1

ignoring null entries

2013-03-29 Thread jamal sasha
Hi, I have data as: id1:string, value1:string. Sometimes id is missing so the data looks like:
foo,foobar
,foo1
foobar,bar1
,
I want to remove missing values, so the output should be:
foo,foobar
foobar,bar1
How can I achieve this in pig (without using udf??)
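The filtering rule can be sketched in plain Python to confirm it yields the output listed in the post. This is an illustration, not Pig Latin and not from the thread: `drop_incomplete` is a hypothetical name, and the sketch assumes a row should be dropped when any field is empty. In Pig no UDF should be needed; a FILTER with an `IS NOT NULL` test on the fields is the usual approach.

```python
def drop_incomplete(lines):
    """Keep only rows in which every comma-separated field is non-empty."""
    kept = []
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        # Assumption: a row is incomplete if any field is empty.
        if all(fields):
            kept.append(",".join(fields))
    return kept


# Sample rows taken from the post.
rows = ["foo,foobar", ",foo1", "foobar,bar1", ","]
print(drop_incomplete(rows))
```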

Fwd: error

2012-12-12 Thread jamal sasha
Eh sorry nf0 = foreach gruped generate features.id,features.f0/mf0; Should be nf0 = foreach gruped generate data.id,data.f0/mf0; -- Forwarded message -- From: jamal sasha Date: Wednesday, December 12, 2012 Subject: error To: user@pig.apache.org user@pig.apache.org mf0 = LOAD

Re: need help about pig script on this case

2012-11-19 Thread jamal sasha
On a different context, I was once stuck with the same problem but was able to navigate this using bincond operator. http://ofps.oreilly.com/titles/9781449302641/intro_pig_latin.html Not sure, how you would hack in here.. but i have a feeling it can be pulled off. On Mon, Nov 19, 2012 at 8:49

Re: Better formatted.. Pig udf help

2012-11-12 Thread jamal sasha
, jamal sasha jamalsha...@gmail.com wrote: Hi Great catch Now I get an error Cannot find hadoop configuration in class path ( neither hadoop site XML etc) So I am running the file on a cluster which had say hadoop set up as /path/hadoop /path/pig And I have account in it So I cannot

computing avg in pig

2012-11-06 Thread jamal sasha
I have data in format 1,1.2 2,1.3 and so on.. So basically this is id, val combination where id is unique... I want to calculate the average of all the values.. So here.. avg(1.2,1.3) I was going thru the documentation but most of the aggregation function

Re: Better formatted.. Pig udf help

2012-10-26 Thread jamal sasha
Hi In this case I get an error Problem resolving class version numbers for class myudfs.time?? On Thursday, October 25, 2012, pablomar pablo.daniel.marti...@gmail.com wrote: to run your script you have to do pig -f time.pig On Thu, Oct 25, 2012 at 5:46 PM, jamal sasha jamalsha...@gmail.com

Re: Better formatted.. Pig udf help

2012-10-26 Thread jamal sasha
by: java.lang.ClassNotFoundException: org.pache.pig.Main at java.net.URLClassLoader$1.run(URLClassLoader.java:202) Note that the 'a' in apache is missing. On Thu, Oct 25, 2012 at 2:46 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, I am trying to write a pig udf function.. Basically

Pig udf help

2012-10-25 Thread jamal sasha
Hi, I am trying to write a pig udf function.. Basically the data is of format Id,time What I am trying to do is … parse the time and then see whether it's breakfast, lunch or dinner.. based on the time stamp. Some entries will be null as well.. So here is the udf code for this.

Better formatted.. Pig udf help

2012-10-25 Thread jamal sasha
Hi, I am trying to write a pig udf function.. Basically the data is of format Id,time What I am trying to do is … parse the time and then see whether it's breakfast, lunch or dinner.. based on the time stamp. Some entries will be null as well.. So here is the

UDF function help

2012-10-24 Thread jamal sasha
I am trying to learn both java and pig programming.. So basically.. not an ideal combination but things are looking good.. but I am not able to solve this out.. In my local environment I don't have pig libraries... but on the cluster... YES! So.. when I do import

Re: matrix multiplication

2012-10-22 Thread jamal sasha
, Gunther. On Sun, Oct 21, 2012 at 7:40 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, I am trying to do matrix multiplication using pig. Basically I have data in the form: data1.txt item1,item2,0.3 item1, item3, 0.4 item1, item5, 0.6 And then I have another data in the form data2.txt

Re: matrix multiplication

2012-10-22 Thread jamal sasha
wrote: That's fairly straightforward. Take a look at: http://pig.apache.org/docs/r0.10.0/basic.html (order by, limit). Thanks, Gunther. On Mon, Oct 22, 2012 at 7:12 AM, jamal sasha jamalsha...@gmail.com wrote: Hi Great . Thanks a lot. How do I sort the result by score and select top 20

question

2012-10-11 Thread jamal sasha
I have a data file in format
User, movie, price
123,abc,22.2
123,daw,39
123,abc,99 <- Note that the user and movie are the same but the price is different
I want to generate a pig script where I am counting how many times a user has rented a particular movie
in = LOAD 'data' USING

Pig question.

2012-10-03 Thread jamal sasha
Hi, I have a table in format: Id: int, amount: float, true_date: chararray, time:chararray, state:chararray Fortunately, there are only two states in my db. So if I have a state as “CA” then add +1 to datetime If state is “MA”, then add +5 to datetime And then save the results. Also a

finding mean and standard deviation

2012-09-25 Thread jamal sasha
Hi, I have a huge text file of form (data is saved in directory data/data1.txt, data2.txt and so on):
merchant_id, user_id, amount
1234, 9123, 299.2
1233, 9199, 203.2
1234, 0124, 230
and so on.. What I want to do is for each merchant, find the average amount.. so basically in the end i
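The per-merchant statistics named in the thread title can be sketched in plain Python to produce reference numbers for a small sample. This is an illustration, not Pig Latin and not from the thread: `merchant_stats` is a hypothetical name, and the sketch assumes the population standard deviation is wanted (the post does not say which variant). In Pig, AVG is built in and applies after a GROUP BY merchant_id.

```python
from collections import defaultdict
from statistics import mean, pstdev


def merchant_stats(records):
    """Return {merchant_id: (mean amount, population std deviation)}."""
    amounts = defaultdict(list)
    for merchant_id, _user_id, amount in records:
        amounts[merchant_id].append(amount)
    # pstdev is the population form; use statistics.stdev for the sample form.
    return {m: (mean(v), pstdev(v)) for m, v in amounts.items()}


# Sample rows taken from the post.
records = [
    ("1234", "9123", 299.2),
    ("1233", "9199", 203.2),
    ("1234", "0124", 230.0),
]
for merchant_id, (avg, std) in sorted(merchant_stats(records).items()):
    print(merchant_id, avg, std)
```

A merchant with a single transaction gets a standard deviation of 0.0 under the population form; the sample form is undefined for one value, which is one reason the variant should be chosen deliberately.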

Re: finding mean and standard deviation

2012-09-25 Thread jamal sasha
to see how it compute the average. Basically, you need to modify the exec() method to compute standard deviation instead of average. Thanks, Cheolsoo On Tue, Sep 25, 2012 at 6:36 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, I have a huge text file of form data is saved