Hi,
My data is in the format:
user_id,movie_id,timestamp
123, abc,unix_timestamp
123, def, ...
123, abc, ...
234, sda, ...
Now, I want to compute in pig the number of times each user played each movie..
So the output I am expecting is:
123,abc,2
123,def,1
234,sda,1
and
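In Pig this is normally a GROUP on both columns, (user_id, movie_id), followed by COUNT. As an illustration of what that computes, here is a minimal plain-Java sketch. The class and method names (PlayCounts, countPlays) are hypothetical, and it assumes the input has already been split into lines with no header row.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating GROUP BY (user_id, movie_id) + COUNT.
final class PlayCounts {
    // Each line is "user_id,movie_id,timestamp"; spaces around fields are trimmed.
    // Returns "user_id,movie_id" -> number of plays, in first-seen order.
    static Map<String, Integer> countPlays(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            if (f.length < 2) {
                continue; // skip malformed rows
            }
            String key = f[0].trim() + "," + f[1].trim();
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

For the four sample rows, countPlays gives 123,abc = 2, 123,def = 1 and 234,sda = 1, which matches the expected output.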
Hi,
I have data already processed in the following form:
(id, {bag of words})
So for example:
(foobar, {(foo), (foo),(foobar),(bar)})
(foo,{(bar),(bar)})
and so on..
describe processed gives me:
processed: {id: chararray,tokens: {tuple_of_tokens: (token: chararray)}}
Now what I want is.. also
Hi,
I have two datasets..
main_data.txt
{id:foo, some_field:12354, score:0}
{id:foobar, some_field:12354, score:0}
score_data.txt
{id:foo, score:1}
{id:foobar,score:20}
So in main_data.. score is initialized to 0..
Also.. main_data and score_data have some ids in common..
For the ids
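The message is cut off at "For the ids", so the exact goal is not known. Assuming it is "for ids present in both files, replace the 0 score in main_data with the score from score_data", that is a left outer join on id (in Pig, JOIN ... BY id LEFT OUTER, ... BY id) followed by picking the non-null score. A minimal plain-Java sketch of that assumed logic, with hypothetical names and some_field left out for brevity:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a left outer join of main_data with score_data on id.
final class ScoreMerge {
    // mainScores: id -> score from main_data (initialized to 0)
    // newScores:  id -> score from score_data
    // An id found in both takes score_data's score; any other id keeps its
    // original score (the null check a bincond would do after the join in Pig).
    static Map<String, Integer> merge(Map<String, Integer> mainScores,
                                      Map<String, Integer> newScores) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : mainScores.entrySet()) {
            result.put(e.getKey(), newScores.getOrDefault(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```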
Hi,
I have a semi-structured json:
For example:
{id:1,name:foo}
{id:1,name:foo,address:foobar}
{id:1,name:foo,address:foobar,phone:[123,133]}
{id:2,name:foobar,address:foobar}
And so on.
So, what I want to do is read this file,
select id, and count the addresses for each id.
If address field is
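The message breaks off at "If address field is", presumably about records that have no address. Assuming a missing address should simply not be counted, the per-id count looks like the plain-Java sketch below. It takes each line as an already-parsed map of field names to values (parsing the loosely quoted JSON is left out), and the names are hypothetical. In Pig, COUNT over the grouped address field behaves similarly, since COUNT ignores nulls (COUNT_STAR is the variant that includes them).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: per id, count the records that carry an address.
final class AddressCounts {
    // Each record is one already-parsed JSON object (field name -> value).
    // Records without an "address" key contribute 0 to their id's count.
    static Map<String, Integer> countAddresses(List<Map<String, String>> records) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> r : records) {
            String id = r.get("id");
            if (id == null) {
                continue; // no id, nothing to group on
            }
            int hasAddress = r.get("address") != null ? 1 : 0;
            counts.merge(id, hasAddress, Integer::sum);
        }
        return counts;
    }
}
```

For the four sample records this gives id 1 = 2 and id 2 = 1.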
Hi,
I have data in one folder, laid out like the following:
data
|-- shard1
|   |-- d1_1
|   `-- d2_1
|-- shard2
|   |-- d1_1
|   `-- d2_2
|-- shard3
|   |-- d1_1
|   `-- d2_3
`-- shard4
    |-- d1_1
    `-- d2_4
Now, I want to
Hi,
I am trying to read simple json data as:
d = LOAD 'json_output' USING
JSONLOADER('ip:chararray,_id:chararray,cats:[chararray]');
But I am getting this error:
2013-09-23 14:33:17,127 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1070: Could not resolve JSONLOADER using imports: [,
never mind :D
On Mon, Sep 23, 2013 at 2:37 PM, jamal sasha jamalsha...@gmail.com wrote:
:35 PM, ajay kumar ajaysanagap...@gmail.com wrote:
Use org.apache.pig.piggybank.storage.XMLLoader and then extract them using
REGEX_EXTRACT_ALL.
On Thu, Sep 12, 2013 at 11:18 AM, jamal sasha jamalsha...@gmail.com
wrote:
Umm.. yes.. but how do I generalize it..
So what I am looking for is.. just
flatten the xml..
so for example
convert
<aux>
<foobar>1</foobar>
<fushbar>foo</fushbar>
</aux>
to
<aux><foobar>1</foobar><fushbar>foo</fushbar></aux>
???
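For records shaped like the example, flattening amounts to trimming each line and joining the lines with nothing in between. A minimal plain-Java sketch (hypothetical names; needs Java 11+ for String.lines and String.strip), which assumes whitespace at the start or end of a line is never significant:

```java
import java.util.stream.Collectors;

// Hypothetical sketch: collapse a multi-line XML record onto one line.
final class XmlFlattener {
    // Trims each line and drops the line breaks. Whitespace inside a line
    // is kept; whitespace at line ends is assumed to be insignificant.
    static String flatten(String xml) {
        return xml.lines()
                  .map(String::strip)
                  .collect(Collectors.joining());
    }
}
```

Applied to the four-line <aux> record above, this yields <aux><foobar>1</foobar><fushbar>foo</fushbar></aux>.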
On Wed, Sep 11, 2013 at 10:32 PM, Jagat Singh jagatsi...@gmail.com wrote:
Use piggybank xmlloader
On 12/09/2013 10:14 AM, jamal sasha jamalsha...@gmail.com wrote
Hi,
I have a json file in the following format:
{ _id : foo.com,
  categories : [],
  h1 : { bar== : { first : 1281916800, last : 1316995200 },
         foo== : { first : 1281916800, last : 1316995200 } },
  name2 : [ foobarl.com, foobar2.com ],
  rep : null }
So, how do I parse this json in pig..
also, the categories
/JsonStorage.html
http://hortonworks.com/blog/jsonize-anything-in-pig-with-tojson/
Regards,
Shahab
On Thu, Aug 29, 2013 at 6:19 PM, jamal sasha jamalsha...@gmail.com
wrote:
Hi,
I have data in the format
id1,id2, value
1 , abc, 2993
1, dhu, 9284
1,dus,2389
2, acs,29392
and so on
For each id1, I want to find the maximum value and then divide each value by
max_value
so in example above:
1,abc, 2993/9284
1,dhu ,9284/9284
1,dus, 2389/9284
2,acs, 29392/max_value_for_this id
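Conceptually this is two passes: find the maximum value per id1 (in Pig, GROUP BY id1 with MAX, then FLATTEN the grouped rows back out alongside it), then divide each row's value by its group's maximum. A plain-Java sketch of those two passes, with hypothetical names, assuming every line has exactly three comma-separated fields:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: divide each value by the maximum value of its id1 group.
final class MaxNormalize {
    // Lines look like "id1,id2,value"; returns "id1,id2" -> value / max_value(id1).
    static Map<String, Double> normalize(List<String> lines) {
        // Pass 1: maximum value per id1 (the GROUP BY id1 + MAX step).
        Map<String, Double> max = new HashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            max.merge(f[0].trim(), Double.parseDouble(f[2].trim()), Math::max);
        }
        // Pass 2: divide each row's value by the maximum of its group.
        Map<String, Double> out = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            String id1 = f[0].trim();
            out.put(id1 + "," + f[1].trim(), Double.parseDouble(f[2].trim()) / max.get(id1));
        }
        return out;
    }
}
```

For the sample, 1,dhu maps to 1.0, 1,abc to 2993/9284 (about 0.322), and 2,acs to 1.0 because it is the only row for id1 = 2.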
Hi,
I have data in hdfs like:
id1,field1,field2
1,2,3
1,2,3
1,2,4
1,2,5
I want to find the number of unique entries using pig..
So here, the number of unique entries is 3 (as 1,2,3 appears twice)
How do i find this?
Thanks
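In Pig the usual pattern is DISTINCT on the relation, then GROUP ... ALL and COUNT. The same idea in plain Java is a set of whole rows; a minimal sketch with hypothetical names, assuming the header line (id1,field1,field2) has already been dropped:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of DISTINCT followed by a COUNT of the remaining rows.
final class UniqueRows {
    // Rows are compared as whole strings after trimming the ends, so
    // "1,2,3" and "1, 2, 3" would still count as different rows.
    static int countUnique(List<String> rows) {
        Set<String> seen = new HashSet<>();
        for (String row : rows) {
            seen.add(row.strip());
        }
        return seen.size();
    }
}
```

For the four sample rows this returns 3.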
Hi,
I have a simple join question.
base = load 'input1' USING PigStorage(',') as (id1, field1, field2);
stats = load 'input2' USING PigStorage(',') as (id1, mean, median);
joined = JOIN base BY id1, stats BY id1;
final = FOREACH joined GENERATE base::id1, base::field1,base::field2,
i achieve this.
Thanks
On Mon, Apr 1, 2013 at 2:24 PM, Mehmet Tepedelenlioglu mehmets...@yahoo.com
wrote:
Are your ids unique?
On 4/1/13 2:06 PM, jamal sasha jamalsha...@gmail.com wrote:
Hi,
I have data as :
id1:string, value1:string
Sometimes id is missing so the data looks like:
foo,foobar
,foo1
foobar,bar1
,
I want to remove missing values
So the output should be
foo,foobar
foobar,bar1
How can I achieve this in pig (without using a udf)?
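No UDF should be needed: a FILTER on id1 is enough. Depending on how the empty field comes out of the loader it may be null or an empty string, so the condition may need to test for both (for example id1 IS NOT NULL AND id1 != ''). The filtering logic itself, as a plain-Java sketch with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of FILTER ... BY "id1 is present".
final class DropMissingIds {
    // Keeps a row only if its first field (id1) is non-empty after trimming.
    static List<String> dropMissing(List<String> rows) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            String[] f = row.split(",", -1); // limit -1 keeps empty trailing fields
            if (!f[0].trim().isEmpty()) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

On the four sample rows this keeps foo,foobar and foobar,bar1.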
Eh sorry
nf0 = foreach gruped generate features.id,features.f0/mf0;
Should be
nf0 = foreach gruped generate data.id,data.f0/mf0;
-- Forwarded message --
From: jamal sasha
Date: Wednesday, December 12, 2012
Subject: error
To: user@pig.apache.org
mf0 = LOAD
On a different context, I was once stuck with the same problem but was able
to navigate this using bincond operator.
http://ofps.oreilly.com/titles/9781449302641/intro_pig_latin.html
Not sure, how you would hack in here.. but i have a feeling it can be
pulled off.
On Mon, Nov 19, 2012 at 8:49, jamal sasha jamalsha...@gmail.com wrote:
Hi
Great catch.
Now I get an error:
Cannot find hadoop configuration in classpath (neither hadoop site XML, etc.)
So I am running the file on a cluster which has, say, hadoop set up as
/path/hadoop
/path/pig
And I have an account on it.
So I cannot
I have data in the format
1,1.2
2,1.3
and so on..
So basically this is id, val combination where id is unique...
I want to calculate the average of all the values..
So here.. avg(1.2,1.3)
I was going thru the documentation but most of the aggregation function
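Pig's aggregate functions such as AVG operate on a bag, so averaging a whole relation is usually done by first collecting every row into a single group with GROUP ... ALL and then applying AVG to that group's val field. The computation itself is just a mean; a plain-Java sketch with hypothetical names:

```java
import java.util.List;

// Hypothetical sketch of GROUP ... ALL followed by AVG(val).
final class Average {
    // Rows look like "id,val"; returns the mean of all val fields.
    static double average(List<String> rows) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("no rows to average");
        }
        double sum = 0;
        for (String row : rows) {
            sum += Double.parseDouble(row.split(",")[1].trim());
        }
        return sum / rows.size();
    }
}
```

For the two sample rows the result is 1.25.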
Hi
In this case I get an error
Problem resolving class version numbers for class myudfs.time??
On Thursday, October 25, 2012, pablomar pablo.daniel.marti...@gmail.com
wrote:
to run your script you have to do
pig -f time.pig
On Thu, Oct 25, 2012 at 5:46 PM, jamal sasha jamalsha...@gmail.com
by: java.lang.ClassNotFoundException: org.pache.pig.Main
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
Note that the 'a' in apache is missing.
On Thu, Oct 25, 2012 at 2:46 PM, jamal sasha jamalsha...@gmail.com
wrote:
Hi,
I am trying to write a pig udf function.. Basically the data is of
format
Id,time
What I am trying to do is … parse the time and then see whether it's
breakfast, lunch or dinner.. based on the time stamp. Some entries will be
null as well..
So here is the udf code for this.
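The UDF code itself did not survive in this archive. The core classification logic, independent of Pig's EvalFunc plumbing, might look like the plain-Java sketch below; in a real UDF it would be called from exec(Tuple) of an EvalFunc<String>. The time format (ISO HH:mm or HH:mm:ss) and the meal cut-off hours are assumptions, not taken from the message, and the names are hypothetical.

```java
import java.time.LocalTime;
import java.time.format.DateTimeParseException;

// Hypothetical sketch: map a time of day to a meal name.
final class MealOfDay {
    // Assumed cut-offs; the original message does not give any.
    private static final LocalTime LUNCH_START = LocalTime.of(11, 0);
    private static final LocalTime DINNER_START = LocalTime.of(16, 0);

    // Returns "breakfast", "lunch" or "dinner" for an ISO "HH:mm[:ss]" time,
    // or null for null, blank or unparseable input (a UDF can return null too).
    static String classify(String time) {
        if (time == null || time.isBlank()) {
            return null;
        }
        LocalTime t;
        try {
            t = LocalTime.parse(time.strip());
        } catch (DateTimeParseException e) {
            return null;
        }
        if (t.isBefore(LUNCH_START)) {
            return "breakfast";
        }
        if (t.isBefore(DINNER_START)) {
            return "lunch";
        }
        return "dinner";
    }
}
```

Returning null for the null entries keeps them in the output as nulls rather than failing the job.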
I am trying to learn both java and pig programming.. So basically.. not
an ideal combination, but things are looking good.. but I am not able to
figure this out..
In my local environment I don't have the pig libraries... but on the
cluster... YES!
So.. when I do
import
,
Gunther.
On Sun, Oct 21, 2012 at 7:40 PM, jamal sasha jamalsha...@gmail.com
wrote:
Hi,
I am trying to do matrix multiplication using pig.
Basically I have data in the form:
data1.txt
item1,item2,0.3
item1, item3, 0.4
item1, item5, 0.6
And then I have another dataset in the form
data2.txt
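The rows of data2.txt are cut off in this archive, so its layout is unknown. Assuming, hypothetically, that it uses the same row,column,value triplets as data1.txt, the usual plan for sparse matrix multiplication, C[i][k] = sum over j of A[i][j] * B[j][k], is to JOIN the first matrix's column with the second matrix's row, multiply the paired values, then GROUP by (row, column) of the result and SUM. A plain-Java sketch of that plan, with hypothetical names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: C[i][k] = sum over j of A[i][j] * B[j][k],
// with both sparse matrices stored as "row,col,value" lines.
final class SparseMatMul {
    static Map<String, Double> multiply(List<String> a, List<String> b) {
        // Index B by its row, the join key that A's column is matched against.
        Map<String, List<String[]>> bByRow = new HashMap<>();
        for (String line : b) {
            String[] f = fields(line);
            bByRow.computeIfAbsent(f[0], k -> new ArrayList<>()).add(f);
        }
        // "Join", multiply, and sum the products per (A row, B column) cell.
        Map<String, Double> c = new LinkedHashMap<>();
        for (String line : a) {
            String[] x = fields(line);
            for (String[] y : bByRow.getOrDefault(x[1], List.of())) {
                double product = Double.parseDouble(x[2]) * Double.parseDouble(y[2]);
                c.merge(x[0] + "," + y[1], product, Double::sum);
            }
        }
        return c;
    }

    // Splits on commas and trims each field.
    private static String[] fields(String line) {
        String[] f = line.split(",");
        for (int i = 0; i < f.length; i++) {
            f[i] = f[i].trim();
        }
        return f;
    }
}
```

Cells with no matching j pairs never appear in the result, which keeps the output sparse like the inputs.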
wrote:
That's fairly straightforward. Take a look at:
http://pig.apache.org/docs/r0.10.0/basic.html (order by, limit).
Thanks,
Gunther.
On Mon, Oct 22, 2012 at 7:12 AM, jamal sasha jamalsha...@gmail.com
wrote:
Hi
Great. Thanks a lot.
How do I sort the result by score and select the top 20?
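Following Gunther's pointer, in Pig this is ORDER ... BY score DESC followed by LIMIT 20. The equivalent operation on an in-memory map of key to score, as a hypothetical plain-Java sketch:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of ORDER BY score DESC followed by LIMIT n.
final class TopN {
    // Returns the keys of the n highest-scoring entries, best first.
    static List<String> topByScore(Map<String, Double> scores, int n) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Calling topByScore(scores, 20) corresponds to the top-20 request; ties at the cut-off are resolved arbitrarily, as with LIMIT.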
I have a data file in format
User, movie, price
123,abc,22.2
123,daw,39
123,abc,99   <-- Note that the user and movie are the same but the price is different
I want to generate a pig script where I am counting how many times a user
has rented a particular movie
in = LOAD 'data' USING
Hi,
I have a table in format:
Id: int, amount: float, true_date: chararray, time:chararray,
state:chararray
Fortunately, there are only two states in my db.
So if the state is “CA”, then add 1 to the datetime.
If the state is “MA”, then add 5 to the datetime.
And then save the results.
Also a
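Since only two states occur, a bincond such as (state == 'CA' ? ... : ...) is enough in Pig. The message does not say what unit the +1 and +5 are in, or how true_date and time are formatted; the plain-Java sketch below assumes hours, ISO yyyy-MM-dd dates and HH:mm:ss times (all hypothetical), and rejects any state other than the two mentioned instead of silently picking one.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical sketch: shift a date+time by a per-state offset.
final class StateShift {
    // Offsets from the message (CA +1, MA +5); treating them as HOURS is an
    // assumption, as are the ISO formats yyyy-MM-dd and HH:mm:ss.
    static LocalDateTime shift(String trueDate, String time, String state) {
        LocalDateTime dt = LocalDateTime.of(LocalDate.parse(trueDate), LocalTime.parse(time));
        long hours;
        switch (state) {
            case "CA":
                hours = 1;
                break;
            case "MA":
                hours = 5;
                break;
            default:
                throw new IllegalArgumentException("unexpected state: " + state);
        }
        return dt.plusHours(hours);
    }
}
```

Working on a combined date-time rather than the time string alone means a shift past midnight correctly rolls the date forward.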
Hi,
I have huge text files of the form below (saved in the directory data/ as
data1.txt, data2.txt and so on):
merchant_id, user_id, amount
1234, 9123, 299.2
1233, 9199, 203.2
1234, 0124, 230
and so on..
What I want to do is for each merchant, find the average amount..
so basically in the end i
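This is GROUP BY merchant_id with AVG(amount) in Pig. A plain-Java sketch of the same per-merchant mean, with hypothetical names; it keeps ids as strings, so a user_id like 0124 keeps its leading zero, and assumes there is no header line.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of GROUP BY merchant_id + AVG(amount).
final class MerchantAverage {
    // Rows look like "merchant_id, user_id, amount" (no header line).
    static Map<String, Double> averageByMerchant(List<String> rows) {
        // merchant_id -> {running sum, row count}; ids stay strings (leading zeros kept)
        Map<String, double[]> acc = new LinkedHashMap<>();
        for (String row : rows) {
            String[] f = row.split(",");
            double[] s = acc.computeIfAbsent(f[0].trim(), k -> new double[2]);
            s[0] += Double.parseDouble(f[2].trim());
            s[1] += 1;
        }
        Map<String, Double> avg = new LinkedHashMap<>();
        acc.forEach((merchant, s) -> avg.put(merchant, s[0] / s[1]));
        return avg;
    }
}
```

For the three sample rows it gives 1234 = 264.6 and 1233 = 203.2.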
to see how it computes the average.
Basically, you need to modify the exec() method to compute standard
deviation instead of average.
Thanks,
Cheolsoo
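Cheolsoo's suggestion is to change exec() so that it produces a standard deviation instead of an average. The arithmetic such an exec() would run over the values of one group can be sketched in plain Java as below. This is not Pig's own code and the names are hypothetical; since the thread does not say whether the population or the sample standard deviation is wanted, it uses the population form.

```java
// Hypothetical sketch of the per-group arithmetic for a standard deviation.
final class StdDev {
    // Population standard deviation: sqrt(sum((x - mean)^2) / n).
    // For the sample version, divide by (n - 1) instead of n.
    static double populationStdDev(double[] xs) {
        if (xs.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        double mean = sum / xs.length;
        double squares = 0;
        for (double x : xs) {
            squares += (x - mean) * (x - mean);
        }
        return Math.sqrt(squares / xs.length);
    }
}
```

For {2, 4, 4, 4, 5, 5, 7, 9} the mean is 5, the squared deviations sum to 32, and the result is sqrt(32 / 8) = 2.0.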
On Tue, Sep 25, 2012 at 6:36 PM, jamal sasha jamalsha...@gmail.com
wrote: