Re: python modules

2012-03-13 Thread Aniket Mokashi
The root cause of this problem was the https://issues.apache.org/jira/browse/MAPREDUCE-967 backward incompatibility. Hadoop unpacks job.jar and puts jobCacheDir on the classpath, so Jython is not able to find its jar (job.jar) on the classpath (as it's unpacked). With MAPREDUCE-967, Hadoop puts job.jar on

Embedded Pig in a Python Module?

2012-03-13 Thread Eli Finkelshteyn
Hi Folks, I'm currently working on a framework that's going to do some awesome graphing stuff grabbing data out using Pig. What I'm wondering is, is there any way I can put embedded pig in a module and call it that way? Normally, I need to run embedded pig in a Python script as something like

Re: about distinct

2012-03-13 Thread guoyun
> On 3/5/12 7:19 PM, guoyun wrote: > > Dear All: > > this is the description of wiki about distinct: > > > > grunt> A = load 'mydata' using PigStorage() as (a, b, c); > > grunt> B = group A by a; > > grunt> C = foreach B { > > D = distinct A.b; > > generate
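
For readers unfamiliar with nested DISTINCT, the grouping step quoted above can be mimicked in plain Python. This is only a rough sketch of the semantics (group rows by a, then keep the distinct b values inside each group). The sample rows and the function name are made up for illustration, and the truncated generate clause is deliberately not reproduced.

```python
from collections import defaultdict


def distinct_b_per_group(rows):
    """Group (a, b, c) rows by a and collect the distinct b values per group."""
    groups = defaultdict(set)
    for a, b, _c in rows:
        groups[a].add(b)  # a set drops duplicate b values within one group
    # Sort only to make the result deterministic; Pig bags are unordered.
    return {key: sorted(values) for key, values in groups.items()}


# Hypothetical sample data standing in for 'mydata'.
rows = [("x", 1, "p"), ("x", 1, "q"), ("x", 2, "r"), ("y", 3, "s")]
result = distinct_b_per_group(rows)
print(result)
```

Group "x" contains b = 1 twice, and only one copy survives, which is the effect of D = distinct A.b inside the foreach block.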

Re: Creating a relation on the fly

2012-03-13 Thread Dmitriy Ryaboy
Code samples help when debugging code :) On Tue, Mar 13, 2012 at 5:27 PM, rakesh sharma wrote: > > That is what I was doing and I got the following error: > 2012-03-13 20:38:14,404 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1000: Error during parsing. Encountered " "generate" "gener

RE: Creating a relation on the fly

2012-03-13 Thread rakesh sharma
That is what I was doing and I got the following error: 2012-03-13 20:38:14,404 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "generate" "generate "" at line 5, column 7.Was expecting one of:"define" ..."load" ..."filter" ... "for

Re: Creating a relation on the fly

2012-03-13 Thread Bill Graham
Are you doing this verbatim: window = generate UUIDGenerator() as uuid, '$p1' as value1, '$p2' as value2; If so that's an invalid statement. GENERATE works with FOREACH. What is your exact pig script and error? On Tue, Mar 13, 2012 at 4:51 PM, rakesh sharma wrote: > > Parameter substitution wor

RE: Creating a relation on the fly

2012-03-13 Thread rakesh sharma
Parameter substitution works fine. I am having trouble tying all three values (the UDF-generated one and the two parameters) into a relation. > From: billgra...@gmail.com > Date: Tue, 13 Mar 2012 16:43:25 -0700 > Subject: Re: Creating a relation on the fly > To: user@pig.apache.org > > Are p1 and p2

Re: Creating a relation on the fly

2012-03-13 Thread Bill Graham
Are p1 and p2 static params for the life of the script or are they dynamic? You could do something like this if it's the former: pig -p p1=foo -p p2=bar -f script.pig and they'd be properly inserted into $p1 and $p2 in your script. On Tue, Mar 13, 2012 at 2:37 PM, rakesh sharma wrote: > > Hi A
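
If the static parameters come from another program, the command line above can be assembled programmatically. The following is a minimal sketch, assuming only the -p and -f flags shown in the message. The helper name build_pig_command is hypothetical, and nothing is executed here.

```python
def build_pig_command(script, params):
    """Return the argv list for running `script` with -p name=value pairs."""
    argv = ["pig"]
    # Sort the names so the generated command line is deterministic.
    for name, value in sorted(params.items()):
        argv += ["-p", "%s=%s" % (name, value)]
    argv += ["-f", script]
    return argv


command = build_pig_command("script.pig", {"p1": "foo", "p2": "bar"})
print(" ".join(command))
```

Passing the list to subprocess.call(command) would launch the job, assuming a pig executable is on the PATH. Building a list rather than a single string avoids shell-quoting problems when a parameter value contains spaces.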

Creating a relation on the fly

2012-03-13 Thread rakesh sharma
Hi All, I have a situation where I need to create a relation from a combination of UDF and parameter values. For example, the first field will be generated by the UDF UUIDGenerator, the second field by parameter p1, and the third field by parameter p2. I am looking for some way of having a relation window as express

Re: config/reference data files for UDFS

2012-03-13 Thread Alan Gates
Sorry, copy paste error. Let's try that again: http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/builtin/Bloom.java?view=markup Alan. On Mar 13, 2012, at 1:55 PM, Stan Rosenberg wrote: > Hi Alan, > > I am also curious to see how the distributed cache is used in a UDF. > However, the c

Re: config/reference data files for UDFS

2012-03-13 Thread Stan Rosenberg
Hi Alan, I am also curious to see how the distributed cache is used in a UDF. However, the code you reference in the patch doesn't appear to contain such an example. What is the name of the source file? Thanks, stan On Mon, Mar 12, 2012 at 7:24 PM, Alan Gates wrote: > Take a look at the builtin U

Re: Accumulator is not fired

2012-03-13 Thread Yen SYU
Hi Jon, Thanks for your response! I use pig 0.9.1-snapshot. I've used FLATTEN instead of $0 and $1, but ACCUM_CALL is still not fired. I also tried removing the generic type in the accumulator, but it did not help. :( Is it easy for you to fire the accumulator? Yen On Tue, Mar 13, 2012 at 3:06 PM, Jonathan C

Re: Accumulator is not fired

2012-03-13 Thread Jonathan Coveney
What version of pig are you using? just as an experiment in the simple case, can you try doing GENERATE flatten(group) as (domain,host), ...(the rest)... shouldn't make a difference, but I think I remember that in some older versions it did 2012/3/13 Yen SYU > Hi all, > > I just test a very s

Accumulator is not fired

2012-03-13 Thread Yen SYU
Hi all, I just tested a very simple pig script, as follows: records = LOAD '$input' AS (hash:chararray, domain:chararray, host:chararray, page:chararray, freq:int); grpd = GROUP records BY (domain, host); stats = FOREACH grpd { hashes = records.hash;

Re: "Non-linear" data flow split with 1 leg

2012-03-13 Thread Alan Gates
This looks like a parser bug. Alan. On Mar 13, 2012, at 11:32 AM, Charles Menguy wrote: > Hi all, > > I have a question about PIG regarding non-linear data flows. > > I'm using the SPLIT command to be able to do different behavior based on my > data, but I noticed something unexpected. > >

"Non-linear" data flow split with 1 leg

2012-03-13 Thread Charles Menguy
Hi all, I have a question about PIG regarding non-linear data flows. I'm using the SPLIT command to be able to do different behavior based on my data, but I noticed something unexpected. When I do a SPLIT with only 1 leg, for some reason that doesn't work, as it seems to be expecting at least a

Re: Can I reduce shuffling time?

2012-03-13 Thread Prashant Kommireddi
What is the number of reduce shuffle bytes for this job? Also, is this job CPU intensive on reducers or is it simple aggregation? Sent from my iPhone On Mar 13, 2012, at 5:25 AM, Austin Chungath wrote: > Hi, > I am running a pig query on around 500 GB input data. > The current block size is 128

Re: Stripping Key

2012-03-13 Thread Sisso
You can: 1. load all data, 2. use strsplit (http://pig.apache.org/docs/r0.9.2/func.html#strsplit) to split your values into a tuple, 3. convert your tuples into a bag (I used a UDF in Python instead of the UDF tobag), 4. flatten your bag (http://pig.apache.org/docs/r0.9.2/basic.html#flatten). I don't know
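
The steps above can be sketched in plain Python to show what each stage produces. This is an illustration, not the poster's actual UDF: the record layout (a key plus a comma-separated value string), the delimiter, and both function names are assumptions. Note also that str.split matches a literal delimiter, whereas Pig's STRSPLIT takes a regular expression.

```python
def to_bag(fields):
    """Step 3: turn a tuple of values into a bag (a list of one-field tuples)."""
    return [(value,) for value in fields]


def split_and_flatten(key, packed, delimiter=","):
    """Steps 2-4 for one record: split, convert to a bag, then flatten."""
    fields = tuple(packed.split(delimiter))  # step 2: value string -> tuple
    bag = to_bag(fields)                     # step 3: tuple -> bag
    # Step 4: flattening a bag yields one output row per bag element,
    # each combined with the other fields of the record (here, the key).
    return [(key,) + item for item in bag]


# Hypothetical input record.
flattened = split_and_flatten("k1", "a,b,c")
print(flattened)
```

To register something like to_bag as a Jython UDF in Pig, the function would additionally need an output schema declaration (Pig supplies an outputSchema decorator to Jython scripts at runtime); that part is omitted so the sketch runs under a standard interpreter.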

Re: Problem running Pig script with Jython

2012-03-13 Thread Juan Martin Pampliega
Hi, Sorry for being repetitive, but I just wanted to know if anyone has any idea how to get around this issue. On Sat, Mar 3, 2012 at 8:11 PM, Juan Martin Pampliega wrote: > Hi, > > I'm currently trying to run the following Jython script: > > #!/usr/bin/python > > from org.apache.pig.scripti

Can I reduce shuffling time?

2012-03-13 Thread Austin Chungath
Hi, I am running a pig query on around 500 GB input data. The current block size is 128 MB and split size is the default 128 MB. I have also specified 16 reducers and around 3800 mappers are running. Now I observe that shuffling is taking a long time to complete execution, approximately 25 mins pe
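
A quick back-of-the-envelope check of the numbers in this message may help frame the question. The sketch below assumes binary units, exactly 500 GB of input, and, as a worst case that is purely an assumption, map output equal in size to the input and spread evenly over the reducers. The function names are illustrative.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def expected_mappers(input_bytes, split_bytes):
    """Number of input splits, rounded up (one mapper per split)."""
    return -(-input_bytes // split_bytes)  # ceiling division on integers


def shuffle_gb_per_reducer(map_output_bytes, reducers):
    """Data each reducer fetches if keys are spread evenly (an assumption)."""
    return map_output_bytes / float(reducers) / GB


input_bytes = 500 * GB
mappers = expected_mappers(input_bytes, 128 * MB)
per_reducer = shuffle_gb_per_reducer(input_bytes, 16)
print(mappers, per_reducer)
```

Under these assumptions the estimate is 4000 mappers, in line with the roughly 3800 observed for "around 500 GB", and each of the 16 reducers would pull up to about 31 GB. The real figure depends on how much the map side shrinks the data, which is what the reduce shuffle bytes counter reports.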