The root cause of this problem was the backward incompatibility introduced by
https://issues.apache.org/jira/browse/MAPREDUCE-967.
Hadoop unpacks job.jar and puts jobCacheDir on the classpath, so Jython
cannot find its jar (job.jar) on the classpath (because it has been unpacked).
With MAPREDUCE-967, Hadoop puts job.jar on
Hi Folks,
I'm currently working on a framework that's going to do some awesome
graphing stuff, pulling data out with Pig. What I'm wondering is: is
there any way I can put embedded Pig in a module and call it that way?
Normally, I need to run embedded Pig in a Python script, something
like
> On 3/5/12 7:19 PM, guoyun wrote:
> > Dear All:
> > This is the wiki's description of DISTINCT:
> >
> > grunt> A = load 'mydata' using PigStorage() as (a, b, c);
> > grunt> B = group A by a;
> > grunt> C = foreach B {
> > D = distinct A.b;
> > generate
Code samples help when debugging code :)
On Tue, Mar 13, 2012 at 5:27 PM, rakesh sharma
wrote:
That is what I was doing and I got the following error:
2012-03-13 20:38:14,404 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
1000: Error during parsing. Encountered " "generate" "generate "" at line 5,
column 7.Was expecting one of:"define" ..."load" ..."filter" ...
"for
Are you doing this verbatim:
window = generate UUIDGenerator() as uuid, '$p1' as value1, '$p2' as value2;
If so, that's an invalid statement. GENERATE only works as part of FOREACH.
What is your exact Pig script and error?
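GENERATE is evaluated once for every tuple of the relation named in FOREACH, which is why it cannot stand on its own. A minimal Python sketch of that per-tuple behavior follows; `foreach_generate` and the two-row `seed` relation are hypothetical, and `uuid.uuid4()` merely stands in for the poster's UUIDGenerator UDF.

```python
import uuid
from typing import Callable, Iterable, List


def foreach_generate(relation: Iterable[tuple],
                     gen: Callable[[tuple], tuple]) -> List[tuple]:
    """Model of FOREACH relation GENERATE ...: gen runs once per input tuple."""
    return [gen(row) for row in relation]


# Hypothetical values, as if passed on the command line with -p p1=foo -p p2=bar.
p1, p2 = "foo", "bar"

# Hypothetical input relation; GENERATE needs something to iterate over.
seed = [("x",), ("y",)]

# uuid.uuid4() is only a stand-in for the poster's UUIDGenerator UDF.
window = foreach_generate(seed, lambda row: (str(uuid.uuid4()), p1, p2))
for t in window:
    print(t)

# With no input tuples there is nothing to generate.
print(foreach_generate([], lambda row: (str(uuid.uuid4()), p1, p2)))
```

In Pig terms this corresponds roughly to `window = FOREACH seed GENERATE UUIDGenerator() AS uuid, '$p1' AS value1, '$p2' AS value2;`, where `seed` is any loaded relation (a hypothetical name here); the result has one row per input tuple.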
On Tue, Mar 13, 2012 at 4:51 PM, rakesh sharma
wrote:
Parameter substitution works fine. My problem is tying all three values (the
UDF-generated one and the two parameters) together into one relation.
> From: billgra...@gmail.com
> Date: Tue, 13 Mar 2012 16:43:25 -0700
> Subject: Re: Creating a relation on the fly
> To: user@pig.apache.org
Are p1 and p2 static params for the life of the script or are they dynamic?
You could do something like this if it's the former:
pig -p p1=foo -p p2=bar -f script.pig
and they'd be properly inserted into $p1 and $p2 in your script.
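Conceptually, `-p name=value` makes Pig's preprocessor replace each `$name` in the script text before the script is parsed. The Python sketch below is a deliberately simplified model of that substitution: the `substitute_params` helper is hypothetical, and real Pig also supports `%declare`, `%default`, parameter files and escaping, none of which is modeled here.

```python
import re
from typing import Dict

# In this sketch a parameter is $name where name starts with a letter or
# underscore, so positional references such as $0 and $1 are left alone.
_PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute_params(script: str, params: Dict[str, str]) -> str:
    """Replace each $name with params[name]; unknown names raise KeyError."""
    def repl(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"undefined parameter: ${name}")
        return params[name]
    return _PARAM.sub(repl, script)


script = ("window = FOREACH seed GENERATE UUIDGenerator() AS uuid, "
          "'$p1' AS value1, '$p2' AS value2;")
print(substitute_params(script, {"p1": "foo", "p2": "bar"}))
```

Because substitution happens on the script text, the values arrive as plain literals (`'foo'`, `'bar'`) by the time Pig parses the statement.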
On Tue, Mar 13, 2012 at 2:37 PM, rakesh sharma
wrote:
Hi All,
I have a situation where I need to create a relation by a combination of UDF
and parameter values. For example, first field will be generated by UDF
UUIDGenerator, second field by parameter p1, and third field by parameter p2. I
am looking for some way of having a relation window as express
Sorry, copy paste error. Let's try that again:
http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/builtin/Bloom.java?view=markup
Alan.
On Mar 13, 2012, at 1:55 PM, Stan Rosenberg wrote:
Hi Alan,
I am also curious to see how the distributed cache is used in a UDF.
However, the code you reference in the patch doesn't appear to contain
such an example. What is the name of the source file?
Thanks,
stan
On Mon, Mar 12, 2012 at 7:24 PM, Alan Gates wrote:
> Take a look at the builtin U
Hi Jon,
Thanks for your response! I use Pig 0.9.1-snapshot.
I've used FLATTEN instead of $0 and $1, but ACCUM_CALL is still not fired.
I also tried removing the generic type from the accumulator, but it did not help. :(
Is it easy for you to get the accumulator to fire?
Yen
On Tue, Mar 13, 2012 at 3:06 PM, Jonathan C
What version of Pig are you using?
Just as an experiment in the simple case, can you try doing
GENERATE flatten(group) as (domain,host), ...(the rest)...
It shouldn't make a difference, but I think I remember that in some older
versions it did.
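Grouping by `(domain, host)` makes `group` a two-field tuple, and `flatten(group) as (domain,host)` turns it into two top-level fields. A small Python model of that shape, using made-up records in the poster's `(hash, domain, host, page, freq)` schema (the data is hypothetical; only the shape matters):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical records: (hash, domain, host, page, freq)
records = [
    ("h1", "example.com", "www", "/a", 3),
    ("h2", "example.com", "www", "/b", 1),
    ("h3", "example.org", "api", "/c", 2),
]

# GROUP records BY (domain, host): the group key is a 2-tuple,
# and each group holds a bag of the matching records.
grpd: Dict[Tuple[str, str], List[tuple]] = defaultdict(list)
for rec in records:
    grpd[(rec[1], rec[2])].append(rec)

# flatten(group) as (domain, host) unpacks the 2-tuple key into two
# named fields; [r[0] for r in bag] mirrors the projection records.hash.
stats = [
    (domain, host, [r[0] for r in bag])
    for (domain, host), bag in grpd.items()
]
for row in stats:
    print(row)
```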
2012/3/13 Yen SYU
Hi all,
I just tested a very simple Pig script, as follows:
records = LOAD '$input' AS (hash:chararray, domain:chararray,
host:chararray, page:chararray, freq:int);
grpd = GROUP records BY (domain, host);
stats = FOREACH grpd {
hashes = records.hash;
This looks like a parser bug.
Alan.
On Mar 13, 2012, at 11:32 AM, Charles Menguy wrote:
Hi all,
I have a question about PIG regarding non-linear data flows.
I'm using the SPLIT command to apply different behavior depending on my
data, but I noticed something unexpected.
When I do a SPLIT with only one leg, it doesn't work for some reason; it
seems to be expecting at least a
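For reference, SPLIT sends each record to every branch whose condition holds: branches may overlap, and (in this sketch, which has no catch-all branch) a record matching no condition is dropped. A single-branch SPLIT therefore selects the same records as a FILTER with that condition. A minimal Python model; `split_relation` and the sample data are hypothetical:

```python
from typing import Callable, Dict, List


def split_relation(relation: List[int],
                   branches: Dict[str, Callable[[int], bool]]) -> Dict[str, List[int]]:
    """Model of SPLIT ... INTO: each record goes to every branch whose
    condition is true, so a record can land in several branches or none."""
    return {name: [r for r in relation if cond(r)]
            for name, cond in branches.items()}


data = [2, 5, 10, 20]

# Two overlapping legs: 2 is both small and even.
two_legs = split_relation(data, {"small": lambda x: x < 10,
                                 "even": lambda x: x % 2 == 0})
print(two_legs)

# One leg keeps exactly what a FILTER with the same condition would keep.
one_leg = split_relation(data, {"big": lambda x: x >= 10})
print(one_leg)
```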
What is the number of reduce shuffle bytes for this job? Also, is this
job CPU intensive on reducers or is it simple aggregation?
Sent from my iPhone
On Mar 13, 2012, at 5:25 AM, Austin Chungath wrote:
> Hi,
> I am running a pig query on around 500 GB input data.
> The current block size is 128
You can:
1. load all data,
2. use strsplit (http://pig.apache.org/docs/r0.9.2/func.html#strsplit) to
split your values into a tuple
3. convert your tuples into a bag (I used a UDF in Python instead of the built-in TOBAG)
4. flatten your bag (http://pig.apache.org/docs/r0.9.2/basic.html#flatten)
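The four steps above can be modeled in plain Python to show how the data changes shape at each stage. This is a sketch under assumptions: the `(id, value)` records and the helper names are made up, and a fixed comma separator replaces the regular expression that Pig's STRSPLIT actually takes.

```python
from typing import List, Tuple


def strsplit(value: str, sep: str) -> Tuple[str, ...]:
    # Step 2: like STRSPLIT, turn one delimited string into a tuple of fields.
    return tuple(value.split(sep))


def to_bag(t: Tuple[str, ...]) -> List[Tuple[str]]:
    # Step 3: a bag is a collection of tuples, so each field becomes a 1-tuple.
    return [(field,) for field in t]


def flatten_bag(key: str, bag: List[Tuple[str]]) -> List[Tuple[str, str]]:
    # Step 4: FLATTEN on a bag emits one output row per inner tuple,
    # repeating the other fields of the input row.
    return [(key,) + inner for inner in bag]


# Step 1: hypothetical loaded data, (id, comma-separated values).
data = [("r1", "a,b,c"), ("r2", "d")]

rows = [out
        for key, value in data
        for out in flatten_bag(key, to_bag(strsplit(value, ",")))]
print(rows)
```

Each input row with n comma-separated values becomes n output rows, which is the effect the steps aim for.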
I don't know
Hi,
Sorry for being repetitive, but I just wanted to know if anyone has any idea
how to get around this issue.
On Sat, Mar 3, 2012 at 8:11 PM, Juan Martin Pampliega
wrote:
> Hi,
>
> I'm currently trying to run the following Jython script:
>
> #!/usr/bin/python
>
> from org.apache.pig.scripti
Hi,
I am running a Pig query on around 500 GB of input data.
The current block size is 128 MB and the split size is the default 128 MB.
I have also specified 16 reducers, and around 3800 mappers are running.
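As a rough sanity check on the mapper count: with the split size equal to the block size, Hadoop typically runs about one map task per input split, so the count is roughly the input size divided by 128 MB. A minimal Python sketch (the helper name is hypothetical, and it treats the input as one contiguous file; splits are computed per file, so an input made of many files yields more tasks):

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def estimate_map_tasks(input_bytes: int, split_bytes: int) -> int:
    """Approximate map tasks as ceil(input / split size)."""
    return -(-input_bytes // split_bytes)  # integer ceiling division


print(estimate_map_tasks(500 * GiB, 128 * MiB))

# Going the other way: input size implied by 3800 full 128 MiB splits, in GiB.
print(3800 * 128 * MiB / GiB)
```

About 3800 mappers corresponds to roughly 475 GiB, which is consistent with "around 500 GB" of input.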
Now I observe that shuffling is taking a long time to complete execution,
approximately 25 mins pe