Hi, is there any possibility to dump all MR counters to console when pig
script is finished?
I also work with Scalding, which has a cool feature: you can implement a flow
listener (a flow is a kind of Pig execution DAG) and listen for the "flow
finished" event. Then you can get access to the FlowStats object
Print jobStats.hadoopCounters: ")
jobStats.hadoopCounters.asList().forEach{it ->
println("hadoopCounters element: $it")
}
}
I don't see anything written to the console... What am I doing wrong?
2015-12-14 10:53 GMT+01:00 Serega Sheypak <serega.shey...@gmail
No idea. We need:
1. the script
2. the full stack trace
2015-02-21 6:53 GMT+03:00 pruthvi D pruthvi2...@gmail.com:
hello,
I am getting this error after I execute my Pig UDF (Python).
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open
iterator for alias d
can you please let me
You can use the ternary operator or the SPLIT operator
2015-01-20 11:28 GMT+03:00 patcharee patcharee.thong...@uni.no:
Hi,
I know that there is no if-else in Pig, but is it possible to implement
this logic in some way: if an input parameter is null, skip this filter
operation?
IF ($MONTH !=
Do you use Maven? It should work out of the box. Your UDF is on the Maven
classpath
2014-11-15 2:22 GMT+03:00 Eric Blood winkywoos...@gmail.com:
I'm trying to write tests with PigUnit that get built as part of the maven
build process. How do I do a 'register' for a jar that isn't built yet?
I've
JOIN (outer)
Performs an outer join of *two* relations based on common field values.
2014-11-05 13:22 GMT+03:00 Mohan msmoha...@gmail.com:
Hi There,
I have Three Datasets as below -
PROV:
(01)
(02)
(04)
(05)
(06)
(08)
(09)
(10)
(11)
(12)
(13)
(15)
(16)
(17)
(18)
(19)
(20)
),EndDate: (UnixUtcTime:
long,OffsetMinutes: int))},IsManager: boolean)},Employee: (Id:
chararray,Name: chararray))})},*dirtydata::Created*: (UnixUtcTime:
long,OffsetMinutes: int)}
On 15 October 2014 19:00, Serega Sheypak serega.shey...@gmail.com wrote:
what are values for these variables
directly as json, you don't have to map relation
fields by names.
Could you point me to some example or documentation how that would be
performed? I don't see any such documentation on AvroStorage wiki.
Thanks
jakub
On 16 October 2014 10:09, Serega Sheypak serega.shey...@gmail.com wrote
I suppose you have to set this property on the JobTracker side
2014-10-16 22:03 GMT+04:00 Rodrigo Ferreira web...@gmail.com:
Hi guys,
I'm getting a Job failed! Error - Counters Exceeded: Limit: 120 error in
my Pig script. I've tried to set both versions of this parameter that I've
found on the
Have a look at
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/PigServer.html
We use it in our integration test utility since default pig-unit doesn't
fit our needs.
You can register script and run it.
2014-10-15 16:45 GMT+04:00 Jakub Stransky stransky...@gmail.com:
That's probably not
? Because otherwise I don't see any other option for how that would be
resolved.
On 15 October 2014 14:50, Serega Sheypak serega.shey...@gmail.com wrote:
Have a look at
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/PigServer.html
We use it in our integration test utility since default pig-unit
What are the values of these variables:
STORE finaldata INTO '$OUT' USING AvroStorage('schema_uri','$SCHEMA');
2014-10-15 17:51 GMT+04:00 Jakub Stransky stransky...@gmail.com:
No_schema_check doesn't help. Essentially we need either to remove relation
name or to ensure that schema is used during
Hi, I'm using
STORE filtered INTO 'hbase://my_table'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('d:sk');
Everything works fine
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime
MaxReduceTime MinReduceTime AvgReduceTime
Yes, I've already found it. Implemented it using custom code.
Thanks.
2014-08-04 20:10 GMT+04:00 Cheolsoo Park piaozhe...@gmail.com:
No, it's not implemented yet. Here is the jira-
https://issues.apache.org/jira/browse/PIG-3279
On Sat, Aug 2, 2014 at 12:29 PM, Serega Sheypak serega.shey
Hi, can't make it work:
flatRank = FOREACH groupedRecs {
ranked = RANK recs by score DESC;
filtered = FILTER ranked by $0 100;
GENERATE FLATTEN(ranked) as ($0 as rank_val, item_id, r_item_id, score);
}
Error:
Syntax error, unexpected symbol at or near 'RANK'
a=load
--
b=group movies by stars;
--error here movies is not an alias
c= foreach b generate myudf(a);
2014-07-23 16:03 GMT+04:00 Ashish Dobhal dobhalashish...@gmail.com:
Thanks Shahab and William, I am now clear about the COUNT functionality. But
I still have a doubt about the functioning of
Dobhal dobhalashish...@gmail.com:
Sorry ,
I mean group a by stars;
On Wed, Jul 23, 2014 at 5:58 PM, Serega Sheypak serega.shey...@gmail.com
wrote:
a=load
--
b=group movies by stars;
--error here movies is not an alias
c= foreach b generate myudf(a);
2014-07-23 16:03
You are welcome, hope this helps you with Pig. It's really easy
2014-07-23 17:01 GMT+04:00 Ashish Dobhal dobhalashish...@gmail.com:
Thanks Serega Sheypak.
On Wed, Jul 23, 2014 at 6:16 PM, Serega Sheypak serega.shey...@gmail.com
wrote:
The best way to get answers for such easy questions
b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
--result
--aa
--bb
--cc
--cc
--cc
c = group b by word;
--aa{aa}
--bb{bb}
--cc{cc,cc,cc}
d = foreach c generate COUNT(b), group;
--1, aa
--1, bb
--3, cc
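For anyone who wants to check the three steps above outside of Pig, here is a rough Python equivalent. It is only a sketch: the two input lines are made up to match the commented results, and splitting on whitespace is an assumption standing in for TOKENIZE.

```python
from collections import defaultdict

# Hypothetical input: one line of text per record, like the first field ($0) of relation a.
a = ["aa bb cc", "cc cc"]

# b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
# A whitespace split is an assumption standing in for TOKENIZE.
b = [word for line in a for word in line.split()]

# c = group b by word;
c = defaultdict(list)
for word in b:
    c[word].append(word)

# d = foreach c generate COUNT(b), group;
d = sorted((len(bag), group) for group, bag in c.items())

print(d)
```

The bag sizes in d are what COUNT(b) returns for each group.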
2014-07-21 16:44 GMT+04:00 Ashish Dobhal dobhalashish...@gmail.com:
a =
Sample pseudocode.
The idea is to group tuples by movie_id and count size of group bags.
movieAlias = LOAD 'path/to/movie/files' as (
user_id:long,movie_id:long,timestamp:long);
groupedByMovie = group movieAlias by movie_id;
counted = FOREACH groupedByMovie GENERATE group as movie_id,
What build tool do you use?
Did you try to add log4j.properties to the test scope (if you use Maven) with
root level=ERROR?
2014-05-13 19:09 GMT+04:00 Moshe Sayag msa...@gmail.com:
Hi,
When I run a PigUnit test, the full script is printed to stdout / log.
Is there a way to run a test with
you get ~3.5 (84/24) times less data on each reducer when you increase their
number from 24 to 84. That's good. But sometimes the data is skewed, and simply
bumping the number of reducers doesn't help.
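To illustrate the point about skew, here is a small Python sketch. Everything in it is an assumption made for the illustration: the data is invented, and a CRC32 hash modulo the reducer count stands in for whatever partitioner the job really uses.

```python
from collections import Counter
import zlib

def reducer_loads(keys, num_reducers):
    """Count records per reducer when each key is routed by a stable hash of the key."""
    loads = Counter()
    for key in keys:
        loads[zlib.crc32(key.encode("utf-8")) % num_reducers] += 1
    return loads

# Made-up data: 8400 distinct keys vs. one hot key holding half of the records.
uniform = ["key%d" % i for i in range(8400)]
skewed = ["hot"] * 4200 + ["key%d" % i for i in range(4200)]

for name, keys in (("uniform", uniform), ("skewed", skewed)):
    for n in (24, 84):
        # The busiest reducer decides how long the reduce phase takes.
        print(name, n, max(reducer_loads(keys, n).values()))
```

All records of the hot key land on one reducer, so the busiest reducer never gets fewer than 4200 records, whether there are 24 or 84 of them.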
2014-04-15 16:41 GMT+04:00 leiwang...@gmail.com leiwang...@gmail.com:
I can fix this by changing heap size.
But
You can extend load func to make it add a field with split id. Pass this
field to UDF with other fields
18.04.2014 12:28, user Abhishek Agarwal abhishc...@gmail.com
wrote:
I need this information in the UDF, which is an EvalFunc and not a LoadFunc.
Maybe I can communicate the pig split
(i) for i in value]
On Sunday, March 16, 2014 4:46 AM, Serega Sheypak
serega.shey...@gmail.com wrote:
Strange stack trace. Did you get it from the map task log of the job which reads
the data?
Try to replace your UDF function with:
@outputSchema("y:float")
def ptile(value):
return 0.0
Strange stack trace. Did you get it from the map task log of the job which reads
the data?
Try to replace your UDF function with:
@outputSchema("y:float")
def ptile(value):
return 0.0
Just to see that your UDF is correct.
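A slightly fuller version of that stub, as a sketch. Pig makes the outputSchema decorator available when it runs a Jython UDF file; the fallback definition below is only an assumption added so the same file can also be run as plain Python for a quick check.

```python
# Pig provides outputSchema when it runs this file as a Jython UDF.
# The fallback is an assumption for running the file as plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap

@outputSchema("y:float")
def ptile(value):
    # Ignore the input on purpose: a constant result shows whether
    # the UDF wiring itself works.
    return 0.0
```

If the script runs with this stub, the problem is in the body of the real function and not in how the UDF is registered or called.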
2014-03-16 3:09 GMT+04:00 Jason W ingv...@yahoo.com:
The stack trace I
no arguments in my udf.
Any insight?
Thanks,
Lei
leiwang...@gmail.com
From: Serega Sheypak
Date: 2014-01-26 20:04
To: user
Subject: Re: How to wrie the user custemed Load Funtion
Try to use this one as a starting point:
https://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin
that it is being executed Map-side?
One thing to try might be to CROSS first before grouping... although that
might be 2 reduce steps.
On Mon, Jan 20, 2014 at 1:27 AM, Serega Sheypak serega.shey...@gmail.com
wrote:
Hi, I'm in trouble.
Here is a part of the code:
itemGrp = GROUP itemProj1
Hi, I'm in trouble.
Here is a part of the code:
itemGrp = GROUP itemProj1 BY sale_id PARALLEL 12;
notFiltered = FOREACH itemGrp{
itemProj2 = FOREACH itemProj1
GENERATE FLATTEN(
TOTUPLE(id, other_id)) as
and after trying it on several datanodes in the end it fails
Default task attempts = 4?
1. It's better to provide logs
2. Do you use any balancing properties, for example
pig.exec.reducers.bytes.per.reducer ?
I suppose you have unbalanced data
2014/1/10 Zebeljan, Nebojsa
. But as I read the examples,
it should be there.
On Monday, January 6, 2014, Serega Sheypak wrote:
Yes, it works. The exception clearly says that ./passwd is missing.
1. Try to give an absolute path to the file and see if it works. It should.
2. Then give a relative path. Looks like you incorrectly provide
Yes, it works. The exception clearly says that ./passwd is missing.
1. Try to give an absolute path to the file and see if it works. It should.
2. Then give a relative path. Looks like you provide the relative path
incorrectly. Why do you put ./ before the filename?
2014/1/6 Russell Jurney russell.jur...@gmail.com
I
https://issues.apache.org/jira/browse/PIG-3638
like it :)
2013/12/21 Ruslan Al-Fakikh metarus...@gmail.com
It seems to be a heavy PigUnit limitation. Maybe you can open a jira for
this?:)
Thanks,
Ruslan
On Sat, Dec 21, 2013 at 2:01 AM, Serega Sheypak serega.shey...@gmail.com
wrote
in Megafon. I tried to use recommended approach - pig unit for my
own purposes.
Fail.
2013/12/21 Ruslan Al-Fakikh metarus...@gmail.com
Hi Serega!
Have you resolved the issue? I am going to encounter the same problem, but
I don't know a solution.
Thanks
On Sun, Dec 15, 2013 at 6:07 PM, Serega
Hi!
By default PigUnit overrides LOAD statements.
Is there any possibility to avoid this?
I'm using AvroStorage because of evolving schemas and I would like to write
in my tests:
--load test dataset
in = LOAD '$pathToIn' using AvroStorage();
--project the fields I need
inProjected = FOREACH in
You have to specify classpath precedence. See here
http://stackoverflow.com/questions/11685949/overriding-default-hadoop-jars-in-class-path
11.12.2013 7:02, user Shawn Hermans shawnherm...@gmail.com
wrote:
I am having an issue with a Pig UDF I wrote. The Pig UDF uses a newer
version
mapred.child.java.opts
06.12.2013 20:32, user praveenesh kumar praveen...@gmail.com
wrote:
Hi all,
I am using my custom build class and my UDF is acting as wrapper.
I need to pass the command line argument to my UDF, which can be passed
down to the actual class that needs it.
One
Do cross then filter ;)
Pig doesn't support join by condition
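The cross-then-filter idea, sketched in Python with invented rows (the sample files in the quoted message are truncated, so the second file and the substring match rule here are assumptions):

```python
from itertools import product

# Made-up rows; the original sample files are truncated in the thread.
file1 = [(1, "abc"), (2, "xyz")]
file2 = [("abcdef", "x"), ("123xyz", "y"), ("nomatch", "z")]

# CROSS: every row of file1 paired with every row of file2.
crossed = list(product(file1, file2))

# FILTER: keep the pairs where the pattern from file1 occurs in the file2 field.
joined = [(left, right) for left, right in crossed if left[1] in right[0]]

print(joined)
```

Keep in mind that the cross produces len(file1) * len(file2) pairs before the filter runs, so this only works well when at least one side is small.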
08.11.2013 11:45, user Siva Kumar Sunkara
sivakumar.sunk...@shoregrp.com wrote:
Hi,
I have a problem in joining two datasets when join pattern matches.
For example:
File1:
1 abc
2 xyz
3
You get 4 empty tuples.
Maybe your UDF parser.customFilter(key,'A') works differently? Maybe
you use an old version?
You can add a print statement to the UDF and see what it accepts and what
it produces.
2013/11/6 Sameer Tilak ssti...@live.com
Dear Serega,
When I run the script in local
Show the code of your func. How did you define the input and output schema?
07.11.2013 2:09, user Sameer Tilak ssti...@live.com wrote:
Dear Serega,
I am now using log4j for debugging my UDF. Here is what I found out. For
some reason in mapreduce mode my exec function does not get called.
The same script does not work in the mapreduce mode.
What does it mean?
2013/11/6 Sameer Tilak ssti...@live.com
Hello,
My script in the local mode works perfectly. The same script does not work
in the mapreduce mode. For the local mode, the o/p is saved in the current
directory, whereas
Hi, I have a script which has multiple input paths.
Is there any possibility to test it using PigUnit?
I see that PigUnit supports a single input, but what about multiple inputs?
Better to implement it using Java action in oozie. Oozie java action is
implemented as single map task.
http://oozie.apache.org/docs/3.2.0-incubating/WorkflowFunctionalSpec.html#a3.2.7_Java_Action
Or is your aim to cause a DoS after the computation is finished :) ?
Anyway it's better to separate
Looks like you have corrupted Avro files or non-Avro files in the directory.
Or your files have different schemas.
Try to read about AvroStorage and run it using the debug key (see the doc
https://cwiki.apache.org/confluence/display/PIG/AvroStorage). It should
help you to get an idea of where things go wrong.
Hi, I'm trying to integrate Jython UDF into my maven project.
I have a problem with running scripts where
@outputSchema(blabla) is defined
Here is an error:
File
/home/ssa/devel/etl-masterdata/gsmcell-merger/src/test/python/Test.py,
line 3, in <module>
from pig.udf.mergerUDF import *
File
Use a simple Jython UDF. It's 3 lines of code
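Something along these lines, as a sketch of a two-argument coalesce. The function name, the chararray schema and the two-argument shape are assumptions for illustration; the fallback decorator is there only so the file also runs as plain Python, since Pig supplies outputSchema itself when it loads a Jython UDF file.

```python
# Pig provides outputSchema when it runs this file as a Jython UDF.
# The fallback is an assumption for running the file as plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap

@outputSchema("value:chararray")
def coalesce(value, default):
    # Return the default only when the field is null (None in Jython).
    if value is None:
        return default
    return value
```

In a script it would be registered with something like REGISTER 'udfs.py' USING jython AS myfuncs; where udfs.py and myfuncs are placeholder names, and then called as myfuncs.coalesce(field, 'n/a').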
04.09.2013 18:04, user Sajid Raza windcl...@gmail.com wrote:
For my two cents' worth I agree, it's nicer to have coalesce than the
conditional operator.
On Sep 4, 2013, at 8:50 AM, Something Something mailinglist...@gmail.com
wrote:
What
normal junit
tests. Also these ideas come to my mind:
1) You can take a look at the PigUnit utility
2) Avro has a human-readable (json text, non-binary) form for debugging
purposes
Best Regards,
Ruslan
On Thu, Aug 29, 2013 at 2:56 PM, Serega Sheypak serega.shey...@gmail.com
wrote:
Hi, we
Hi, we have an integration test Java utility which helps to test Pig scripts.
We often develop different UDFs in Java and I would like to create unit
tests for them. Right now they are tested with Pig scripts during
integration tests.
I have a pack of prepared Avro files with reference input for my java
Are you sure that your core-site, hdfs-site, mapred-site are on Pig's
classpath?
2013/8/23 Tim Chan tc...@edmunds.com
Apache Pig version 0.11.0-cdh4.3.0
Hadoop 2.0.0-cdh4.3.0
Here is the error I'm getting:
2013-08-22 19:27:50,304 [main] INFO org.apache.hadoop.mapreduce.Cluster -
Failed
manish dunani manishd...@gmail.com
grunt> describe d;
d: {group: chararray,size: long}
grunt> describe e;
e: {countword: long,group: chararray}
On Mon, Aug 12, 2013 at 2:41 PM, Serega Sheypak serega.shey...@gmail.com
wrote:
I suggest you post the DESCRIBE output for the d and e relations.
2013/8
Pig uses reflection. The top exception says that there is no such method
signature. The problem is in the way you are trying to call the method.
And it's better to paste the whole stack trace
13.08.2013 7:39, user Darpan R darpa...@gmail.com wrote:
I've a UDF which I use to do custom
, Pradeep Gollakota pradeep...@gmail.com
wrote:
join BIG by key, SMALL by key using 'replicated';
On Fri, Aug 2, 2013 at 5:29 AM, Serega Sheypak serega.shey...@gmail.com
wrote:
Hi. I've hit a problem with replicated join in Pig 0.11
I have two relations:
BIG (3-6GB) and SMALL
There is a missing dependency, a jar with the class
com.fasterxml.jackson.databind.ObjectMapper
Use Google to find the jar. I suggest you use the public Maven repos.
24.07.2013 23:16, user Dan Zhu dan...@yahoo-inc.com wrote:
Hi pig-users,
I have tuples of nested JSON string, I am trying to parse
Hi, I have a rather simple problem and I can't come up with a nice solution.
Here is my input:
msisdn longitude latitude ts
1 20.30 40.50 123
1 0.0 null 456
2 60.70 34.67 678
2 null null 978
I need:
group by msisdn
order by ts inside each group
filter records in each group:
1. put all records where
Hi dear pig users.
Looks like I have significant performance problems with a Jython UDF. I have
an assumption that the UDF call cost is very high.
How can I prove my assumption?
Are there any solutions for such problem?
What are best practices in implementing UDFs?
-Original Message-
From: Serega Sheypak [mailto:serega.shey...@gmail.com]
Sent: Monday, July 15, 2013 7:48 AM
To: user@pig.apache.org
Subject: Jython UDF invokation
Hi dear pig users.
Looks like i
method. Next I would increase the number of reducers and the
available memory. Possible in PIG with:
SET default_parallel 20;
SET mapred.child.java.opts '-Xmx8196m';
But these are just some quick tips. Definitely check out the performance
chapters.
Marco
2013/7/15 Serega Sheypak
Hi, I test my scripts in local mode, then I run them in production
using Oozie.
Locally everything works fine. My Pig version is 0.11
When I run the same script in cluster mode, I get an exception on the line
where the Jython UDF is invoked. Here is my UDF; see, it imports a Java class.
This class is IN