Get all counters after pig script is finished

2015-12-14 Thread Serega Sheypak
Hi, is there any possibility to dump all MR counters to the console when a pig script is finished? I also work with Scalding, which has a cool feature: you can implement a flow listener (a flow is a kind of Pig execution DAG) and listen for the "flow finished" event. Then you can get access to the FlowStats object

Re: Get all counters after pig script is finished

2015-12-14 Thread Serega Sheypak
Print jobStats.hadoopCounters: ") jobStats.hadoopCounters.asList().forEach{it -> println("hadoopCounters element: $it") } } I don't see anything written to the console... What am I doing wrong? 2015-12-14 10:53 GMT+01:00 Serega Sheypak <serega.shey...@gmail

Re: pig UDF error

2015-02-23 Thread Serega Sheypak
No idea. I need: 1. the script 2. the full stack trace 2015-02-21 6:53 GMT+03:00 pruthvi D pruthvi2...@gmail.com: Hello, I am getting this error after I execute my pig UDF (Python). ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias d can you please let me

Re: IF-ELSE

2015-01-20 Thread Serega Sheypak
You can apply the ternary operator or the SPLIT operator 2015-01-20 11:28 GMT+03:00 patcharee patcharee.thong...@uni.no: Hi, I know that there is no if-else in pig, but is it possible to implement this logic in some way: if an input parameter is null, skip this filter operation? IF ($MONTH !=
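
The "skip the filter when the parameter is null" logic asked about here can be sketched outside Pig. This is only an illustration in Python, assuming hypothetical records with a `month` field and a hypothetical `month_param` argument; in a Pig script the same branch would be written with the ternary operator or SPLIT, as suggested in the reply.

```python
def filter_by_month(records, month_param=None):
    """Return all records when month_param is None, else only the matching ones."""
    if month_param is None:
        # No parameter supplied: skip the filter entirely.
        return list(records)
    return [r for r in records if r["month"] == month_param]


# Hypothetical sample data, not taken from the thread.
rows = [{"id": 1, "month": 1}, {"id": 2, "month": 2}]
unfiltered = filter_by_month(rows)
only_second_month = filter_by_month(rows, 2)
```

The point of the sketch is that the "else" branch is simply the identity: the relation passes through unchanged when no month is given.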

Re: PigUnit testing with a Java UDF

2014-11-15 Thread Serega Sheypak
Do you use maven? It should work out of the box. Your UDF is on the maven classpath 2014-11-15 2:22 GMT+03:00 Eric Blood winkywoos...@gmail.com: I'm trying to write tests with PigUnit that get built as part of the maven build process. How do I do a 'register' for a jar that isn't built yet? I've

Re: Unable to Join more than two datasets in PIG through JOIN ??

2014-11-05 Thread Serega Sheypak
JOIN (outer) Performs an outer join of *two* relations based on common field values. 2014-11-05 13:22 GMT+03:00 Mohan msmoha...@gmail.com: Hi There, I have Three Datasets as below - PROV: (01) (02) (04) (05) (06) (08) (09) (10) (11) (12) (13) (15) (16) (17) (18) (19) (20)

Re: Pig with avro storage

2014-10-16 Thread Serega Sheypak
),EndDate: (UnixUtcTime: long,OffsetMinutes: int))},IsManager: boolean)},Employee: (Id: chararray,Name: chararray))})},*dirtydata::Created*: (UnixUtcTime: long,OffsetMinutes: int)} On 15 October 2014 19:00, Serega Sheypak serega.shey...@gmail.com wrote: what are values for these variables

Re: Pig with avro storage

2014-10-16 Thread Serega Sheypak
directly as json, you don't have to map relation fields by names. Could you point me to some example or documentation how that would be performed? I don't see any such documentation on AvroStorage wiki. Thanks jakub On 16 October 2014 10:09, Serega Sheypak serega.shey...@gmail.com wrote

Re: Counters Limit Exceeded

2014-10-16 Thread Serega Sheypak
I suppose you have to set this property on the jobtracker side 2014-10-16 22:03 GMT+04:00 Rodrigo Ferreira web...@gmail.com: Hi guys, I'm getting a Job failed! Error - Counters Exceeded: Limit: 120 error in my Pig script. I've tried to set both versions of this parameter that I've found on the

Re: Submit a pig script from java

2014-10-15 Thread Serega Sheypak
Have a look at http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/PigServer.html We use it in our integration test utility since the default pig-unit doesn't fit our needs. You can register a script and run it. 2014-10-15 16:45 GMT+04:00 Jakub Stransky stransky...@gmail.com: That's probably not

Re: Submit a pig script from java

2014-10-15 Thread Serega Sheypak
? Because otherwise I don't see any other option how that would be resolved. On 15 October 2014 14:50, Serega Sheypak serega.shey...@gmail.com wrote: Have a look at http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/PigServer.html We use it in our integration test utility since the default pig-unit

Re: Pig with avro storage

2014-10-15 Thread Serega Sheypak
What are the values of these variables: STORE finaldata INTO '$OUT' USING AvroStorage('schema_uri','$SCHEMA'); 2014-10-15 17:51 GMT+04:00 Jakub Stransky stransky...@gmail.com: No_schema_check doesn't help. Essentially we need either to remove relation name or to ensure that schema is used during

hbase loader and pig

2014-08-12 Thread Serega Sheypak
Hi, I'm using STORE filtered INTO 'hbase://my_table' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('d:sk'); Everything works fine Success! Job Stats (time in seconds): JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime

Re: RANK in nested foreach, is it working?

2014-08-04 Thread Serega Sheypak
Yes, I've already found it. Implemented it using custom code. Thanks. 2014-08-04 20:10 GMT+04:00 Cheolsoo Park piaozhe...@gmail.com: No, it's not implemented yet. Here is the jira- https://issues.apache.org/jira/browse/PIG-3279 On Sat, Aug 2, 2014 at 12:29 PM, Serega Sheypak serega.shey

RANK in nested foreach, is it working?

2014-08-02 Thread Serega Sheypak
Hi, can't make it work: flatRank = FOREACH groupedRecs { ranked = RANK recs by score DESC; filtered = FILTER ranked by $0 100; GENERATE FLATTEN(ranked) as ($0 as rank_val, item_id, r_item_id, score); } Error: Syntax error, unexpected symbol at or near 'RANK'

Re: Problem in understanding UDF COUNT

2014-07-23 Thread Serega Sheypak
a=load -- b=group movies by stars; --error here movies is not an alias c= foreach b generate myudf(a); 2014-07-23 16:03 GMT+04:00 Ashish Dobhal dobhalashish...@gmail.com: Thanks Shahab and William, I am now clear about the count functionality. But still I have a doubt in the functioning of

Re: Problem in understanding UDF COUNT

2014-07-23 Thread Serega Sheypak
Dobhal dobhalashish...@gmail.com: Sorry, I mean group a by stars; On Wed, Jul 23, 2014 at 5:58 PM, Serega Sheypak serega.shey...@gmail.com wrote: a=load -- b=group movies by stars; --error here movies is not an alias c= foreach b generate myudf(a); 2014-07-23 16:03

Re: Problem in understanding UDF COUNT

2014-07-23 Thread Serega Sheypak
You are welcome, hope this helps you with pig. It's really easy 2014-07-23 17:01 GMT+04:00 Ashish Dobhal dobhalashish...@gmail.com: Thanks Serega Sheypak. On Wed, Jul 23, 2014 at 6:16 PM, Serega Sheypak serega.shey...@gmail.com wrote: The best way to get answers for such easy questions

Re: Problem in understanding UDF COUNT

2014-07-21 Thread Serega Sheypak
b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word; --result --aa --bb --cc --cc --cc c = group b by word; --aa{aa} --bb{bb} --cc{cc,cc,cc} d = foreach c generate COUNT(b), group; --1, aa --1, bb --3, cc 2014-07-21 16:44 GMT+04:00 Ashish Dobhal dobhalashish...@gmail.com: a =
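
The three steps in the script above (tokenize and flatten, group by word, count each bag) can be mirrored in plain Python to check the expected numbers. This is a sketch only: the two input lines are an assumption chosen to reproduce the words listed in the comments, and `tokenize` splits on whitespace alone, which is narrower than Pig's TOKENIZE.

```python
from collections import Counter


def tokenize(line):
    """Split a line on whitespace (a simplification of Pig's TOKENIZE)."""
    return line.split()


# Assumed input lines that yield the words aa, bb, cc, cc, cc.
lines = ["aa bb cc", "cc cc"]

# FLATTEN(TOKENIZE(...)): one word per output row.
words = [word for line in lines for word in tokenize(line)]

# GROUP b BY word, then COUNT(b): the size of each group's bag.
counts = Counter(words)
```

Here `counts` maps each word to the size of its group, which is what `COUNT(b)` reports per group in the script.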

Re: Frequency count in pig

2014-05-16 Thread Serega Sheypak
Sample pseudocode. The idea is to group tuples by movie_id and count size of group bags. movieAlias = LOAD 'path/to/movie/files' as ( user_id:long,movie_id:long,timestamp:long); groupedByMovie = group movieAlias by movie_id; counted = FOREACH groupedByMovie GENERATE group as movie_id,
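
The idea in this reply (group the tuples by movie_id, then take the size of each group's bag) can be sketched in Python as follows. The sample tuples are hypothetical and the variable names only loosely follow the aliases in the pseudocode.

```python
from collections import defaultdict

# Hypothetical (user_id, movie_id, timestamp) tuples standing in for the loaded relation.
ratings = [(1, 10, 100), (2, 10, 101), (3, 20, 102), (1, 20, 103), (4, 10, 104)]

# GROUP ... BY movie_id: each key maps to a "bag" holding the original tuples.
grouped_by_movie = defaultdict(list)
for row in ratings:
    grouped_by_movie[row[1]].append(row)

# FOREACH group: emit the group key together with the size of its bag.
counted = {movie_id: len(bag) for movie_id, bag in grouped_by_movie.items()}
```

Keeping the full tuples in each bag mirrors what GROUP does in Pig; the frequency is then just the bag size.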

Re: How to run a PigUnit test with minimal output?

2014-05-15 Thread Serega Sheypak
What build tool do you use? Did you try to add log4j.properties to the test scope (if you use maven) with root level=ERROR? 2014-05-13 19:09 GMT+04:00 Moshe Sayag msa...@gmail.com: Hi, When I run a PigUnit test, the full script is printed to stdout / log. Is there a way to run a test with

Re: memoryjava.lang.OutOfMemoryError related with number of reducer?

2014-04-29 Thread Serega Sheypak
when get ~3.5 (84/24) times less data on each reducer while increasing their number from 24 to 84. It's good. Sometimes the data is skewed and simply bumping the number of reducers doesn't help. 2014-04-15 16:41 GMT+04:00 leiwang...@gmail.com leiwang...@gmail.com: I can fix this by changing heap size. But
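
Both claims in this reply can be checked with a small simulation: evenly spread keys give each reducer 84/24 = 3.5 times less data, while a single hot key keeps one reducer overloaded no matter how many reducers are added. The sketch assumes integer keys assigned by `key % num_reducers`, a simplification of Hadoop's hash partitioning, and the record counts are made up for the illustration.

```python
from collections import Counter


def reducer_loads(keys, num_reducers):
    """Records per reducer when a key goes to reducer (key % num_reducers)."""
    loads = Counter(key % num_reducers for key in keys)
    return [loads.get(i, 0) for i in range(num_reducers)]


# Evenly spread: 10080 records, each with a distinct key.
uniform = list(range(10080))
# Skewed: the same number of records, but half of them share key 0.
skewed = [0] * 5040 + list(range(1, 5041))

uniform_24 = max(reducer_loads(uniform, 24))
uniform_84 = max(reducer_loads(uniform, 84))
skewed_24 = max(reducer_loads(skewed, 24))
skewed_84 = max(reducer_loads(skewed, 84))
```

With the uniform data the busiest reducer drops from 420 to 120 records, a factor of 3.5. With the skewed data it only drops from 5250 to 5100, because all 5040 records of the hot key still land on one reducer.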

Re: Getting current split in EvalFunc

2014-04-18 Thread Serega Sheypak
You can extend the load func to make it add a field with the split id. Pass this field to the UDF with the other fields. On 18.04.2014 12:28, Abhishek Agarwal abhishc...@gmail.com wrote: I need this information in the UDF which is an EvalFunc and not LoadFunc. Maybe I can communicate the pig split

Re: Python UDF Error

2014-03-17 Thread Serega Sheypak
(i) for i in value] On Sunday, March 16, 2014 4:46 AM, Serega Sheypak serega.shey...@gmail.com wrote: Strange stack trace. Do you get it from the map task log of the job which reads the data? Try to replace your udf function with: @outputSchema("y:float") def ptile(value): return 0.0

Re: Python UDF Error

2014-03-16 Thread Serega Sheypak
Strange stack trace. Do you get it from the map task log of the job which reads the data? Try to replace your udf function with: @outputSchema("y:float") def ptile(value): return 0.0 Just to see that your UDF is correct. 2014-03-16 3:09 GMT+04:00 Jason W ingv...@yahoo.com: The stack trace I
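
Written out in full, the stub suggested above looks roughly like this. It assumes the schema string was quoted in the original message. The `try`/`except` fallback is an addition for this sketch, not part of the thread: it lets the file also run in plain Python, where Pig does not supply `outputSchema`.

```python
try:
    outputSchema  # supplied by Pig when the file is registered as a Jython UDF
except NameError:
    def outputSchema(schema):
        """Fallback for running outside Pig: remember the schema, return the function as-is."""
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("y:float")
def ptile(value):
    # Stub: ignore the input and return a constant, to check the UDF wiring itself.
    return 0.0
```

If the job succeeds with this stub, the registration and schema declaration are fine and the problem is in the body of the original function.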

Re: Re: How to wrie the user custemed Load Funtion

2014-01-26 Thread Serega Sheypak
no arguments in my udf. Any insight? Thanks, Lei leiwang...@gmail.com From: Serega Sheypak Date: 2014-01-26 20:04 To: user Subject: Re: How to wrie the user custemed Load Funtion Try to use this one as a starting point: https://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin

Re: how to control nested CROSS parallelism?

2014-01-21 Thread Serega Sheypak
that it is being executed Map-side? One thing to try might be to CROSS first before grouping... although that might be 2 reduce steps. On Mon, Jan 20, 2014 at 1:27 AM, Serega Sheypak serega.shey...@gmail.com wrote: Hi, I'm in trouble. Here is a part of the code: itemGrp = GROUP itemProj1

how to control nested CROSS parallelism?

2014-01-20 Thread Serega Sheypak
Hi, I'm in trouble. Here is a part of the code: itemGrp = GROUP itemProj1 BY sale_id PARALLEL 12; notFiltered = FOREACH itemGrp{ itemProj2 = FOREACH itemProj1 GENERATE FLATTEN( TOTUPLE(id, other_id)) as

Re: Spilling issue - Optimize GROUP BY

2014-01-10 Thread Serega Sheypak
and after trying it on several datanodes in the end it fails Default task attempts = 4? 1. It's better to provide logs 2. Do you use any balancing properties, for example pig.exec.reducers.bytes.per.reducer ? I suppose you have unbalanced data 2014/1/10 Zebeljan, Nebojsa

Re: Udfcachetest not working

2014-01-07 Thread Serega Sheypak
. But as I read the examples, it should be there. On Monday, January 6, 2014, Serega Sheypak wrote: Yes it works. Exception clearly says that ./passwd is missing. 1. Try to give absolute path to file, see if it works. It should. 2. Then give relative path. Looks like you incorrectly provide

Re: Udfcachetest not working

2014-01-06 Thread Serega Sheypak
Yes it works. The exception clearly says that ./passwd is missing. 1. Try to give an absolute path to the file, see if it works. It should. 2. Then give a relative path. Looks like you provide the relative path incorrectly. Why do you put ./ before the filename? 2014/1/6 Russell Jurney russell.jur...@gmail.com I

Re: Unoverride load in PigUint

2013-12-21 Thread Serega Sheypak
https://issues.apache.org/jira/browse/PIG-3638 like it :) 2013/12/21 Ruslan Al-Fakikh metarus...@gmail.com It seems to be a heavy PigUnit limitation. Maybe you can open a jira for this?:) Thanks, Ruslan On Sat, Dec 21, 2013 at 2:01 AM, Serega Sheypak serega.shey...@gmail.com wrote

Re: Unoverride load in PigUint

2013-12-20 Thread Serega Sheypak
in Megafon. I tried to use recommended approach - pig unit for my own purposes. Fail. 2013/12/21 Ruslan Al-Fakikh metarus...@gmail.com Hi Serega! Have you resolved the issue? I am going to encounter the same problem, but I don't know a solution. Thanks On Sun, Dec 15, 2013 at 6:07 PM, Serega

Unoverride load in PigUint

2013-12-15 Thread Serega Sheypak
Hi! By default PigUnit overrides LOAD statements. Is there any possibility to avoid this? I'm using AvroStorage because of evolving schemas and I would like to write in my tests: --load test dataset in = LOAD '$pathToIn' using AvroStorage(); --project the fields I need inProjected = FOREACH in

Re: Library conflict with Pig UDF

2013-12-10 Thread Serega Sheypak
You have to specify classpath precedence. See here http://stackoverflow.com/questions/11685949/overriding-default-hadoop-jars-in-class-path On 11.12.2013 7:02, Shawn Hermans shawnherm...@gmail.com wrote: I am having an issue with a Pig UDF I wrote. The Pig UDF uses a newer version

Re: Passing command line arguments

2013-12-06 Thread Serega Sheypak
mapred.child.java.opts On 06.12.2013 20:32, praveenesh kumar praveen...@gmail.com wrote: Hi all, I am using my custom build class and my UDF is acting as wrapper. I need to pass the command line argument to my UDF, which can be passed down to the actual class that needs it. One

Re: joining two datasets by join pattern matches

2013-11-08 Thread Serega Sheypak
Do a cross, then filter ;) Pig doesn't support join by condition. On 08.11.2013 11:45, Siva Kumar Sunkara sivakumar.sunk...@shoregrp.com wrote: Hi, I have a problem in joining two datasets when join pattern matches. For example: File1: 1 abc 2 xyz 3
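
The "cross, then filter" approach can be illustrated in Python: build every pairing of rows from the two inputs, then keep only the pairs where the pattern matches. The rows below are hypothetical (the example data in the question is cut off), and "matches" is taken here to mean a plain substring test.

```python
from itertools import product

# Hypothetical inputs: (id, text) rows and (pattern, label) rows.
file1 = [(1, "abc"), (2, "xyz"), (3, "abcd")]
file2 = [("ab", "has-ab"), ("xy", "has-xy")]

# CROSS: every pairing of a row from file1 with a row from file2.
crossed = product(file1, file2)

# FILTER: keep only the pairs where the text contains the pattern.
matched = [
    (row_id, text, label)
    for (row_id, text), (pattern, label) in crossed
    if pattern in text
]
```

The cross produces len(file1) * len(file2) pairs before filtering, so this is only practical when at least one side is small.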

Re: More on issue with local vs mapreduce mode

2013-11-06 Thread Serega Sheypak
You get 4 empty tuples. Maybe your UDF parser.customFilter(key,'A') works differently? Maybe you use the old version? You can add a print statement to the UDF and see what it accepts and what it produces. 2013/11/6 Sameer Tilak ssti...@live.com Dear Serega, When I run the script in local

RE: More on issue with local vs mapreduce mode

2013-11-06 Thread Serega Sheypak
Show the code of your func. How did you define the input and output schema? On 07.11.2013 2:09, Sameer Tilak ssti...@live.com wrote: Dear Serega, I am now using log4j for debugging my UDF. Here is what I found out. For some reason in mapreduce mode my exec function does not get called.

Re: More on issue with local vs mapreduce mode

2013-11-05 Thread Serega Sheypak
The same script does not work in the mapreduce mode. What does it mean? 2013/11/6 Sameer Tilak ssti...@live.com Hello, My script in the local mode works perfectly. The same script does not work in the mapreduce mode. For the local mode, the o/p is saved in the current directory, whereas

pigunit with multiple input

2013-10-26 Thread Serega Sheypak
Hi, I have a script which has multiple input paths. Is there any possibility to test it using pigunit? I see that pigunit supports a single input, but what about multiple inputs?

Re: HTTP Post

2013-10-08 Thread Serega Sheypak
It's better to implement it using a Java action in oozie. An Oozie java action is implemented as a single map task. http://oozie.apache.org/docs/3.2.0-incubating/WorkflowFunctionalSpec.html#a3.2.7_Java_Action Or is your aim to cause a DoS after the computation is finished :) ? Anyway it's better to separate

Re: AvroStorage can't read multiple files?

2013-10-01 Thread Serega Sheypak
Looks like you have corrupted avro files / NOT avro files inside the directory. Or your files have different schemas. Try to read about AvroStorage and run it using the debug key (see doc https://cwiki.apache.org/confluence/display/PIG/AvroStorage). It should help you to get an idea of where things go wrong.

unittest for Jython pig UDFs

2013-09-16 Thread Serega Sheypak
Hi, I'm trying to integrate Jython UDF into my maven project. I have a problem with running scripts where @outputSchema(blabla) is defined Here is an error: File /home/ssa/devel/etl-masterdata/gsmcell-merger/src/test/python/Test.py, line 3, in module from pig.udf.mergerUDF import * File
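
One common way around this kind of error when testing outside Pig is to make a no-op `outputSchema` decorator globally visible before the UDF module is imported. The sketch below is an assumption about the cause (the name `outputSchema` not being defined when the module runs outside Pig), since the full error is cut off here. The UDF source, module name and function are hypothetical stand-ins for the real module.

```python
import types

try:
    import builtins  # Python 3
except ImportError:
    import __builtin__ as builtins  # Python 2 / Jython


def install_output_schema_shim():
    """Expose a no-op outputSchema decorator globally so UDF modules load cleanly."""
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator
    builtins.outputSchema = outputSchema


# Hypothetical UDF source; in a real project this would be the UDF module file.
UDF_SOURCE = '''
@outputSchema("merged:chararray")
def merge(a, b):
    return "%s|%s" % (a, b)
'''

install_output_schema_shim()
udf_module = types.ModuleType("merger_udf")
exec(UDF_SOURCE, udf_module.__dict__)  # stands in for "import" of the UDF module
merged = udf_module.merge("x", "y")
```

In a test suite, the shim would be installed once in the test setup, before the first import of any UDF module.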

Re: COALESCE UDF?

2013-09-04 Thread Serega Sheypak
Use a simple jython UDF. It's 3 lines of code. On 04.09.2013 18:04, Sajid Raza windcl...@gmail.com wrote: For my two cents' worth I agree, it's nicer to have coalesce than the conditional operator. On Sep 4, 2013, at 8:50 AM, Something Something mailinglist...@gmail.com wrote: What
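
The body of such a UDF could look like the sketch below. It is not the code from the thread. Registering it in Pig would additionally need an `@outputSchema` declaration matching the field type, which is left out here because the type depends on the data.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if there is none."""
    for value in values:
        if value is not None:
            return value
    return None
```

Note that the check is `is not None` rather than truthiness, so legitimate values such as `0` or an empty string are returned rather than skipped.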

Re: Avro to Tuples during UnitTest

2013-08-30 Thread Serega Sheypak
normal junit tests. Also these ideas come to my mind: 1) You can take a look at PigUnit utility 2) Avro has a human-readable (json text, non-binary) form for debugging purposes Best Regards, Ruslan On Thu, Aug 29, 2013 at 2:56 PM, Serega Sheypak serega.shey...@gmail.com wrote: Hi, we

Avro to Tuples during UnitTest

2013-08-29 Thread Serega Sheypak
Hi, we have an integration test java utility which helps to test pig scripts. We often develop different UDFs in java and I would like to create unit tests for them. Right now they are tested with pig scripts during integration tests. I have a pack of prepared avro files with reference input for my java

Re: No matter what I do, pig is trying to run locally

2013-08-23 Thread Serega Sheypak
Are you sure that your core-site, hdfs-site, mapred-site are in pig's classpath? 2013/8/23 Tim Chan tc...@edmunds.com Apache Pig version 0.11.0-cdh4.3.0 Hadoop 2.0.0-cdh4.3.0 Here is the error I'm getting: 2013-08-22 19:27:50,304 [main] INFO org.apache.hadoop.mapreduce.Cluster - Failed

Re: size of words+counts of words getting failed.

2013-08-12 Thread Serega Sheypak
manish dunani manishd...@gmail.com grunt describe d; d: {group: chararray,size: long} grunt describe e; e: {countword: long,group: chararray} On Mon, Aug 12, 2013 at 2:41 PM, Serega Sheypak serega.shey...@gmail.com wrote: I suggest you output DESCRIBE for the d and e relations. 2013/8

Re: Adding dependent jars for UDF in the PIG

2013-08-12 Thread Serega Sheypak
Pig uses reflection. The top exception says that there is no such method signature. The problem is in the way you are trying to call the method. And it's better to paste the whole stack trace. On 13.08.2013 7:39, Darpan R darpa...@gmail.com wrote: I've a UDF which I use to do custom

Re: Replace join with custom implementation

2013-08-02 Thread Serega Sheypak
, Pradeep Gollakota pradeep...@gmail.com wrote: join BIG by key, SMALL by key using 'replicated'; On Fri, Aug 2, 2013 at 5:29 AM, Serega Sheypak serega.shey...@gmail.com wrote: Hi. I've met a problem with replicated join in pig 0.11 I have two relations: BIG (3-6GB) and SMALL

Re: Parse Nested JSON String in Pig

2013-07-24 Thread Serega Sheypak
There is a missing dependency, a jar with the class com.fasterxml.jackson.databind.ObjectMapper. Use Google to find the jar. I suggest you use the maven public repos. On 24.07.2013 23:16, Dan Zhu dan...@yahoo-inc.com wrote: Hi pig-users, I have tuples of nested JSON string, I am trying to parse

Filter bag with multiple output

2013-07-23 Thread Serega Sheypak
Hi, I have a rather simple problem and I can't come up with a nice solution. Here is my input: msisdn longitude latitude ts 1 20.30 40.50 123 1 0.0 null 456 2 60.70 34.67 678 2 null null 978 I need: group by msisdn order by ts inside each group filter records in each group: 1. put all records where

Jython UDF invokation

2013-07-15 Thread Serega Sheypak
Hi dear pig users. Looks like I have significant performance problems with a Jython UDF. I have an assumption that the UDF call cost is very high. How can I prove my assumption? Are there any solutions for such a problem? What are the best practices in implementing UDFs?

Re: Jython UDF invokation

2013-07-15 Thread Serega Sheypak
: Audience Analytics for the Brave New Digital World www.comscore.com/multiplatform -Original Message- From: Serega Sheypak [mailto:serega.shey...@gmail.com] Sent: Monday, July 15, 2013 7:48 AM To: user@pig.apache.org Subject: Jython UDF invokation Hi dear pig users. Looks like i

Re: Jython UDF invokation

2013-07-15 Thread Serega Sheypak
method. Next I would increase the number of reducers and the available memory. Possible in PIG with: SET default_parallel 20; SET mapred.child.java.opts '-Xmx8196m' But these are just some quick tips. Definitely check out the performance chapters. Marco 2013/7/15 Serega Sheypak

pig 0.11, Error while running pig script with Jython func which imports java class in cluster mode

2013-07-10 Thread Serega Sheypak
Hi, I test my scripts in local mode, then I run them in production using oozie. Locally everything works fine. My pig version is 0.11. When I run the same script in cluster mode, I get an exception on the line where the jython udf is invoked. Here is my UDF, see it imports a java class. This class is IN