Re: Run a job async

2013-01-23 Thread Jonathan Coveney
I think whatever way you slice it, handling thousands of Pig jobs asynchronously is going to be a bear. I mean, this is essentially what the job tracker does, albeit with a lot less information. Either way, Pig is not multi-threaded, so having more than one instance of Pig in the same JVM is going

Re: Run a job async

2013-01-23 Thread Prashant Kommireddi
Both. Think of it as an app server handling all of these requests. On Jan 23, 2013, at 9:09 PM, Jonathan Coveney wrote: > Thousands of requests, or thousands of Pig jobs? Or both? > > > 2013/1/23 Prashant Kommireddi > >> Did not want to have several threads launched for thi

Re: Run a job async

2013-01-23 Thread Jonathan Coveney
Thousands of requests, or thousands of Pig jobs? Or both? 2013/1/23 Prashant Kommireddi > Did not want to have several threads launched for this. We might have > thousands of requests coming in, and the app is doing a lot more than only > Pig. > > On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coven

Re: Using UDF to process whole record

2013-01-23 Thread Stanley Xu
Thanks all, it looks like I need to upgrade to Pig 0.10, since UDF(*) does not appear to be supported in 0.8.1.

Best wishes,
Stanley Xu

On Tue, Jan 22, 2013 at 7:34 PM, Vitalii Tymchyshyn wrote:
> BTW: http://pig.apache.org/docs/r0.10.0/basic.html has the following example:
> C = FOREACH A GENERATE name, age, MyUDF(*);
> Look

Re: Run a job async

2013-01-23 Thread Prashant Kommireddi
Did not want to have several threads launched for this. We might have thousands of requests coming in, and the app is doing a lot more than only Pig. On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney wrote: > start a separate Process which runs Pig? > > > 2013/1/23 Prashant Kommireddi > > > Hey

Re: Run a job async

2013-01-23 Thread Jonathan Coveney
start a separate Process which runs Pig? 2013/1/23 Prashant Kommireddi > Hey guys, > > I am trying to do the following: > >1. Launch a pig job asynchronously via Java program >2. Get a notification once the job is complete (something similar to >Hadoop callback with a servlet) > > I
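The "separate process" suggestion can be sketched roughly as follows. This is only an illustration in Python (the original question is about a Java program), and it assumes each Pig job can be started as a plain command line; the `["pig", "script.pig"]` command mentioned in the comment is a hypothetical stand-in, and the demo uses the Python interpreter itself as the child process so that it runs without Pig installed. The names `run_job`, `run_all` and `demo` are made up for this sketch. It does nothing about the scaling concerns raised elsewhere in this thread.

```python
import asyncio
import sys


async def run_job(command, on_complete):
    """Start `command` as a child process and call `on_complete` once it exits."""
    process = await asyncio.create_subprocess_exec(*command)
    return_code = await process.wait()
    # The callback plays the role of the "job finished" notification.
    on_complete(command, return_code)
    return return_code


async def run_all(commands, on_complete):
    # The jobs run concurrently; the application code does not create a thread per job.
    return await asyncio.gather(*(run_job(c, on_complete) for c in commands))


def demo():
    finished = []
    # Stand-in commands that simply exit with a given code. A real caller might
    # pass something like ["pig", "script.pig"] instead (assumption).
    commands = [
        [sys.executable, "-c", "import sys; sys.exit(%d)" % code]
        for code in (0, 3)
    ]
    codes = asyncio.run(
        run_all(commands, lambda cmd, rc: finished.append(rc))
    )
    return codes, sorted(finished)
```

Calling `demo()` returns the exit codes in submission order together with the codes seen by the callback. A non-zero exit code is the only failure signal in this sketch, so anything richer (job statistics, error messages) would have to be read from the child's output.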

Re: Run a job async

2013-01-23 Thread Bill Graham
You can create an instance of PigProgressNotificationListener that calls back when the job finishes. On Wed, Jan 23, 2013 at 4:48 PM, Prashant Kommireddi wrote: > Hey guys, > > I am trying to do the following: > >1. Launch a pig job asynchronously via Java program >2. Get a notification o

Run a job async

2013-01-23 Thread Prashant Kommireddi
Hey guys,

I am trying to do the following:

1. Launch a Pig job asynchronously via a Java program
2. Get a notification once the job is complete (something similar to a Hadoop callback with a servlet)

I looked at PigServer.executeBatch() and it seems to be waiting until the job completes. This is

Re: Get field from bag with constraints from same relation

2013-01-23 Thread Thomas Bach
On Tue, Jan 22, 2013 at 11:31:23AM -0800, Cheolsoo Park wrote:
>
> Try this:
>
> data1 = LOAD '1.txt' USING PigStorage('|') AS (n:int, B:bag{(m:int,s:chararray)});
> data2 = FOREACH data1 GENERATE n, FLATTEN(B);
> data3 = FILTER data2 BY B::m <= n;
> data4 = GROUP data3 BY n;
> data5 = FOREACH d

Re: Error when run python streaming

2013-01-23 Thread Thomas Bach
On Wed, Jan 23, 2013 at 01:58:29PM +0800, Dongliang Sun wrote:
> I import a third-party module, 'Pandas'.
>
> It succeeds when I run the Python code directly.
> It also succeeds when I run the Pig script in local mode.
>
> But it fails when I run the Pig script in MapReduce mode; to debug, I comment all of

Re: Python with embbeded pig access to params

2013-01-23 Thread Jakub Glapa
Ok, I was a bit too quick. I'm able to answer my own question now.

    from org.apache.pig.scripting import ScriptPigContext
    ctx = ScriptPigContext.get()
    params = ctx.getPigContext().getParams()
    paramFiles = ctx.getPigContext().getParamFiles()

getParamFiles() will give me the path to the file
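Since getParamFiles() is described as giving paths rather than values, the script presumably still has to read those files itself. A minimal sketch of that step follows, assuming the param file holds simple `name = value` lines with `#` comments; quoting, backtick command substitution and references between parameters are deliberately not handled. The helper names `parse_param_lines` and `load_param_file` are hypothetical, not part of Pig.

```python
def parse_param_lines(lines):
    """Parse simple 'name = value' lines into a dict (simplified sketch)."""
    params = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            # Not a name=value line; ignored in this sketch.
            continue
        params[name.strip()] = value.strip()
    return params


def load_param_file(path):
    """Read one param file, e.g. a path returned by getParamFiles()."""
    with open(path) as handle:
        return parse_param_lines(handle)
```

For example, `parse_param_lines(["# dates", "param1 = val1"])` yields a dict with the single key `param1`. Values given directly with `-p` would still come from getParams(), so a caller wanting one merged view has to decide which source wins.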

Python with embbeded pig access to params

2013-01-23 Thread Jakub Glapa
Hi, is there a way to get access to the params passed with the pig command in the Python code? pig -p param1=val1 -param_file=filepath script.py Based on this: https://issues.apache.org/jira/browse/PIG-2165 I know that those params will be automatically bound. Is there a way to access those paramet