Getting weird PIG error

2014-02-20 Thread shouvanik.haldar
Hi, I have uploaded a file 'passwd' to the HDFS path /user/Haldar/passwd. But when I execute A = LOAD '/user/haldar/passwd2' using PigStorage(':'); I get an error, and the log says: Error starting action [pig]. ErrorType [FAILED], ErrorCode [It should never happen], Message [Permission d

Re: Oozie HA support with derby

2014-02-20 Thread Anand Vidwansa
> The Oozie servers in your Oozie HA setup actually are all active; that is, they are all processing jobs at the same time -- there is no failover. I see. So, this is more of an active-active configuration. I got answers for all my questions. Thanks a lot Robert! Your help is greatly appreciate

Re: Dynamic parallel tasks in Oozie

2014-02-20 Thread Heller, Chris
Yeah this is exactly the type of functionality that I was looking for. I would certainly make use of such a feature. I’m curious, how did your implementation go about defining the inputs to the shards? -Chris On 2/20/14, 5:24 PM, "Alejandro Abdelnur" wrote: >This is what I refer as sharding, i

Re: Dynamic parallel tasks in Oozie

2014-02-20 Thread Alejandro Abdelnur
This is what I refer to as sharding; it can be seen as a special type of fork/join where all shards perform the same actions on different datasets and the number of shards depends on the number of datasets. A while ago I rewrote the workflow lib, cleaning it up a bit and adding this capabilit
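The sharding idea described above (a fork/join whose width equals the number of datasets) can be sketched outside Oozie in plain Python. This is only an illustration of the concept, not the rewritten workflow lib or any Oozie API; `process_shard`, `run_sharded`, and the dataset names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_shard(dataset):
    """Hypothetical per-shard action: every shard runs the same step."""
    return f"processed:{dataset}"


def run_sharded(datasets):
    """Fork one shard per dataset, then join by collecting all results.

    The number of shards is not fixed in advance; it is simply
    len(datasets) at run time.
    """
    if not datasets:
        return []
    with ThreadPoolExecutor(max_workers=len(datasets)) as pool:
        # map() preserves input order; leaving the with-block waits for
        # every shard to finish (the "join").
        return list(pool.map(process_shard, datasets))


if __name__ == "__main__":
    # Dataset names are made up for the example.
    print(run_sharded(["part-0", "part-1", "part-2"]))
```

The point of the sketch is that the fan-out is driven by the data rather than by a fixed list of branches, which is what a static fork/join cannot express.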

Re: Dynamic parallel tasks in Oozie

2014-02-20 Thread Heller, Chris
How would I invoke a sub-workflow from the Java API? Just create a workflow that only contains a sub-workflow? On 2/20/14, 4:47 PM, "Mona Chitnis" wrote: >If you use the sub-workflow construct, then it would do some error >reporting for you. If a sub-workflow fails, the parent workflow also gets

Re: Dynamic parallel tasks in Oozie

2014-02-20 Thread Mona Chitnis
If you use the sub-workflow construct, then it would do some error reporting for you. If a sub-workflow fails, the parent workflow also gets updated to failed. Also, in Oozie 4.0, JIRA OOZIE-1264 ("The 'parent' property of a subworkflow should be the ID of the parent workflow") helps get the depend

RE: Coordinator action TIMEDOUT when no timeout is set

2014-02-20 Thread Ross, Richard
Thanks, Mona. This is easy to fix. Richard. -Original Message- From: Mona Chitnis [mailto:chit...@yahoo-inc.com] Sent: Thursday, February 20, 2014 3:49 PM To: user@oozie.apache.org Subject: Re: Coordinator action TIMEDOUT when no timeout is set Hi Richard, The default timeout is in fac

Re: Dynamic parallel tasks in Oozie

2014-02-20 Thread Heller, Chris
Mona, Thanks. That is the road I’m headed down at the moment. I’ll create a Java action which takes the files (or a path glob, or something) as input, creates multiple Oozie tasks based on that input, and then ‘waits’ for those tasks to complete. A feature like this built into the workflow c

Re: Coordinator action TIMEDOUT when no timeout is set

2014-02-20 Thread Mona Chitnis
Hi Richard, The default timeout has in fact been changed from -1 (infinity) to 2 hours, to avoid wasting CPU cycles checking for nonexistent data. Property in oozie-site.xml: oozie.service.coord.normal.default.timeout = 120. Default tim
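As a rough sketch of what that oozie-site.xml entry looks like, the snippet below builds the property element in Python. It assumes the usual Hadoop-style `<property>`/`<name>`/`<value>` layout and that the value is in minutes (120 = the 2 hours mentioned above); the helper name `coord_timeout_property` is hypothetical, and passing -1 is presumed, from the message above, to bring back the old no-timeout behaviour.

```python
import xml.etree.ElementTree as ET


def coord_timeout_property(minutes):
    """Build a Hadoop-style <property> element for the coordinator timeout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "oozie.service.coord.normal.default.timeout"
    ET.SubElement(prop, "value").text = str(minutes)
    # encoding="unicode" returns a str with no XML declaration.
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    print(coord_timeout_property(120))  # the 2-hour default described above
    print(coord_timeout_property(-1))   # presumed "never time out" setting
```

The printed element would go inside the `<configuration>` element of oozie-site.xml.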

Re: Dynamic parallel tasks in Oozie

2014-02-20 Thread Mona Chitnis
Hi Chris, There isn’t a way to run dynamic parallel tasks within the same Oozie workflow XML currently. But you can do so programmatically. Using the Oozie Java API, you can start a dynamic number of sub-workflows based on the number of outputs. On 2/20/14, 7:05 AM, "Heller, Chris" wrote: >Hi, > >I’

Re: Delegation Token

2014-02-20 Thread Mona Chitnis
Do you have an authentication setup different from kerberos? On 2/20/14, 6:22 AM, "Roshan Punnoose" wrote: >Hi all, > >I am running into an issue using the Hive Action (and Shell Action to >interact with the hive command line) where the Delegation Token doesn't >seem to be propagated for the pro

Dynamic parallel tasks in Oozie

2014-02-20 Thread Heller, Chris
Hi, I’m trying to figure out the best way to implement a workflow in Oozie. I am creating a workflow which splits an input into multiple outputs. Then I want to run another process over each output. The trouble is I cannot know a priori how many outputs I will have, and so to post pro

Re: Oozie HA support with derby

2014-02-20 Thread Robert Kanter
> > Is there any known data migration tool which can migrate data from derby > to let's say mysql? Migrating the Oozie data out of Derby to another database is somewhat tricky. You can take a look at this procedure given on the Cloudera Community forums, but I can't guarantee that it will work an

Re: oozie 3.3 coordinator question

2014-02-20 Thread Scott Preddy
Actually, my confusion was here (just answered my own question). If the logs are continually present, having the coordinator run once a day will make it so that 24 new logs are grabbed each time. Thanks. On Thu, Feb 20, 2014 at 11:21 AM, Purshotam Shah wrote: > > Yes. If your dataset “1HourLogs”

Re: oozie 3.3 coordinator question

2014-02-20 Thread Purshotam Shah
Yes. If your dataset “1HourLogs” is hourly, then every time it is going to look for the 23 previous hourly logs + 1. On 2/20/14, 9:03 AM, "Scott Preddy" wrote: >Will the snippet below run over the same 23 logs it ran over the previous hour >(i.e. just bumping up >the log iterator by 1) each hour, or is o
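The overlap described above can be checked with a small Python sketch. It assumes the coordinator's input window is the current hourly instance plus the 23 before it; the original coordinator snippet is not shown here, so the window definition and the names `hourly_window` and `shared_instances` are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def hourly_window(nominal_time, hours=24):
    """Return the `hours` hourly instances ending at nominal_time (inclusive)."""
    return [nominal_time - timedelta(hours=h) for h in range(hours - 1, -1, -1)]


def shared_instances(run_a, run_b):
    """Count the hourly instances two coordinator runs have in common."""
    return len(set(hourly_window(run_a)) & set(hourly_window(run_b)))


if __name__ == "__main__":
    t = datetime(2014, 2, 20, 12, 0)  # arbitrary example time
    # Coordinator running every hour: consecutive runs share most logs.
    print(shared_instances(t, t + timedelta(hours=1)))
    # Coordinator running once a day: consecutive runs share none.
    print(shared_instances(t, t + timedelta(days=1)))
```

Under these assumptions an hourly coordinator re-reads 23 of the 24 logs on each run, while a daily coordinator picks up 24 new logs each time.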

oozie 3.3 coordinator question

2014-02-20 Thread Scott Preddy
Will the snippet below run over the same 23 logs it ran over the previous hour (i.e. just bumping up the log iterator by 1) each hour, or is Oozie going to run the action once 24 logs are present, then not kick off the action again until 24 new logs are present? I think it is the former, but just makin

Re: Oozie HA support with derby

2014-02-20 Thread Anand Vidwansa
Thanks a lot for the prompt reply Robert! I do have a lot of data in my Derby db which I now need to migrate to one of mysql/oracle/postgres. Is there any known data migration tool which can migrate data from Derby to, let's say, mysql? I also have one more question. Since traditionally oozie datab

Delegation Token

2014-02-20 Thread Roshan Punnoose
Hi all, I am running into an issue using the Hive Action (and Shell Action to interact with the hive command line) where the Delegation Token doesn't seem to be propagated for the proxy user: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerb

Coordinator action TIMEDOUT when no timeout is set

2014-02-20 Thread Richard Ross
Hey: Yesterday I set up a daily coordinator with an input dataset. It is scheduled to run every day at 00:00 and process the dataset. I don't have the piece that creates the dataset automated yet, and was planning to manually create the dataset each morning while I work on the automation pieces.