Sounds good to me... how do you check for completion from Java code?
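A minimal sketch of such a completion check, building on the ExecutorService/FutureTask approach Stephan suggests below. The Callable body here is a placeholder for a real Flink program (see the RemoteEnvironment sketch further down the thread); everything else is plain java.util.concurrent:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.flink.api.common.JobExecutionResult;

    public class CompletionCheck {
        public static void main(String[] args) throws Exception {
            ExecutorService executor = Executors.newSingleThreadExecutor();

            // Placeholder job: a real Callable would build a Flink program
            // and return the result of ExecutionEnvironment.execute().
            Future<JobExecutionResult> future = executor.submit(new Callable<JobExecutionResult>() {
                @Override
                public JobExecutionResult call() throws Exception {
                    return null; // stand-in for ee.execute("my job")
                }
            });

            // Poll for completion without blocking the client.
            while (!future.isDone()) {
                Thread.sleep(1000); // job still running; do other work here
            }

            try {
                JobExecutionResult result = future.get(); // re-throws job failures
                System.out.println("Job finished: " + result);
            } catch (ExecutionException e) {
                System.err.println("Job failed: " + e.getCause());
            } finally {
                executor.shutdown();
            }
        }
    }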
On Tue, Nov 25, 2014 at 6:56 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> 1) The RemoteExecutor will automatically transfer the jar, if needed.
>
> 2) Background execution is not supported out of the box. I would go for
> a Java ExecutorService with a FutureTask to kick off tasks in a
> background thread and allow checking for completion.
>
> Stephan
>
> On Tue, Nov 25, 2014 at 6:41 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
>> Do I have to upload the jar from my application to the Flink JobManager
>> every time?
>> Do I have to wait for the job to finish? I'd like to start the job
>> execution, get an id for it, and then poll for its status... is that
>> possible?
>>
>> On Tue, Nov 25, 2014 at 6:04 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>
>>> Cool.
>>>
>>> So you basically have two options:
>>>
>>> a) Use the bin/flink run tool.
>>> This tool is meant for users who submit a job once. To use it, upload
>>> the jar to any location in the file system (not HDFS) and run:
>>>
>>> ./bin/flink run <pathToJar> -c classNameOfJobYouWantToRun <JobArguments>
>>>
>>> b) Use the RemoteExecutor.
>>> With the RemoteExecutor, you don't need to put your jar file anywhere
>>> in your cluster. The only thing you need is the jar file somewhere
>>> where the Java application can access it.
>>> Inside this Java application, you have something like:
>>>
>>> runJobOne(ExecutionEnvironment ee) {
>>>     ee.readFile( ... );
>>>     ...
>>>     ee.execute("job 1");
>>> }
>>>
>>> runJobTwo(Exe ..) {
>>>     ...
>>> }
>>>
>>> main() {
>>>     ExecutionEnvironment ee = new Remote execution environment ..
>>>
>>>     if (something) {
>>>         runJobOne(ee);
>>>     } else if (something else) {
>>>         runJobTwo(ee);
>>>     } ...
>>> }
>>>
>>> The object returned by the ExecutionEnvironment.execute() call also
>>> contains information about the final status of the program (failed, etc.).
>>>
>>> I hope that helps.
>>>
>>> On Tue, Nov 25, 2014 at 5:30 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>
>>>> See inline.
>>>>
>>>> On Tue, Nov 25, 2014 at 3:37 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> maybe we need to take a step back, because I did not yet fully
>>>>> understand what you want to do.
>>>>>
>>>>> My understanding so far is the following:
>>>>> - You have a set of jobs that you've written for Flink.
>>>>
>>>> Yes, and they are all in the same jar (which I want to put on the
>>>> cluster somehow).
>>>>
>>>>> - You have a cluster with Flink running.
>>>>
>>>> Yes!
>>>>
>>>>> - You have an external client, which is a Java application that
>>>>> controls when and how the different jobs are launched. The client
>>>>> is running basically 24/7 or started by a cron job.
>>>>
>>>> I have a Java application somewhere that triggers the execution of
>>>> one of the available jobs in the jar (so I also need to pass the
>>>> arguments required by each job) and then monitors whether the job has
>>>> been put into a running state, as well as its status
>>>> (running/failed/finished; a completion percentage would be awesome).
>>>> I don't think the RemoteExecutor is enough... am I wrong?
>>>>
>>>>> Correct me if these assumptions are wrong. If they are true, the
>>>>> RemoteExecutor is probably what you are looking for. Otherwise, we
>>>>> have to find another solution.
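To make option (b) concrete, here is a hedged sketch combining Robert's pseudocode with Stephan's background-thread suggestion. The host name, port, jar path, and HDFS paths are assumptions for illustration; createRemoteEnvironment() is the API described in the 0.7 cluster-execution docs linked below:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.flink.api.common.JobExecutionResult;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class JobDispatcher {

        static JobExecutionResult runJobOne(ExecutionEnvironment ee) throws Exception {
            DataSet<String> input = ee.readTextFile("hdfs:///path/to/input"); // assumed path
            input.writeAsText("hdfs:///path/to/output-one");                  // assumed path
            return ee.execute("job 1");
        }

        static JobExecutionResult runJobTwo(ExecutionEnvironment ee) throws Exception {
            ee.fromElements("a", "b", "c").writeAsText("hdfs:///path/to/output-two");
            return ee.execute("job 2");
        }

        public static void main(final String[] args) throws Exception {
            // The remote environment ships the jar to the cluster; the jar
            // only has to be readable by this client (host/port assumed).
            final ExecutionEnvironment ee = ExecutionEnvironment
                    .createRemoteEnvironment("jobmanager-host", 6123, "/path/to/my-jobs.jar");

            ExecutorService executor = Executors.newSingleThreadExecutor();

            // Kick the job off in a background thread, as Stephan suggests above.
            Future<JobExecutionResult> future = executor.submit(new Callable<JobExecutionResult>() {
                @Override
                public JobExecutionResult call() throws Exception {
                    return "one".equals(args[0]) ? runJobOne(ee) : runJobTwo(ee);
                }
            });

            // future.isDone()/future.get() provide the completion check from
            // the top of the thread; execute() returns the job's final status.
            System.out.println("Result: " + future.get());
            executor.shutdown();
        }
    }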
>>>>>
>>>>> On Tue, Nov 25, 2014 at 2:56 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>
>>>>>> Hi Robert,
>>>>>> I tried to look at the RemoteExecutor, but I can't figure out the
>>>>>> exact steps to:
>>>>>> 1 - (upload if necessary and) register a jar containing multiple
>>>>>> main methods (one for each job)
>>>>>> 2 - start the execution of a job from a client
>>>>>> 3 - monitor the execution of the job
>>>>>>
>>>>>> Could you give me the exact Java commands/snippets to do that?
>>>>>>
>>>>>> On Sun, Nov 23, 2014 at 8:26 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>
>>>>>>> +1 for providing some utilities/tools for application developers.
>>>>>>> This could include something like an application registry. I also
>>>>>>> think that almost every user needs something to parse command-line
>>>>>>> arguments (including default values and comprehensive error messages).
>>>>>>> We should also see if we can document and properly expose the
>>>>>>> FileSystem abstraction to Flink app programmers. Users sometimes
>>>>>>> need to manipulate files directly.
>>>>>>>
>>>>>>> Regarding your second question:
>>>>>>> For deploying a jar on your cluster, you can use the
>>>>>>> "bin/flink run <JAR FILE>" command.
>>>>>>> For starting a job from an external client you can use the
>>>>>>> RemoteExecutionEnvironment (you need to know the JobManager address
>>>>>>> for that). Here is some documentation on that:
>>>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/cluster_execution.html#remote-environment
>>>>>>>
>>>>>>> On Sat, Nov 22, 2014 at 9:06 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>>
>>>>>>>> That was exactly what I was looking for. In my case it is not a
>>>>>>>> problem to use the Hadoop version, because I work on Hadoop. Don't
>>>>>>>> you think it could be useful to add a Flink ProgramDriver so that
>>>>>>>> you can use it both for Hadoop and native Flink jobs?
>>>>>>>>
>>>>>>>> Now that I understand how to bundle a bunch of jobs together, my
>>>>>>>> next objective is to deploy the jar on the cluster (similar to what
>>>>>>>> the webclient does) and then start the jobs from my external client
>>>>>>>> (which in theory just needs to know the jar name and the parameters
>>>>>>>> to pass to every job it wants to call). Do you have an example of that?
>>>>>>>>
>>>>>>>> On Nov 22, 2014 6:11 PM, "Kostas Tzoumas" <ktzou...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Are you looking for something like
>>>>>>>>> https://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/ProgramDriver.html ?
>>>>>>>>>
>>>>>>>>> You should be able to use the Hadoop ProgramDriver directly; see
>>>>>>>>> for example here:
>>>>>>>>> https://github.com/ktzoumas/incubator-flink/blob/tez_support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/examples/ExampleDriver.java
>>>>>>>>>
>>>>>>>>> If you don't want to introduce a Hadoop dependency in your
>>>>>>>>> project, you can just copy-paste ProgramDriver; it does not have
>>>>>>>>> any dependencies on Hadoop classes. That class just accumulates
>>>>>>>>> <String, Class> pairs (simplifying a bit) and calls the main
>>>>>>>>> method of the corresponding class.
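For reference, a dependency-free sketch in the spirit of the ProgramDriver Kostas describes: it accumulates name-to-class pairs and reflectively calls the chosen main method. JobOne and JobTwo are hypothetical stand-ins for real Flink job classes:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    public class FlinkProgramDriver {

        private final Map<String, Class<?>> programs = new HashMap<String, Class<?>>();

        public void addClass(String name, Class<?> mainClass) {
            programs.put(name, mainClass);
        }

        public void run(String name, String[] args) throws Exception {
            Class<?> mainClass = programs.get(name);
            if (mainClass == null) {
                throw new IllegalArgumentException("Unknown program: " + name
                        + " (known: " + programs.keySet() + ")");
            }
            // Invoke public static void main(String[]) on the registered class.
            mainClass.getMethod("main", String[].class).invoke(null, (Object) args);
        }

        // Hypothetical job entry points standing in for real Flink jobs.
        public static class JobOne {
            public static void main(String[] args) { System.out.println("job one: " + Arrays.toString(args)); }
        }
        public static class JobTwo {
            public static void main(String[] args) { System.out.println("job two: " + Arrays.toString(args)); }
        }

        public static void main(String[] args) throws Exception {
            FlinkProgramDriver driver = new FlinkProgramDriver();
            driver.addClass("job-one", JobOne.class);
            driver.addClass("job-two", JobTwo.class);
            driver.run(args[0], Arrays.copyOfRange(args, 1, args.length));
        }
    }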
>>>>>>>>>
>>>>>>>>> On Sat, Nov 22, 2014 at 5:34 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Not sure I get exactly what this is, but packaging multiple
>>>>>>>>>> examples in one program is well possible. You can have arbitrary
>>>>>>>>>> control flow in the main() method.
>>>>>>>>>>
>>>>>>>>>> It should be well possible to do something like the Hadoop
>>>>>>>>>> examples setup...
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 21, 2014 at 7:02 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>>>>>
>>>>>>>>>>> That was something I used to do with Hadoop, and it's comfortable
>>>>>>>>>>> when testing stuff (so it is not so important).
>>>>>>>>>>> For an example, see what happens when you run the old
>>>>>>>>>>> "hadoop jar hadoop-mapreduce-examples.jar" command... it "drives"
>>>>>>>>>>> you to the correct invocation of that job.
>>>>>>>>>>> However, the important thing is that I'd like to keep existing
>>>>>>>>>>> related jobs somewhere (like a repository of jobs), deploy them,
>>>>>>>>>>> and then be able to start the one I need from an external program.
>>>>>>>>>>>
>>>>>>>>>>> Could this be done with the RemoteExecutor? Or is there any web
>>>>>>>>>>> service to manage the job execution? That would be very useful...
>>>>>>>>>>> Is the Client interface the only one that allows something
>>>>>>>>>>> similar right now?
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 21, 2014 at 6:19 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I am not sure exactly what you need there. In Flink you can
>>>>>>>>>>>> write more than one program in the same program ;-) You can
>>>>>>>>>>>> define complex flows and execute arbitrarily at intermediate
>>>>>>>>>>>> points:
>>>>>>>>>>>>
>>>>>>>>>>>> main() {
>>>>>>>>>>>>     ExecutionEnvironment env = ...;
>>>>>>>>>>>>
>>>>>>>>>>>>     env.readSomething().map().join(...).and().so().on();
>>>>>>>>>>>>     env.execute();
>>>>>>>>>>>>
>>>>>>>>>>>>     env.readTheNextThing().doSomething();
>>>>>>>>>>>>     env.execute();
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> You can also just "save" a program and keep it for later
>>>>>>>>>>>> execution:
>>>>>>>>>>>>
>>>>>>>>>>>> Plan plan = env.createProgramPlan();
>>>>>>>>>>>>
>>>>>>>>>>>> At a later point you can start that plan:
>>>>>>>>>>>>
>>>>>>>>>>>> new RemoteExecutor(master, 6123).execute(plan);
>>>>>>>>>>>>
>>>>>>>>>>>> Stephan
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 21, 2014 at 5:49 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Any help on this? :(
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 21, 2014 at 9:33 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>> I forgot to ask you whether there's a Flink utility that
>>>>>>>>>>>>>> simulates the Hadoop ProgramDriver class, which acts somehow
>>>>>>>>>>>>>> like a registry of jobs. Is there something similar?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Flavio
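Finally, a sketch of the "save a program for later execution" pattern from Stephan's snippet above. This assumes the 0.7-era org.apache.flink.client.RemoteExecutor from flink-clients, where the method is named executePlan() (Stephan's pseudocode abbreviates it to execute(plan); the exact name may vary by version). Host, port, and the output path are assumptions:

    import org.apache.flink.api.common.Plan;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.client.RemoteExecutor;

    public class DeferredPlanExample {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Define the program, but do not run it yet.
            env.fromElements("a", "b", "c")
               .writeAsText("file:///tmp/deferred-output"); // assumed output path

            // "Save" the program as a plan ...
            Plan plan = env.createProgramPlan();

            // ... and ship it to a running cluster at a later point.
            new RemoteExecutor("master", 6123).executePlan(plan);
        }
    }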