The execute() call on the Environment blocks. The future will hence not be done until the execution is finished...
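[A minimal sketch of the pattern discussed here: since execute() blocks, Stephan's suggestion in the quoted reply (a Java ExecutorService with a FutureTask/Future) lets the caller poll for completion. This is plain JDK code — the Callable body is a stand-in for a blocking env.execute() call, and all names are illustrative, not a Flink API.]

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundJobRunner {

    // Submits a blocking job (stand-in for env.execute()) to a background
    // thread; the returned Future can be polled for completion.
    static Future<String> submitInBackground(ExecutorService pool, Callable<String> job) {
        return pool.submit(job);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // The Callable body stands in for a blocking env.execute() call.
        Future<String> result = submitInBackground(pool, () -> {
            Thread.sleep(100);           // simulate a long-running job
            return "FINISHED";
        });

        // The caller can poll instead of blocking.
        while (!result.isDone()) {
            Thread.sleep(10);            // do other work in the meantime
        }
        System.out.println("Job status: " + result.get());
        pool.shutdown();
    }
}
```

[With a real job, the Callable would wrap the env.execute() call and could return the result object it yields, so completion and final status are available through the Future.]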
On Tue, Nov 25, 2014 at 7:00 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> Sounds good to me. How do you check for completion from Java code?
>
> On Tue, Nov 25, 2014 at 6:56 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> Hi!
>>
>> 1) The RemoteExecutor will automatically transfer the jar, if needed.
>>
>> 2) Background execution is not supported out of the box. I would go for a
>> Java ExecutorService with a FutureTask to kick off tasks in a background
>> thread and allow checking for completion.
>>
>> Stephan
>>
>> On Tue, Nov 25, 2014 at 6:41 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>
>>> Do I have to upload the jar from my application to the Flink JobManager
>>> every time?
>>> Do I have to wait for the job to finish? I'd like to start the job
>>> execution, get an id for it and then poll for its status. Is that possible?
>>>
>>> On Tue, Nov 25, 2014 at 6:04 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>
>>>> Cool.
>>>>
>>>> So you basically have two options:
>>>>
>>>> a) Use the bin/flink run tool.
>>>> This tool is meant for users to submit a job once. To use it, upload
>>>> the jar to any location in the file system (not HDFS) and run the job with:
>>>> ./bin/flink run <pathToJar> -c classNameOfJobYouWantToRun <JobArguments>
>>>>
>>>> b) Use the RemoteExecutor.
>>>> When using the RemoteExecutor, you don't need to put your jar file
>>>> anywhere in your cluster. The only thing you need is the jar file
>>>> somewhere where the Java application can access it.
>>>> Inside this Java application, you have something like:
>>>>
>>>> runJobOne(ExecutionEnvironment ee) {
>>>>     ee.readFile( ... );
>>>>     ...
>>>>     ee.execute("job 1");
>>>> }
>>>>
>>>> runJobTwo(ExecutionEnvironment ee) {
>>>>     ...
>>>> }
>>>>
>>>> main() {
>>>>     ExecutionEnvironment ee = ...; // new remote execution environment
>>>>
>>>>     if (something) {
>>>>         runJobOne(ee);
>>>>     } else if (somethingElse) {
>>>>         runJobTwo(ee);
>>>>     } ...
>>>> }
>>>>
>>>> The object returned by the ExecutionEnvironment.execute() call also
>>>> contains information about the final status of the program (failed etc.).
>>>>
>>>> I hope that helps.
>>>>
>>>> On Tue, Nov 25, 2014 at 5:30 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>
>>>>> See inline.
>>>>>
>>>>> On Tue, Nov 25, 2014 at 3:37 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> maybe we need to go a step back, because I did not yet fully
>>>>>> understand what you want to do.
>>>>>>
>>>>>> My understanding so far is the following:
>>>>>> - You have a set of jobs that you've written for Flink.
>>>>>
>>>>> Yes, and they are all in the same jar (that I want to put in the
>>>>> cluster somehow).
>>>>>
>>>>>> - You have a cluster with Flink running.
>>>>>
>>>>> Yes!
>>>>>
>>>>>> - You have an external client, which is a Java application that
>>>>>> controls when and how the different jobs are launched. The client is
>>>>>> running basically 24/7 or is started by a cron job.
>>>>>
>>>>> I have a Java application somewhere that triggers the execution of one
>>>>> of the available jobs in the jar (so I also need to pass the necessary
>>>>> arguments required by each job) and then monitors whether the job has
>>>>> been put into a running state, and its status (running/failed/finished;
>>>>> a percentage would be awesome).
>>>>> I don't think the RemoteExecutor is enough. Am I wrong?
>>>>>
>>>>>> Correct me if these assumptions are wrong. If they are true, the
>>>>>> RemoteExecutor is probably what you are looking for. Otherwise, we
>>>>>> have to find another solution.
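[The runJobOne/runJobTwo skeleton Robert sketches above can be made compilable with plain-Java stand-ins. Everything below is hypothetical: the String parameters and return values replace the actual Flink ExecutionEnvironment calls, and only the control flow — pick a job based on a condition in main() — is the point.]

```java
public class JobDispatcher {

    // Stand-ins for methods that would build a Flink plan on a remote
    // ExecutionEnvironment and call ee.execute(...); here they just
    // return a marker string so the dispatch logic is visible.
    static String runJobOne(String input) { return "job 1 ran on " + input; }
    static String runJobTwo(String input) { return "job 2 ran on " + input; }

    // Mirrors the if/else in Robert's main(): one client binary, several
    // jobs, selected at runtime.
    static String dispatch(boolean something, String input) {
        if (something) {
            return runJobOne(input);
        } else {
            return runJobTwo(input);
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch(args.length > 0, "cluster"));
    }
}
```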
>>>>>> On Tue, Nov 25, 2014 at 2:56 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>
>>>>>>> Hi Robert,
>>>>>>> I tried to look at the RemoteExecutor but I can't understand what
>>>>>>> the exact steps are to:
>>>>>>> 1 - (upload if necessary and) register a jar containing multiple
>>>>>>> main methods (one for each job)
>>>>>>> 2 - start the execution of a job from a client
>>>>>>> 3 - monitor the execution of the job
>>>>>>>
>>>>>>> Could you give me the exact Java commands/snippets to do that?
>>>>>>>
>>>>>>> On Sun, Nov 23, 2014 at 8:26 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>>
>>>>>>>> +1 for providing some utilities/tools for application developers.
>>>>>>>> This could include something like an application registry. I also
>>>>>>>> think that almost every user needs something to parse command line
>>>>>>>> arguments (including default values and comprehensive error messages).
>>>>>>>> We should also see if we can document and properly expose the
>>>>>>>> FileSystem abstraction to Flink app programmers. Users sometimes
>>>>>>>> need to manipulate files directly.
>>>>>>>>
>>>>>>>> Regarding your second question:
>>>>>>>> For deploying a jar on your cluster, you can use the "bin/flink run
>>>>>>>> <JAR FILE>" command.
>>>>>>>> For starting a job from an external client you can use the
>>>>>>>> RemoteExecutionEnvironment (you need to know the JobManager address
>>>>>>>> for that). Here is some documentation on that:
>>>>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/cluster_execution.html#remote-environment
>>>>>>>>
>>>>>>>> On Sat, Nov 22, 2014 at 9:06 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>>>
>>>>>>>>> That was exactly what I was looking for. In my case it is not a
>>>>>>>>> problem to use the Hadoop version because I work on Hadoop.
>>>>>>>>> Don't you think it could be useful to add a Flink ProgramDriver
>>>>>>>>> so that you can use it both for Hadoop and native Flink jobs?
>>>>>>>>>
>>>>>>>>> Now that I have understood how to bundle together a bunch of jobs,
>>>>>>>>> my next objective will be to deploy the jar on the cluster (similar
>>>>>>>>> to what the webclient does) and then start the jobs from my external
>>>>>>>>> client (which in theory just needs to know the jar name and the
>>>>>>>>> parameters to pass to every job it wants to call). Do you have an
>>>>>>>>> example of that?
>>>>>>>>>
>>>>>>>>> On Nov 22, 2014 6:11 PM, "Kostas Tzoumas" <ktzou...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Are you looking for something like
>>>>>>>>>> https://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/ProgramDriver.html
>>>>>>>>>> ?
>>>>>>>>>>
>>>>>>>>>> You should be able to use the Hadoop ProgramDriver directly; see
>>>>>>>>>> for example here:
>>>>>>>>>> https://github.com/ktzoumas/incubator-flink/blob/tez_support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/examples/ExampleDriver.java
>>>>>>>>>>
>>>>>>>>>> If you don't want to introduce a Hadoop dependency in your
>>>>>>>>>> project, you can just copy-paste ProgramDriver; it does not have
>>>>>>>>>> any dependencies on Hadoop classes. That class just accumulates
>>>>>>>>>> <String, Class> pairs (simplifying a bit) and calls the main
>>>>>>>>>> method of the corresponding class.
>>>>>>>>>>
>>>>>>>>>> On Sat, Nov 22, 2014 at 5:34 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Not sure I get exactly what this is, but packaging multiple
>>>>>>>>>>> examples in one program is well possible. You can have arbitrary
>>>>>>>>>>> control flow in the main() method.
>>>>>>>>>>>
>>>>>>>>>>> It should be well possible to do something like the Hadoop
>>>>>>>>>>> examples setup...
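[As Kostas notes, a ProgramDriver-style registry is only a few lines if you don't want the Hadoop dependency. Here is a minimal plain-Java sketch of the idea — class and method names are hypothetical, not a Flink or Hadoop API — that accumulates name-to-job pairs and dispatches on the first command-line argument:]

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class MiniProgramDriver {

    // Maps a job name to its entry point. The Function stands in for a
    // class whose main() would build and execute a Flink plan; it
    // receives the remaining arguments and returns an exit code.
    private final Map<String, Function<String[], Integer>> jobs = new LinkedHashMap<>();

    public void addJob(String name, Function<String[], Integer> entryPoint) {
        jobs.put(name, entryPoint);
    }

    // args[0] selects the job; the rest are passed through to it.
    public int run(String[] args) {
        if (args.length == 0 || !jobs.containsKey(args[0])) {
            System.err.println("Valid job names: " + jobs.keySet());
            return -1;
        }
        return jobs.get(args[0]).apply(Arrays.copyOfRange(args, 1, args.length));
    }
}
```

[Hadoop's ProgramDriver does essentially this, invoking the registered class's main method via reflection; wiring each Function to call a job's main() reproduces the "hadoop jar examples.jar" behavior mentioned later in the thread.]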
>>>>>>>>>>> On Fri, Nov 21, 2014 at 7:02 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> That was something I used to do with Hadoop and it's comfortable
>>>>>>>>>>>> when testing stuff (so it is not so important).
>>>>>>>>>>>> For an example, see what happens when you run the old "hadoop
>>>>>>>>>>>> jar hadoop-mapreduce-examples.jar" command: it "drives" you to
>>>>>>>>>>>> the correct invocation of that job.
>>>>>>>>>>>> However, the important thing is that I'd like to keep existing
>>>>>>>>>>>> related jobs somewhere (like a repository of jobs), deploy them
>>>>>>>>>>>> and then be able to start the one I need from an external program.
>>>>>>>>>>>>
>>>>>>>>>>>> Could this be done with the RemoteExecutor? Or is there any web
>>>>>>>>>>>> service to manage the job execution? That would be very useful.
>>>>>>>>>>>> Is the Client interface the only one that allows something
>>>>>>>>>>>> similar right now?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 21, 2014 at 6:19 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am not sure exactly what you need there.
>>>>>>>>>>>>> In Flink you can write more than one program in the same
>>>>>>>>>>>>> program ;-) You can define complex flows and execute
>>>>>>>>>>>>> arbitrarily at intermediate points:
>>>>>>>>>>>>>
>>>>>>>>>>>>> main() {
>>>>>>>>>>>>>     ExecutionEnvironment env = ...;
>>>>>>>>>>>>>
>>>>>>>>>>>>>     env.readSomething().map().join(...).and().so().on();
>>>>>>>>>>>>>     env.execute();
>>>>>>>>>>>>>
>>>>>>>>>>>>>     env.readTheNextThing().doSomething();
>>>>>>>>>>>>>     env.execute();
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> You can also just "save" a program and keep it for later
>>>>>>>>>>>>> execution:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Plan plan = env.createProgramPlan();
>>>>>>>>>>>>>
>>>>>>>>>>>>> At a later point you can start that plan with:
>>>>>>>>>>>>>
>>>>>>>>>>>>> new RemoteExecutor(master, 6123).execute(plan);
>>>>>>>>>>>>>
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 21, 2014 at 5:49 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any help on this? :(
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Nov 21, 2014 at 9:33 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>> I forgot to ask you if there's a Flink utility to simulate
>>>>>>>>>>>>>>> the Hadoop ProgramDriver class, which acts somehow like a
>>>>>>>>>>>>>>> registry of jobs. Is there something similar?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Flavio