Hi!

1) The Remote Executor will automatically transfer the jar, if needed.

2) Background execution is not supported out of the box. I would go for a
Java ExecutorService with a FutureTask to kick off tasks in a background
thread and allow checking for completion.

Stephan

On Tue, Nov 25, 2014 at 6:41 PM, Flavio Pompermaier <[email protected]> wrote:

> Do I have to upload the jar from my application to the Flink JobManager
> every time?
> Do I have to wait for the job to finish? I'd like to start the job
> execution, get an id for it and then poll for its status..is that possible?
>
> On Tue, Nov 25, 2014 at 6:04 PM, Robert Metzger <[email protected]> wrote:
>
>> Cool.
>>
>> So you have basically two options:
>>
>> a) Use the bin/flink run tool.
>> This tool is meant for users to submit a job once. To use it, upload
>> the jar to any location in the file system (not HDFS) and run:
>>
>>   ./bin/flink run <pathToJar> -c classNameOfJobYouWantToRun <JobArguments>
>>
>> b) Use the RemoteExecutor.
>> For the Remote Executor, you don't need to put your jar file anywhere
>> in your cluster. The only thing you need is the jar file somewhere
>> where the Java application can access it.
>> Inside this Java application, you have something like:
>>
>> runJobOne(ExecutionEnvironment ee) {
>>     ee.readFile( ... );
>>     ...
>>     ee.execute("job 1");
>> }
>>
>> runJobTwo(Exe ..) {
>>     ...
>> }
>>
>> main() {
>>     ExecutionEnvironment ee = new Remote execution environment ..
>>
>>     if (something) {
>>         runJobOne(ee);
>>     } else if (something else) {
>>         runJobTwo(ee);
>>     } ...
>> }
>>
>> The object returned by the ExecutionEnvironment.execute() call also
>> contains information about the final status of the program (failed etc.).
>>
>> I hope that helps.
>>
>> On Tue, Nov 25, 2014 at 5:30 PM, Flavio Pompermaier <[email protected]> wrote:
>>
>>> See inline
>>>
>>> On Tue, Nov 25, 2014 at 3:37 PM, Robert Metzger <[email protected]> wrote:
>>>
>>>> Hey,
>>>>
>>>> maybe we need to go a step back because I did not yet fully understand
>>>> what you want to do.
>>>>
>>>> My understanding so far is the following:
>>>> - You have a set of jobs that you've written for Flink
>>>
>>> Yes, and they are all in the same jar (that I want to put in the cluster
>>> somehow)
>>>
>>>> - You have a cluster with Flink running
>>>
>>> Yes!
>>>
>>>> - You have an external client, which is a Java application that is
>>>> controlling when and how the different jobs are launched. The client is
>>>> running basically 24/7 or started by a cronjob.
>>>
>>> I have a Java application somewhere that triggers the execution of one
>>> of the available jobs in the jar (so I need to pass also the necessary
>>> arguments required by each job) and then monitor whether the job has
>>> been put into a running state and its status (running/failed/finished;
>>> a percentage would be awesome).
>>> I don't think the RemoteExecutor is enough..am I wrong?
>>>
>>>> Correct me if these assumptions are wrong. If they are true, the
>>>> RemoteExecutor is probably what you are looking for. Otherwise, we
>>>> have to find another solution.
>>>>
>>>> On Tue, Nov 25, 2014 at 2:56 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>
>>>>> Hi Robert,
>>>>> I tried to look at the RemoteExecutor but I can't understand what the
>>>>> exact steps are to:
>>>>> 1 - (upload if necessary and) register a jar containing multiple main
>>>>> methods (one for each job)
>>>>> 2 - start the execution of a job from a client
>>>>> 3 - monitor the execution of the job
>>>>>
>>>>> Could you give me the exact Java commands/snippets to do that?
>>>>>
>>>>> On Sun, Nov 23, 2014 at 8:26 PM, Robert Metzger <[email protected]> wrote:
>>>>>
>>>>>> +1 for providing some utilities/tools for application developers.
>>>>>> This could include something like an application registry. I also
>>>>>> think that almost every user needs something to parse command line
>>>>>> arguments (including default values and comprehensive error messages).
>>>>>> We should also see if we can document and properly expose the
>>>>>> FileSystem abstraction to Flink app programmers. Users sometimes
>>>>>> need to manipulate files directly.
>>>>>>
>>>>>> Regarding your second question:
>>>>>> For deploying a jar on your cluster, you can use the "bin/flink run
>>>>>> <JAR FILE>" command.
>>>>>> For starting a job from an external client you can use the
>>>>>> RemoteExecutionEnvironment (you need to know the JobManager address
>>>>>> for that). Here is some documentation on that:
>>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/cluster_execution.html#remote-environment
>>>>>>
>>>>>> On Sat, Nov 22, 2014 at 9:06 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>
>>>>>>> That was exactly what I was looking for. In my case it is not a
>>>>>>> problem to use the Hadoop version because I work on Hadoop. Don't
>>>>>>> you think it could be useful to add a Flink ProgramDriver so that
>>>>>>> you can use it both for Hadoop and native-Flink jobs?
>>>>>>>
>>>>>>> Now that I understood how to bundle together a bunch of jobs, my
>>>>>>> next objective will be to deploy the jar on the cluster (similar to
>>>>>>> what the webclient does) and then start the jobs from my external
>>>>>>> client (which in theory just needs to know the jar name and the
>>>>>>> parameters to pass to every job it wants to call). Do you have an
>>>>>>> example of that?
>>>>>>> On Nov 22, 2014 6:11 PM, "Kostas Tzoumas" <[email protected]> wrote:
>>>>>>>
>>>>>>>> Are you looking for something like
>>>>>>>> https://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/ProgramDriver.html
>>>>>>>> ?
>>>>>>>>
>>>>>>>> You should be able to use the Hadoop ProgramDriver directly, see
>>>>>>>> for example here:
>>>>>>>> https://github.com/ktzoumas/incubator-flink/blob/tez_support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/examples/ExampleDriver.java
>>>>>>>>
>>>>>>>> If you don't want to introduce a Hadoop dependency in your
>>>>>>>> project, you can just copy-paste ProgramDriver; it does not have
>>>>>>>> any dependencies on Hadoop classes. That class just accumulates
>>>>>>>> <String, Class> pairs (simplifying a bit) and calls the main
>>>>>>>> method of the corresponding class.
>>>>>>>>
>>>>>>>> On Sat, Nov 22, 2014 at 5:34 PM, Stephan Ewen <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Not sure I get exactly what this is, but packaging multiple
>>>>>>>>> examples in one program is well possible. You can have arbitrary
>>>>>>>>> control flow in the main() method.
>>>>>>>>>
>>>>>>>>> It should be well possible to do something like that Hadoop
>>>>>>>>> examples setup...
>>>>>>>>>
>>>>>>>>> On Fri, Nov 21, 2014 at 7:02 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> That was something I used to do with Hadoop and it's comfortable
>>>>>>>>>> when testing stuff (so it is not so important).
>>>>>>>>>> For an example, see what happens when you run the old "hadoop jar
>>>>>>>>>> hadoop-mapreduce-examples.jar" command..it "drives" you to the
>>>>>>>>>> correct invocation of that job.
>>>>>>>>>> However, the important thing is that I'd like to keep existing
>>>>>>>>>> related jobs somewhere (like a repository of jobs), deploy them
>>>>>>>>>> and then be able to start the one I need from an external program.
>>>>>>>>>>
>>>>>>>>>> Could this be done with the RemoteExecutor? Or is there any web
>>>>>>>>>> service to manage the job execution? That would be very useful..
>>>>>>>>>> Is the Client interface the only one that allows something
>>>>>>>>>> similar right now?
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 21, 2014 at 6:19 PM, Stephan Ewen <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I am not sure exactly what you need there. In Flink you can
>>>>>>>>>>> write more than one program in the same program ;-) You can
>>>>>>>>>>> define complex flows and execute arbitrarily at intermediate
>>>>>>>>>>> points:
>>>>>>>>>>>
>>>>>>>>>>> main() {
>>>>>>>>>>>     ExecutionEnvironment env = ...;
>>>>>>>>>>>
>>>>>>>>>>>     env.readSomething().map().join(...).and().so().on();
>>>>>>>>>>>     env.execute();
>>>>>>>>>>>
>>>>>>>>>>>     env.readTheNextThing().doSomething();
>>>>>>>>>>>     env.execute();
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> You can also just "save" a program and keep it for later
>>>>>>>>>>> execution:
>>>>>>>>>>>
>>>>>>>>>>> Plan plan = env.createProgramPlan();
>>>>>>>>>>>
>>>>>>>>>>> At a later point you can start that plan with:
>>>>>>>>>>>
>>>>>>>>>>> new RemoteExecutor(master, 6123).execute(plan);
>>>>>>>>>>>
>>>>>>>>>>> Stephan
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 21, 2014 at 5:49 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Any help on this? :(
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 21, 2014 at 9:33 AM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>> I forgot to ask you if there's a Flink utility to simulate
>>>>>>>>>>>>> the Hadoop ProgramDriver class, which acts somehow like a
>>>>>>>>>>>>> registry of jobs. Is there something similar?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Flavio
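---

A minimal sketch of the ExecutorService + FutureTask approach Stephan suggests at the top of the thread, for running a blocking job submission in a background thread and polling for completion. The class and method names (BackgroundJobRunner, runJobOne) are illustrative, and the Thread.sleep stands in for the blocking ee.execute() call that a RemoteExecutionEnvironment would make; it is a sketch of the threading pattern, not of the Flink API itself:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BackgroundJobRunner {

    // Stand-in for a method that builds a Flink plan and calls
    // ee.execute("job 1"), which blocks until the job finishes.
    static String runJobOne() throws Exception {
        Thread.sleep(100); // simulates the blocking job run
        return "job 1 finished";
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // submit() returns immediately; the Future is the handle you keep
        // around (the "job id") to check on the job later.
        Future<String> handle = executor.submit(new Callable<String>() {
            @Override
            public String call() throws Exception {
                return runJobOne();
            }
        });

        // Poll for completion instead of blocking the client thread.
        while (!handle.isDone()) {
            System.out.println("job still running...");
            Thread.sleep(50);
        }

        // get() returns the result, or rethrows the job's exception,
        // which is where a failed status would surface.
        System.out.println(handle.get());

        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With a real RemoteExecutionEnvironment, the object returned by execute() inside the Callable would carry the final job status mentioned above.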
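Kostas notes that ProgramDriver has no Hadoop dependencies and essentially accumulates <String, Class> pairs and calls the corresponding main method. A copy-paste-style sketch of that idea in plain Java follows; the names (JobDriver, WordCountJob, addClass) are made up for illustration and the example job just prints instead of running a Flink plan:

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

public class JobDriver {

    // Registry of job name -> class whose main() runs the job.
    private final Map<String, Class<?>> jobs = new LinkedHashMap<String, Class<?>>();

    public void addClass(String name, Class<?> mainClass) {
        jobs.put(name, mainClass);
    }

    // Looks up the job by its first argument and invokes the registered
    // class's main() with the remaining arguments.
    public int run(String[] args) throws Exception {
        if (args.length == 0 || !jobs.containsKey(args[0])) {
            System.err.println("Valid job names are: " + jobs.keySet());
            return -1;
        }
        String[] jobArgs = new String[args.length - 1];
        System.arraycopy(args, 1, jobArgs, 0, jobArgs.length);
        Method main = jobs.get(args[0]).getMethod("main", String[].class);
        main.invoke(null, (Object) jobArgs);
        return 0;
    }

    // Illustrative job class; a real one would build and execute a Flink plan.
    public static class WordCountJob {
        public static void main(String[] args) {
            System.out.println("running word count with " + args.length + " args");
        }
    }

    public static void main(String[] args) throws Exception {
        JobDriver driver = new JobDriver();
        driver.addClass("wordcount", WordCountJob.class);
        driver.run(new String[] {"wordcount", "input", "output"});
    }
}
```

This mirrors the "hadoop jar hadoop-mapreduce-examples.jar" behavior Flavio describes: invoked without a valid job name, the driver lists the registered jobs instead of failing silently.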
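Robert also mentions that almost every user needs command-line argument parsing with default values and clear error messages. A tiny sketch of what such a utility could look like; SimpleArgs and its "--key value" convention are invented for illustration, not an existing Flink API:

```java
import java.util.HashMap;
import java.util.Map;

public class SimpleArgs {

    // Parses "--key value" pairs, falling back to the supplied defaults
    // for any key that is not present on the command line.
    static Map<String, String> parse(String[] args, Map<String, String> defaults) {
        Map<String, String> result = new HashMap<String, String>(defaults);
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("Expected --key, got: " + args[i]);
            }
            result.put(args[i].substring(2), args[i + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<String, String>();
        defaults.put("parallelism", "1");

        Map<String, String> parsed =
            parse(new String[] {"--input", "/tmp/in"}, defaults);

        System.out.println(parsed.get("input"));        // from the command line
        System.out.println(parsed.get("parallelism"));  // from the defaults
    }
}
```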
