Hi Robert,
I tried to look at the RemoteExecutor, but I can't understand the exact steps to:
1 - (upload if necessary and) register a jar containing multiple main methods (one for each job)
2 - start the execution of a job from a client
3 - monitor the execution of the job
Could you give me the exact Java commands/snippets to do that?

On Sun, Nov 23, 2014 at 8:26 PM, Robert Metzger <[email protected]> wrote:

> +1 for providing some utilities/tools for application developers.
> This could include something like an application registry. I also think
> that almost every user needs something to parse command line arguments
> (including default values and comprehensive error messages).
> We should also see if we can document and properly expose the FileSystem
> abstraction to Flink app programmers. Users sometimes need to manipulate
> files directly.
>
> Regarding your second question:
> For deploying a jar on your cluster, you can use the "bin/flink run <JAR
> FILE>" command.
> For starting a job from an external client you can use the
> RemoteExecutionEnvironment (you need to know the JobManager address for
> that). Here is some documentation on that:
> http://flink.incubator.apache.org/docs/0.7-incubating/cluster_execution.html#remote-environment
>
> On Sat, Nov 22, 2014 at 9:06 PM, Flavio Pompermaier <[email protected]> wrote:
>
>> That was exactly what I was looking for. In my case it is not a problem
>> to use the Hadoop version because I work on Hadoop. Don't you think it
>> could be useful to add a Flink ProgramDriver so that you can use it both
>> for Hadoop and native-Flink jobs?
>>
>> Now that I understood how to bundle together a bunch of jobs, my next
>> objective will be to deploy the jar on the cluster (similar to what the
>> webclient does) and then start the jobs from my external client (which in
>> theory just needs to know the jar name and the parameters to pass to
>> every job it wants to call). Do you have an example of that?
>>
>> On Nov 22, 2014 6:11 PM, "Kostas Tzoumas" <[email protected]> wrote:
>>
>>> Are you looking for something like
>>> https://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/ProgramDriver.html
>>> ?
>>>
>>> You should be able to use the Hadoop ProgramDriver directly, see for
>>> example here:
>>> https://github.com/ktzoumas/incubator-flink/blob/tez_support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/examples/ExampleDriver.java
>>>
>>> If you don't want to introduce a Hadoop dependency in your project, you
>>> can just copy-paste ProgramDriver; it does not have any dependencies on
>>> Hadoop classes. That class just accumulates <String, Class> pairs
>>> (simplifying a bit) and calls the main method of the corresponding class.
>>>
>>> On Sat, Nov 22, 2014 at 5:34 PM, Stephan Ewen <[email protected]> wrote:
>>>
>>>> Not sure I get exactly what this is, but packaging multiple examples
>>>> in one program is well possible. You can have arbitrary control flow
>>>> in the main() method.
>>>>
>>>> It should be well possible to do something like that Hadoop examples
>>>> setup...
>>>>
>>>> On Fri, Nov 21, 2014 at 7:02 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>
>>>>> That was something I used to do with Hadoop and it's comfortable when
>>>>> testing stuff (so it is not so important).
>>>>> For an example, see what happens when you run the old "hadoop jar
>>>>> hadoop-mapreduce-examples.jar" command: it "drives" you to the
>>>>> correct invocation of that job.
>>>>> However, the important thing is that I'd like to keep existing
>>>>> related jobs somewhere (like a repository of jobs), deploy them, and
>>>>> then be able to start the one I need from an external program.
>>>>>
>>>>> Could this be done with RemoteExecutor? Or is there any WS to manage
>>>>> the job execution? That would be very useful.
>>>>> Is the Client interface the only one that allows something similar
>>>>> right now?
>>>>>
>>>>> On Fri, Nov 21, 2014 at 6:19 PM, Stephan Ewen <[email protected]> wrote:
>>>>>
>>>>>> I am not sure exactly what you need there.
>>>>>> In Flink you can write more than one program in the same program ;-)
>>>>>> You can define complex flows and execute arbitrarily at intermediate
>>>>>> points:
>>>>>>
>>>>>> main() {
>>>>>>     ExecutionEnvironment env = ...;
>>>>>>
>>>>>>     env.readSomething().map().join(...).and().so().on();
>>>>>>     env.execute();
>>>>>>
>>>>>>     env.readTheNextThing().doSomething();
>>>>>>     env.execute();
>>>>>> }
>>>>>>
>>>>>> You can also just "save" a program and keep it for later execution:
>>>>>>
>>>>>> Plan plan = env.createProgramPlan();
>>>>>>
>>>>>> At a later point you can start that plan:
>>>>>>
>>>>>> new RemoteExecutor(master, 6123).execute(plan);
>>>>>>
>>>>>> Stephan
>>>>>>
>>>>>> On Fri, Nov 21, 2014 at 5:49 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>
>>>>>>> Any help on this? :(
>>>>>>>
>>>>>>> On Fri, Nov 21, 2014 at 9:33 AM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi guys,
>>>>>>>> I forgot to ask you if there's a Flink utility to simulate the
>>>>>>>> Hadoop ProgramDriver class that acts somehow like a registry of
>>>>>>>> jobs. Is there something similar?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Flavio
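As Kostas notes above, the ProgramDriver pattern has no real Hadoop dependency and can be reimplemented in a few lines. Here is a minimal sketch of such a job registry (class and job names here are illustrative, not Flink or Hadoop API); each registered class would, in practice, build and execute a Flink plan in its main method:

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal ProgramDriver-style registry: maps job names to classes exposing a main(String[]). */
public class JobRegistry {
    private final Map<String, Class<?>> jobs = new LinkedHashMap<>();

    public void addClass(String name, Class<?> mainClass) {
        jobs.put(name, mainClass);
    }

    /** Dispatches args[0] to the registered class, passing the remaining args to its main(). */
    public int run(String[] args) throws Exception {
        if (args.length == 0 || !jobs.containsKey(args[0])) {
            System.err.println("Valid job names: " + jobs.keySet());
            return 1;
        }
        String[] jobArgs = new String[args.length - 1];
        System.arraycopy(args, 1, jobArgs, 0, jobArgs.length);
        Method main = jobs.get(args[0]).getMethod("main", String[].class);
        main.invoke(null, (Object) jobArgs);
        return 0;
    }

    /** Example job; a real one would build a Flink plan and call env.execute(). */
    public static class WordCount {
        public static void main(String[] args) {
            System.out.println("wordcount on " + String.join(" ", args));
        }
    }

    public static void main(String[] args) throws Exception {
        JobRegistry registry = new JobRegistry();
        registry.addClass("wordcount", WordCount.class);
        System.exit(registry.run(args));
    }
}
```

Bundled into one jar, this gives the "hadoop jar hadoop-mapreduce-examples.jar" experience Flavio describes: running it with no arguments lists the known jobs, and the first argument selects which job's main method is invoked.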
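For the three steps in the original question, a client-side sketch along the lines of the remote-environment docs Robert links might look as follows. This is an untested sketch against the 0.7-era API; the host name, port, jar path, and input/output paths are placeholders:

```java
import org.apache.flink.api.java.ExecutionEnvironment;

public class RemoteClient {
    public static void main(String[] args) throws Exception {
        // Step 1: the listed jar is shipped to the cluster with the job, so a
        // single jar bundling several main methods is fine -- this client
        // decides which of the bundled programs to build and run.
        ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
                "jobmanager-host", 6123, "/path/to/my-jobs.jar");

        // Step 2: build the program you want to run against that environment...
        env.readTextFile("hdfs:///input")
           .writeAsText("hdfs:///output");

        // ...and start it on the cluster. execute() blocks until the job
        // finishes, so step 3 (monitoring) in its simplest form is: a normal
        // return means success, an exception means the job failed.
        env.execute("my job");
    }
}
```

The JobManager address and port are the same ones the "bin/flink run" command talks to; beyond the blocking execute() call, the thread above does not mention a richer monitoring API, so anything more would have to go through the Client interface Flavio asks about.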
