Hi Robert,
I tried to look at the RemoteExecutor, but I can't understand the exact steps to:
1 - (upload if necessary and) register a jar containing multiple main methods (one for each job)
2 - start the execution of a job from a client
3 - monitor the execution of the job
Could you give me the exact Java commands/snippets to do that?

On Sun, Nov 23, 2014 at 8:26 PM, Robert Metzger <[email protected]> wrote:

> +1 for providing some utilities/tools for application developers.
> This could include something like an application registry. I also think
> that almost every user needs something to parse command line arguments
> (including default values and comprehensive error messages).
> We should also see if we can document and properly expose the FileSystem
> abstraction to Flink app programmers. Users sometimes need to manipulate
> files directly.
>
> Regarding your second question:
> For deploying a jar on your cluster, you can use the "bin/flink run <JAR
> FILE>" command.
> For starting a job from an external client you can use the
> RemoteExecutionEnvironment (you need to know the JobManager address for
> that). Here is some documentation on that:
> http://flink.incubator.apache.org/docs/0.7-incubating/cluster_execution.html#remote-environment
>
> On Sat, Nov 22, 2014 at 9:06 PM, Flavio Pompermaier <[email protected]> wrote:
>
>> That was exactly what I was looking for. In my case it is not a problem
>> to use the Hadoop version because I work on Hadoop. Don't you think it
>> could be useful to add a Flink ProgramDriver so that you can use it both
>> for Hadoop and native-Flink jobs?
>>
>> Now that I understood how to bundle together a bunch of jobs, my next
>> objective will be to deploy the jar on the cluster (similar to what the
>> webclient does) and then start the jobs from my external client (which in
>> theory just needs to know the jar name and the parameters to pass to
>> every job it wants to call). Do you have an example of that?
>>
>> On Nov 22, 2014 6:11 PM, "Kostas Tzoumas" <[email protected]> wrote:
>>
>>> Are you looking for something like
>>> https://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/ProgramDriver.html
>>> ?
>>>
>>> You should be able to use the Hadoop ProgramDriver directly, see for
>>> example here:
>>> https://github.com/ktzoumas/incubator-flink/blob/tez_support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/examples/ExampleDriver.java
>>>
>>> If you don't want to introduce a Hadoop dependency in your project, you
>>> can just copy-paste ProgramDriver; it does not have any dependencies on
>>> Hadoop classes. That class just accumulates <String, Class> pairs
>>> (simplifying a bit) and calls the main method of the corresponding class.
>>>
>>> On Sat, Nov 22, 2014 at 5:34 PM, Stephan Ewen <[email protected]> wrote:
>>>
>>>> Not sure I get exactly what this is, but packaging multiple examples
>>>> in one program is well possible. You can have arbitrary control flow
>>>> in the main() method.
>>>>
>>>> It should be well possible to do something like that Hadoop examples
>>>> setup...
>>>>
>>>> On Fri, Nov 21, 2014 at 7:02 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>
>>>>> That was something I used to do with Hadoop and it's comfortable when
>>>>> testing stuff (so it is not so important).
>>>>> For an example, see what happens when you run the old "hadoop jar
>>>>> hadoop-mapreduce-examples.jar" command: it "drives" you to the
>>>>> correct invocation of that job.
>>>>> However, the important thing is that I'd like to keep existing
>>>>> related jobs somewhere (like a repository of jobs), deploy them, and
>>>>> then be able to start the one I need from an external program.
>>>>>
>>>>> Could this be done with RemoteExecutor? Or is there any WS to manage
>>>>> the job execution? That would be very useful.
>>>>> Is the Client interface the only one that allows something similar
>>>>> right now?
>>>>>
>>>>> On Fri, Nov 21, 2014 at 6:19 PM, Stephan Ewen <[email protected]> wrote:
>>>>>
>>>>>> I am not sure exactly what you need there.
>>>>>> In Flink you can write more than one program in the same program ;-)
>>>>>> You can define complex flows and execute arbitrarily at intermediate
>>>>>> points:
>>>>>>
>>>>>> main() {
>>>>>>     ExecutionEnvironment env = ...;
>>>>>>
>>>>>>     env.readSomething().map().join(...).and().so().on();
>>>>>>     env.execute();
>>>>>>
>>>>>>     env.readTheNextThing().doSomething();
>>>>>>     env.execute();
>>>>>> }
>>>>>>
>>>>>> You can also just "save" a program and keep it for later execution:
>>>>>>
>>>>>> Plan plan = env.createProgramPlan();
>>>>>>
>>>>>> At a later point you can start that plan:
>>>>>>
>>>>>> new RemoteExecutor(master, 6123).execute(plan);
>>>>>>
>>>>>> Stephan
>>>>>>
>>>>>> On Fri, Nov 21, 2014 at 5:49 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>
>>>>>>> Any help on this? :(
>>>>>>>
>>>>>>> On Fri, Nov 21, 2014 at 9:33 AM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi guys,
>>>>>>>> I forgot to ask you if there's a Flink utility to simulate the
>>>>>>>> Hadoop ProgramDriver class that acts somehow like a registry of
>>>>>>>> jobs. Is there something similar?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Flavio
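As Kostas notes above, the ProgramDriver pattern has no real Hadoop dependency and can be reimplemented in a few lines. Here is a minimal sketch of such a job registry (class and job names here are illustrative, not Flink or Hadoop API); each registered class would, in practice, build and execute a Flink plan in its main method:

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal ProgramDriver-style registry: maps job names to classes exposing a main(String[]). */
public class JobRegistry {
    private final Map<String, Class<?>> jobs = new LinkedHashMap<>();

    public void addClass(String name, Class<?> mainClass) {
        jobs.put(name, mainClass);
    }

    /** Dispatches args[0] to the registered class, passing the remaining args to its main(). */
    public int run(String[] args) throws Exception {
        if (args.length == 0 || !jobs.containsKey(args[0])) {
            System.err.println("Valid job names: " + jobs.keySet());
            return 1;
        }
        String[] jobArgs = new String[args.length - 1];
        System.arraycopy(args, 1, jobArgs, 0, jobArgs.length);
        Method main = jobs.get(args[0]).getMethod("main", String[].class);
        main.invoke(null, (Object) jobArgs);
        return 0;
    }

    /** Example job; a real one would build a Flink plan and call env.execute(). */
    public static class WordCount {
        public static void main(String[] args) {
            System.out.println("wordcount on " + String.join(" ", args));
        }
    }

    public static void main(String[] args) throws Exception {
        JobRegistry registry = new JobRegistry();
        registry.addClass("wordcount", WordCount.class);
        System.exit(registry.run(args));
    }
}
```

Bundled into one jar, this gives the "hadoop jar hadoop-mapreduce-examples.jar" experience Flavio describes: running it with no arguments lists the known jobs, and the first argument selects which job's main method is invoked.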
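For the three steps in the original question, a client-side sketch along the lines of the remote-environment docs Robert links might look as follows. This is an untested sketch against the 0.7-era API; the host name, port, jar path, and input/output paths are placeholders:

```java
import org.apache.flink.api.java.ExecutionEnvironment;

public class RemoteClient {
    public static void main(String[] args) throws Exception {
        // Step 1: the listed jar is shipped to the cluster with the job, so a
        // single jar bundling several main methods is fine -- this client
        // decides which of the bundled programs to build and run.
        ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
                "jobmanager-host", 6123, "/path/to/my-jobs.jar");

        // Step 2: build the program you want to run against that environment...
        env.readTextFile("hdfs:///input")
           .writeAsText("hdfs:///output");

        // ...and start it on the cluster. execute() blocks until the job
        // finishes, so step 3 (monitoring) in its simplest form is: a normal
        // return means success, an exception means the job failed.
        env.execute("my job");
    }
}
```

The JobManager address and port are the same ones the "bin/flink run" command talks to; beyond the blocking execute() call, the thread above does not mention a richer monitoring API, so anything more would have to go through the Client interface Flavio asks about.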
