Hi!

1) The Remote Executor will automatically transfer the jar, if needed.

2) Background execution is not supported out of the box. I would go for a
Java ExecutorService with a FutureTask to kick off tasks in a background
thread and allow checking for completion.

Stephan

On Tue, Nov 25, 2014 at 6:41 PM, Flavio Pompermaier <[email protected]> wrote:

> Do I have to upload the jar from my application to the Flink JobManager
> every time?
> Do I have to wait for the job to finish? I'd like to start the job
> execution, get an id for it and then poll for its status..is that possible?
>
> On Tue, Nov 25, 2014 at 6:04 PM, Robert Metzger <[email protected]> wrote:
>
>> Cool.
>>
>> So you have basically two options:
>>
>> a) Use the bin/flink run tool.
>> This tool is meant for users to submit a job once. To use it, upload
>> the jar to any location in the file system (not HDFS) and run:
>>
>>   ./bin/flink run <pathToJar> -c classNameOfJobYouWantToRun <JobArguments>
>>
>> b) Use the RemoteExecutor.
>> For the Remote Executor, you don't need to put your jar file anywhere
>> in your cluster. The only thing you need is the jar file somewhere
>> where the Java application can access it.
>> Inside this Java application, you have something like:
>>
>> runJobOne(ExecutionEnvironment ee) {
>>     ee.readFile( ... );
>>     ...
>>     ee.execute("job 1");
>> }
>>
>> runJobTwo(Exe ..) {
>>     ...
>> }
>>
>> main() {
>>     ExecutionEnvironment ee = new Remote execution environment ..
>>
>>     if (something) {
>>         runJobOne(ee);
>>     } else if (something else) {
>>         runJobTwo(ee);
>>     } ...
>> }
>>
>> The object returned by the ExecutionEnvironment.execute() call also
>> contains information about the final status of the program (failed etc.).
>>
>> I hope that helps.
>>
>> On Tue, Nov 25, 2014 at 5:30 PM, Flavio Pompermaier <[email protected]> wrote:
>>
>>> See inline
>>>
>>> On Tue, Nov 25, 2014 at 3:37 PM, Robert Metzger <[email protected]> wrote:
>>>
>>>> Hey,
>>>>
>>>> maybe we need to go a step back because I did not yet fully understand
>>>> what you want to do.
>>>>
>>>> My understanding so far is the following:
>>>> - You have a set of jobs that you've written for Flink
>>>
>>> Yes, and they are all in the same jar (that I want to put in the cluster
>>> somehow)
>>>
>>>> - You have a cluster with Flink running
>>>
>>> Yes!
>>>
>>>> - You have an external client, which is a Java application that is
>>>> controlling when and how the different jobs are launched. The client is
>>>> running basically 24/7 or started by a cronjob.
>>>
>>> I have a Java application somewhere that triggers the execution of one
>>> of the available jobs in the jar (so I need to pass also the necessary
>>> arguments required by each job) and then monitor whether the job has
>>> been put into a running state and its status (running/failed/finished;
>>> a percentage would be awesome).
>>> I don't think the RemoteExecutor is enough..am I wrong?
>>>
>>>> Correct me if these assumptions are wrong. If they are true, the
>>>> RemoteExecutor is probably what you are looking for. Otherwise, we
>>>> have to find another solution.
>>>>
>>>> On Tue, Nov 25, 2014 at 2:56 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>
>>>>> Hi Robert,
>>>>> I tried to look at the RemoteExecutor but I can't understand what the
>>>>> exact steps are to:
>>>>> 1 - (upload if necessary and) register a jar containing multiple main
>>>>> methods (one for each job)
>>>>> 2 - start the execution of a job from a client
>>>>> 3 - monitor the execution of the job
>>>>>
>>>>> Could you give me the exact Java commands/snippets to do that?
>>>>>
>>>>> On Sun, Nov 23, 2014 at 8:26 PM, Robert Metzger <[email protected]> wrote:
>>>>>
>>>>>> +1 for providing some utilities/tools for application developers.
>>>>>> This could include something like an application registry. I also
>>>>>> think that almost every user needs something to parse command line
>>>>>> arguments (including default values and comprehensive error messages).
>>>>>> We should also see if we can document and properly expose the
>>>>>> FileSystem abstraction to Flink app programmers. Users sometimes
>>>>>> need to manipulate files directly.
>>>>>>
>>>>>> Regarding your second question:
>>>>>> For deploying a jar on your cluster, you can use the "bin/flink run
>>>>>> <JAR FILE>" command.
>>>>>> For starting a job from an external client you can use the
>>>>>> RemoteExecutionEnvironment (you need to know the JobManager address
>>>>>> for that). Here is some documentation on that:
>>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/cluster_execution.html#remote-environment
>>>>>>
>>>>>> On Sat, Nov 22, 2014 at 9:06 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>
>>>>>>> That was exactly what I was looking for. In my case it is not a
>>>>>>> problem to use the Hadoop version because I work on Hadoop. Don't
>>>>>>> you think it could be useful to add a Flink ProgramDriver so that
>>>>>>> you can use it both for Hadoop and native-Flink jobs?
>>>>>>>
>>>>>>> Now that I understood how to bundle together a bunch of jobs, my
>>>>>>> next objective will be to deploy the jar on the cluster (similar to
>>>>>>> what the webclient does) and then start the jobs from my external
>>>>>>> client (which in theory just needs to know the jar name and the
>>>>>>> parameters to pass to every job it wants to call). Do you have an
>>>>>>> example of that?
>>>>>>> On Nov 22, 2014 6:11 PM, "Kostas Tzoumas" <[email protected]> wrote:
>>>>>>>
>>>>>>>> Are you looking for something like
>>>>>>>> https://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/ProgramDriver.html
>>>>>>>> ?
>>>>>>>>
>>>>>>>> You should be able to use the Hadoop ProgramDriver directly, see
>>>>>>>> for example here:
>>>>>>>> https://github.com/ktzoumas/incubator-flink/blob/tez_support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/examples/ExampleDriver.java
>>>>>>>>
>>>>>>>> If you don't want to introduce a Hadoop dependency in your
>>>>>>>> project, you can just copy-paste ProgramDriver; it does not have
>>>>>>>> any dependencies on Hadoop classes. That class just accumulates
>>>>>>>> <String, Class> pairs (simplifying a bit) and calls the main
>>>>>>>> method of the corresponding class.
>>>>>>>>
>>>>>>>> On Sat, Nov 22, 2014 at 5:34 PM, Stephan Ewen <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Not sure I get exactly what this is, but packaging multiple
>>>>>>>>> examples in one program is well possible. You can have arbitrary
>>>>>>>>> control flow in the main() method.
>>>>>>>>>
>>>>>>>>> It should be well possible to do something like that Hadoop
>>>>>>>>> examples setup...
>>>>>>>>>
>>>>>>>>> On Fri, Nov 21, 2014 at 7:02 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> That was something I used to do with Hadoop and it's comfortable
>>>>>>>>>> when testing stuff (so it is not so important).
>>>>>>>>>> For an example, see what happens when you run the old "hadoop jar
>>>>>>>>>> hadoop-mapreduce-examples.jar" command..it "drives" you to the
>>>>>>>>>> correct invocation of that job.
>>>>>>>>>> However, the important thing is that I'd like to keep existing
>>>>>>>>>> related jobs somewhere (like a repository of jobs), deploy them
>>>>>>>>>> and then be able to start the one I need from an external program.
>>>>>>>>>>
>>>>>>>>>> Could this be done with the RemoteExecutor? Or is there any web
>>>>>>>>>> service to manage the job execution? That would be very useful..
>>>>>>>>>> Is the Client interface the only one that allows something
>>>>>>>>>> similar right now?
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 21, 2014 at 6:19 PM, Stephan Ewen <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I am not sure exactly what you need there. In Flink you can
>>>>>>>>>>> write more than one program in the same program ;-) You can
>>>>>>>>>>> define complex flows and execute arbitrarily at intermediate
>>>>>>>>>>> points:
>>>>>>>>>>>
>>>>>>>>>>> main() {
>>>>>>>>>>>     ExecutionEnvironment env = ...;
>>>>>>>>>>>
>>>>>>>>>>>     env.readSomething().map().join(...).and().so().on();
>>>>>>>>>>>     env.execute();
>>>>>>>>>>>
>>>>>>>>>>>     env.readTheNextThing().doSomething();
>>>>>>>>>>>     env.execute();
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> You can also just "save" a program and keep it for later
>>>>>>>>>>> execution:
>>>>>>>>>>>
>>>>>>>>>>> Plan plan = env.createProgramPlan();
>>>>>>>>>>>
>>>>>>>>>>> At a later point you can start that plan with:
>>>>>>>>>>>
>>>>>>>>>>> new RemoteExecutor(master, 6123).execute(plan);
>>>>>>>>>>>
>>>>>>>>>>> Stephan
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 21, 2014 at 5:49 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Any help on this? :(
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 21, 2014 at 9:33 AM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>> I forgot to ask you if there's a Flink utility to simulate
>>>>>>>>>>>>> the Hadoop ProgramDriver class, which acts somehow like a
>>>>>>>>>>>>> registry of jobs. Is there something similar?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Flavio
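---

A minimal sketch of the ExecutorService + FutureTask approach Stephan suggests at the top of the thread, for running a blocking job submission in a background thread and polling for completion. The class and method names (BackgroundJobRunner, runJobOne) are illustrative, and the Thread.sleep stands in for the blocking ee.execute() call that a RemoteExecutionEnvironment would make; it is a sketch of the threading pattern, not of the Flink API itself:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BackgroundJobRunner {

    // Stand-in for a method that builds a Flink plan and calls
    // ee.execute("job 1"), which blocks until the job finishes.
    static String runJobOne() throws Exception {
        Thread.sleep(100); // simulates the blocking job run
        return "job 1 finished";
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // submit() returns immediately; the Future is the handle you keep
        // around (the "job id") to check on the job later.
        Future<String> handle = executor.submit(new Callable<String>() {
            @Override
            public String call() throws Exception {
                return runJobOne();
            }
        });

        // Poll for completion instead of blocking the client thread.
        while (!handle.isDone()) {
            System.out.println("job still running...");
            Thread.sleep(50);
        }

        // get() returns the result, or rethrows the job's exception,
        // which is where a failed status would surface.
        System.out.println(handle.get());

        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With a real RemoteExecutionEnvironment, the object returned by execute() inside the Callable would carry the final job status mentioned above.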
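Kostas notes that ProgramDriver has no Hadoop dependencies and essentially accumulates <String, Class> pairs and calls the corresponding main method. A copy-paste-style sketch of that idea in plain Java follows; the names (JobDriver, WordCountJob, addClass) are made up for illustration and the example job just prints instead of running a Flink plan:

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

public class JobDriver {

    // Registry of job name -> class whose main() runs the job.
    private final Map<String, Class<?>> jobs = new LinkedHashMap<String, Class<?>>();

    public void addClass(String name, Class<?> mainClass) {
        jobs.put(name, mainClass);
    }

    // Looks up the job by its first argument and invokes the registered
    // class's main() with the remaining arguments.
    public int run(String[] args) throws Exception {
        if (args.length == 0 || !jobs.containsKey(args[0])) {
            System.err.println("Valid job names are: " + jobs.keySet());
            return -1;
        }
        String[] jobArgs = new String[args.length - 1];
        System.arraycopy(args, 1, jobArgs, 0, jobArgs.length);
        Method main = jobs.get(args[0]).getMethod("main", String[].class);
        main.invoke(null, (Object) jobArgs);
        return 0;
    }

    // Illustrative job class; a real one would build and execute a Flink plan.
    public static class WordCountJob {
        public static void main(String[] args) {
            System.out.println("running word count with " + args.length + " args");
        }
    }

    public static void main(String[] args) throws Exception {
        JobDriver driver = new JobDriver();
        driver.addClass("wordcount", WordCountJob.class);
        driver.run(new String[] {"wordcount", "input", "output"});
    }
}
```

This mirrors the "hadoop jar hadoop-mapreduce-examples.jar" behavior Flavio describes: invoked without a valid job name, the driver lists the registered jobs instead of failing silently.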
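Robert also mentions that almost every user needs command-line argument parsing with default values and clear error messages. A tiny sketch of what such a utility could look like; SimpleArgs and its "--key value" convention are invented for illustration, not an existing Flink API:

```java
import java.util.HashMap;
import java.util.Map;

public class SimpleArgs {

    // Parses "--key value" pairs, falling back to the supplied defaults
    // for any key that is not present on the command line.
    static Map<String, String> parse(String[] args, Map<String, String> defaults) {
        Map<String, String> result = new HashMap<String, String>(defaults);
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("Expected --key, got: " + args[i]);
            }
            result.put(args[i].substring(2), args[i + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<String, String>();
        defaults.put("parallelism", "1");

        Map<String, String> parsed =
            parse(new String[] {"--input", "/tmp/in"}, defaults);

        System.out.println(parsed.get("input"));        // from the command line
        System.out.println(parsed.get("parallelism"));  // from the defaults
    }
}
```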
