Sounds good to me... how do you check for completion from Java code?
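A minimal sketch of such a completion check, building on the ExecutorService/FutureTask approach Stephan suggests below. The Callable body here is a placeholder for a real Flink program (see the RemoteEnvironment sketch further down the thread); everything else is plain java.util.concurrent:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.flink.api.common.JobExecutionResult;

    public class CompletionCheck {
        public static void main(String[] args) throws Exception {
            ExecutorService executor = Executors.newSingleThreadExecutor();

            // Placeholder job: a real Callable would build a Flink program
            // and return the result of ExecutionEnvironment.execute().
            Future<JobExecutionResult> future = executor.submit(new Callable<JobExecutionResult>() {
                @Override
                public JobExecutionResult call() throws Exception {
                    return null; // stand-in for ee.execute("my job")
                }
            });

            // Poll for completion without blocking the client.
            while (!future.isDone()) {
                Thread.sleep(1000); // job still running; do other work here
            }

            try {
                JobExecutionResult result = future.get(); // re-throws job failures
                System.out.println("Job finished: " + result);
            } catch (ExecutionException e) {
                System.err.println("Job failed: " + e.getCause());
            } finally {
                executor.shutdown();
            }
        }
    }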
On Tue, Nov 25, 2014 at 6:56 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> 1) The RemoteExecutor will automatically transfer the jar, if needed.
>
> 2) Background execution is not supported out of the box. I would go for
> a Java ExecutorService with a FutureTask to kick off tasks in a
> background thread and allow checking for completion.
>
> Stephan
>
> On Tue, Nov 25, 2014 at 6:41 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
>> Do I have to upload the jar from my application to the Flink JobManager
>> every time?
>> Do I have to wait for the job to finish? I'd like to start the job
>> execution, get an id for it, and then poll for its status... is that
>> possible?
>>
>> On Tue, Nov 25, 2014 at 6:04 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>
>>> Cool.
>>>
>>> So you basically have two options:
>>>
>>> a) Use the bin/flink run tool.
>>> This tool is meant for users who submit a job once. To use it, upload
>>> the jar to any location in the file system (not HDFS) and run:
>>>
>>> ./bin/flink run <pathToJar> -c classNameOfJobYouWantToRun <JobArguments>
>>>
>>> b) Use the RemoteExecutor.
>>> With the RemoteExecutor, you don't need to put your jar file anywhere
>>> in your cluster. The only thing you need is the jar file somewhere
>>> where the Java application can access it.
>>> Inside this Java application, you have something like:
>>>
>>> runJobOne(ExecutionEnvironment ee) {
>>>     ee.readFile( ... );
>>>     ...
>>>     ee.execute("job 1");
>>> }
>>>
>>> runJobTwo(Exe ..) {
>>>     ...
>>> }
>>>
>>> main() {
>>>     ExecutionEnvironment ee = new Remote execution environment ..
>>>
>>>     if (something) {
>>>         runJobOne(ee);
>>>     } else if (something else) {
>>>         runJobTwo(ee);
>>>     } ...
>>> }
>>>
>>> The object returned by the ExecutionEnvironment.execute() call also
>>> contains information about the final status of the program (failed, etc.).
>>>
>>> I hope that helps.
>>>
>>> On Tue, Nov 25, 2014 at 5:30 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>
>>>> See inline.
>>>>
>>>> On Tue, Nov 25, 2014 at 3:37 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> maybe we need to take a step back, because I did not yet fully
>>>>> understand what you want to do.
>>>>>
>>>>> My understanding so far is the following:
>>>>> - You have a set of jobs that you've written for Flink.
>>>>
>>>> Yes, and they are all in the same jar (which I want to put on the
>>>> cluster somehow).
>>>>
>>>>> - You have a cluster with Flink running.
>>>>
>>>> Yes!
>>>>
>>>>> - You have an external client, which is a Java application that
>>>>> controls when and how the different jobs are launched. The client
>>>>> is running basically 24/7 or started by a cron job.
>>>>
>>>> I have a Java application somewhere that triggers the execution of
>>>> one of the available jobs in the jar (so I also need to pass the
>>>> arguments required by each job) and then monitors whether the job has
>>>> been put into a running state, as well as its status
>>>> (running/failed/finished; a completion percentage would be awesome).
>>>> I don't think the RemoteExecutor is enough... am I wrong?
>>>>
>>>>> Correct me if these assumptions are wrong. If they are true, the
>>>>> RemoteExecutor is probably what you are looking for. Otherwise, we
>>>>> have to find another solution.
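To make option (b) concrete, here is a hedged sketch combining Robert's pseudocode with Stephan's background-thread suggestion. The host name, port, jar path, and HDFS paths are assumptions for illustration; createRemoteEnvironment() is the API described in the 0.7 cluster-execution docs linked below:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.flink.api.common.JobExecutionResult;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class JobDispatcher {

        static JobExecutionResult runJobOne(ExecutionEnvironment ee) throws Exception {
            DataSet<String> input = ee.readTextFile("hdfs:///path/to/input"); // assumed path
            input.writeAsText("hdfs:///path/to/output-one");                  // assumed path
            return ee.execute("job 1");
        }

        static JobExecutionResult runJobTwo(ExecutionEnvironment ee) throws Exception {
            ee.fromElements("a", "b", "c").writeAsText("hdfs:///path/to/output-two");
            return ee.execute("job 2");
        }

        public static void main(final String[] args) throws Exception {
            // The remote environment ships the jar to the cluster; the jar
            // only has to be readable by this client (host/port assumed).
            final ExecutionEnvironment ee = ExecutionEnvironment
                    .createRemoteEnvironment("jobmanager-host", 6123, "/path/to/my-jobs.jar");

            ExecutorService executor = Executors.newSingleThreadExecutor();

            // Kick the job off in a background thread, as Stephan suggests above.
            Future<JobExecutionResult> future = executor.submit(new Callable<JobExecutionResult>() {
                @Override
                public JobExecutionResult call() throws Exception {
                    return "one".equals(args[0]) ? runJobOne(ee) : runJobTwo(ee);
                }
            });

            // future.isDone()/future.get() provide the completion check from
            // the top of the thread; execute() returns the job's final status.
            System.out.println("Result: " + future.get());
            executor.shutdown();
        }
    }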
>>>>>
>>>>> On Tue, Nov 25, 2014 at 2:56 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>
>>>>>> Hi Robert,
>>>>>> I tried to look at the RemoteExecutor, but I can't figure out the
>>>>>> exact steps to:
>>>>>> 1 - (upload if necessary and) register a jar containing multiple
>>>>>> main methods (one for each job)
>>>>>> 2 - start the execution of a job from a client
>>>>>> 3 - monitor the execution of the job
>>>>>>
>>>>>> Could you give me the exact Java commands/snippets to do that?
>>>>>>
>>>>>> On Sun, Nov 23, 2014 at 8:26 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>
>>>>>>> +1 for providing some utilities/tools for application developers.
>>>>>>> This could include something like an application registry. I also
>>>>>>> think that almost every user needs something to parse command-line
>>>>>>> arguments (including default values and comprehensive error messages).
>>>>>>> We should also see if we can document and properly expose the
>>>>>>> FileSystem abstraction to Flink app programmers. Users sometimes
>>>>>>> need to manipulate files directly.
>>>>>>>
>>>>>>> Regarding your second question:
>>>>>>> For deploying a jar on your cluster, you can use the
>>>>>>> "bin/flink run <JAR FILE>" command.
>>>>>>> For starting a job from an external client you can use the
>>>>>>> RemoteExecutionEnvironment (you need to know the JobManager address
>>>>>>> for that). Here is some documentation on that:
>>>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/cluster_execution.html#remote-environment
>>>>>>>
>>>>>>> On Sat, Nov 22, 2014 at 9:06 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>>
>>>>>>>> That was exactly what I was looking for. In my case it is not a
>>>>>>>> problem to use the Hadoop version, because I work on Hadoop. Don't
>>>>>>>> you think it could be useful to add a Flink ProgramDriver so that
>>>>>>>> you can use it both for Hadoop and native Flink jobs?
>>>>>>>>
>>>>>>>> Now that I understand how to bundle a bunch of jobs together, my
>>>>>>>> next objective is to deploy the jar on the cluster (similar to what
>>>>>>>> the webclient does) and then start the jobs from my external client
>>>>>>>> (which in theory just needs to know the jar name and the parameters
>>>>>>>> to pass to every job it wants to call). Do you have an example of that?
>>>>>>>>
>>>>>>>> On Nov 22, 2014 6:11 PM, "Kostas Tzoumas" <ktzou...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Are you looking for something like
>>>>>>>>> https://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/ProgramDriver.html ?
>>>>>>>>>
>>>>>>>>> You should be able to use the Hadoop ProgramDriver directly; see
>>>>>>>>> for example here:
>>>>>>>>> https://github.com/ktzoumas/incubator-flink/blob/tez_support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/examples/ExampleDriver.java
>>>>>>>>>
>>>>>>>>> If you don't want to introduce a Hadoop dependency in your
>>>>>>>>> project, you can just copy-paste ProgramDriver; it does not have
>>>>>>>>> any dependencies on Hadoop classes. That class just accumulates
>>>>>>>>> <String, Class> pairs (simplifying a bit) and calls the main
>>>>>>>>> method of the corresponding class.
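For reference, a dependency-free sketch in the spirit of the ProgramDriver Kostas describes: it accumulates name-to-class pairs and reflectively calls the chosen main method. JobOne and JobTwo are hypothetical stand-ins for real Flink job classes:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    public class FlinkProgramDriver {

        private final Map<String, Class<?>> programs = new HashMap<String, Class<?>>();

        public void addClass(String name, Class<?> mainClass) {
            programs.put(name, mainClass);
        }

        public void run(String name, String[] args) throws Exception {
            Class<?> mainClass = programs.get(name);
            if (mainClass == null) {
                throw new IllegalArgumentException("Unknown program: " + name
                        + " (known: " + programs.keySet() + ")");
            }
            // Invoke public static void main(String[]) on the registered class.
            mainClass.getMethod("main", String[].class).invoke(null, (Object) args);
        }

        // Hypothetical job entry points standing in for real Flink jobs.
        public static class JobOne {
            public static void main(String[] args) { System.out.println("job one: " + Arrays.toString(args)); }
        }
        public static class JobTwo {
            public static void main(String[] args) { System.out.println("job two: " + Arrays.toString(args)); }
        }

        public static void main(String[] args) throws Exception {
            FlinkProgramDriver driver = new FlinkProgramDriver();
            driver.addClass("job-one", JobOne.class);
            driver.addClass("job-two", JobTwo.class);
            driver.run(args[0], Arrays.copyOfRange(args, 1, args.length));
        }
    }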
>>>>>>>>>
>>>>>>>>> On Sat, Nov 22, 2014 at 5:34 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Not sure I get exactly what this is, but packaging multiple
>>>>>>>>>> examples in one program is well possible. You can have arbitrary
>>>>>>>>>> control flow in the main() method.
>>>>>>>>>>
>>>>>>>>>> It should be well possible to do something like the Hadoop
>>>>>>>>>> examples setup...
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 21, 2014 at 7:02 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>>>>>
>>>>>>>>>>> That was something I used to do with Hadoop, and it's comfortable
>>>>>>>>>>> when testing stuff (so it is not so important).
>>>>>>>>>>> For an example, see what happens when you run the old
>>>>>>>>>>> "hadoop jar hadoop-mapreduce-examples.jar" command... it "drives"
>>>>>>>>>>> you to the correct invocation of that job.
>>>>>>>>>>> However, the important thing is that I'd like to keep existing
>>>>>>>>>>> related jobs somewhere (like a repository of jobs), deploy them,
>>>>>>>>>>> and then be able to start the one I need from an external program.
>>>>>>>>>>>
>>>>>>>>>>> Could this be done with the RemoteExecutor? Or is there any web
>>>>>>>>>>> service to manage the job execution? That would be very useful...
>>>>>>>>>>> Is the Client interface the only one that allows something
>>>>>>>>>>> similar right now?
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 21, 2014 at 6:19 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I am not sure exactly what you need there. In Flink you can
>>>>>>>>>>>> write more than one program in the same program ;-) You can
>>>>>>>>>>>> define complex flows and execute arbitrarily at intermediate
>>>>>>>>>>>> points:
>>>>>>>>>>>>
>>>>>>>>>>>> main() {
>>>>>>>>>>>>     ExecutionEnvironment env = ...;
>>>>>>>>>>>>
>>>>>>>>>>>>     env.readSomething().map().join(...).and().so().on();
>>>>>>>>>>>>     env.execute();
>>>>>>>>>>>>
>>>>>>>>>>>>     env.readTheNextThing().doSomething();
>>>>>>>>>>>>     env.execute();
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> You can also just "save" a program and keep it for later
>>>>>>>>>>>> execution:
>>>>>>>>>>>>
>>>>>>>>>>>> Plan plan = env.createProgramPlan();
>>>>>>>>>>>>
>>>>>>>>>>>> At a later point you can start that plan:
>>>>>>>>>>>>
>>>>>>>>>>>> new RemoteExecutor(master, 6123).execute(plan);
>>>>>>>>>>>>
>>>>>>>>>>>> Stephan
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 21, 2014 at 5:49 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Any help on this? :(
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 21, 2014 at 9:33 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>> I forgot to ask you whether there's a Flink utility that
>>>>>>>>>>>>>> simulates the Hadoop ProgramDriver class, which acts somehow
>>>>>>>>>>>>>> like a registry of jobs. Is there something similar?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Flavio
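Finally, a sketch of the "save a program for later execution" pattern from Stephan's snippet above. This assumes the 0.7-era org.apache.flink.client.RemoteExecutor from flink-clients, where the method is named executePlan() (Stephan's pseudocode abbreviates it to execute(plan); the exact name may vary by version). Host, port, and the output path are assumptions:

    import org.apache.flink.api.common.Plan;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.client.RemoteExecutor;

    public class DeferredPlanExample {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Define the program, but do not run it yet.
            env.fromElements("a", "b", "c")
               .writeAsText("file:///tmp/deferred-output"); // assumed output path

            // "Save" the program as a plan ...
            Plan plan = env.createProgramPlan();

            // ... and ship it to a running cluster at a later point.
            new RemoteExecutor("master", 6123).executePlan(plan);
        }
    }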