The execute() call on the ExecutionEnvironment blocks, so the future will not
be done until the execution has finished...
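The ExecutorService/FutureTask approach suggested in the thread below can be sketched roughly as follows. This is a minimal, hedged example: the class name, the Callable body, and the "FINISHED" status string are all made up for illustration — in a real client, the blocking env.execute() call would go inside the Callable.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundJobRunner {

    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    // Submit a blocking call (this is where env.execute() would go) to a
    // background thread, so the client can poll the Future for completion.
    public Future<String> submit(Callable<String> blockingJob) {
        return pool.submit(blockingJob);
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        BackgroundJobRunner runner = new BackgroundJobRunner();
        Future<String> status = runner.submit(() -> {
            Thread.sleep(100); // stand-in for a blocking env.execute() call
            return "FINISHED";
        });
        while (!status.isDone()) { // poll for completion from the client
            Thread.sleep(10);
        }
        System.out.println("Job status: " + status.get());
        runner.shutdown();
    }
}
```

The same pattern extends to several jobs: submit each one as its own Callable and keep the Futures around as handles for polling.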

On Tue, Nov 25, 2014 at 7:00 PM, Flavio Pompermaier <pomperma...@okkam.it>
wrote:

> Sounds good to me. How do you check for completion from Java code?
>
> On Tue, Nov 25, 2014 at 6:56 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> Hi!
>>
>> 1) The Remote Executor will automatically transfer the jar, if needed.
>>
>> 2) Background execution is not supported out of the box. I would go for a
>> Java ExecutorService with a FutureTask to kick off tasks in a background
>> thread and allow checking for completion.
>>
>> Stephan
>>
>>
>> On Tue, Nov 25, 2014 at 6:41 PM, Flavio Pompermaier <pomperma...@okkam.it
>> > wrote:
>>
>>> Do I have to upload the jar from my application to the Flink JobManager
>>> every time?
>>> Do I have to wait for the job to finish? I'd like to start the job
>>> execution, get an id for it, and then poll for its status. Is that possible?
>>>
>>> On Tue, Nov 25, 2014 at 6:04 PM, Robert Metzger <rmetz...@apache.org>
>>> wrote:
>>>
>>>> Cool.
>>>>
>>>> So you have basically two options:
>>>> a) Use the bin/flink run tool.
>>>> This tool is meant for users to submit a job once. To use it, upload
>>>> the jar to any location in the file system (not HDFS) and use
>>>> ./bin/flink run <pathToJar> -c classNameOfJobYouWantToRun
>>>> <JobArguments>
>>>> to run the job.
>>>>
>>>> b) Use the RemoteExecutor.
>>>> To use the RemoteExecutor, you don't need to put your jar file
>>>> anywhere in your cluster.
>>>> The only thing you need is the jar file somewhere where the Java
>>>> application can access it.
>>>> Inside this Java Application, you have something like:
>>>>
>>>> void runJobOne(ExecutionEnvironment ee) {
>>>>   ee.readFile( ... );
>>>>   ...
>>>>   ee.execute("job 1");
>>>> }
>>>>
>>>> void runJobTwo(ExecutionEnvironment ee) {
>>>>   ...
>>>> }
>>>>
>>>>
>>>> void main(String[] args) {
>>>>   ExecutionEnvironment ee = ExecutionEnvironment.createRemoteEnvironment( ... );
>>>>
>>>>   if (something) {
>>>>     runJobOne(ee);
>>>>   } else if (somethingElse) {
>>>>     runJobTwo(ee);
>>>>   } ...
>>>> }
>>>>
>>>>
>>>> The object returned by the ExecutionEnvironment.execute() call also
>>>> contains information about the final status of the program (failed etc.).
>>>>
>>>> I hope that helps.
>>>>
>>>> On Tue, Nov 25, 2014 at 5:30 PM, Flavio Pompermaier <
>>>> pomperma...@okkam.it> wrote:
>>>>
>>>>> See inline
>>>>>
>>>>> On Tue, Nov 25, 2014 at 3:37 PM, Robert Metzger <rmetz...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> maybe we need to go a step back because I did not yet fully
>>>>>> understand what you want to do.
>>>>>>
>>>>>> My understanding so far is the following:
>>>>>> - You have a set of jobs that you've written for Flink
>>>>>>
>>>>>
>>>>> Yes, and they are all in the same jar (that I want to put in the
>>>>> cluster somehow)
>>>>>
>>>>> - You have a cluster with Flink running
>>>>>>
>>>>>
>>>>> Yes!
>>>>>
>>>>>
>>>>>> - You have an external client, which is a Java Application that is
>>>>>> controlling when and how the different jobs are launched. The client is
>>>>>> running basically 24/7 or started by a cronjob.
>>>>>>
>>>>>
>>>>> I have a Java application somewhere that triggers the execution of one
>>>>> of the available jobs in the jar (so I also need to pass the necessary
>>>>> arguments required by each job) and then monitors whether the job has been
>>>>> put into a running state, and its status (running/failed/finished; a
>>>>> percentage would be awesome).
>>>>> I don't think the RemoteExecutor is enough. Am I wrong?
>>>>>
>>>>>
>>>>>> Correct me if these assumptions are wrong. If they are true, the
>>>>>> RemoteExecutor is probably what you are looking for. Otherwise, we have 
>>>>>> to
>>>>>> find another solution.
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 25, 2014 at 2:56 PM, Flavio Pompermaier <
>>>>>> pomperma...@okkam.it> wrote:
>>>>>>
>>>>>>> Hi Robert,
>>>>>>> I tried to look at the RemoteExecutor but I can't understand what
>>>>>>> the exact steps are to:
>>>>>>> 1 - (upload if necessary and) register a jar containing multiple
>>>>>>> main methods (one for each job)
>>>>>>> 2 - start the execution of a job from a client
>>>>>>> 3 - monitor the execution of the job
>>>>>>>
>>>>>>> Could you give me the exact java commands/snippets to do that?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Nov 23, 2014 at 8:26 PM, Robert Metzger <rmetz...@apache.org
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> +1 for providing some utilities/tools for application developers.
>>>>>>>> This could include something like an application registry. I also
>>>>>>>> think that almost every user needs something to parse command line
>>>>>>>> arguments (including default values and comprehensive error messages).
>>>>>>>> We should also see if we can document and properly expose the
>>>>>>>> FileSystem abstraction to Flink app programmers. Users sometimes need
>>>>>>>> to manipulate files directly.
>>>>>>>>
>>>>>>>>
>>>>>>>> Regarding your second question:
>>>>>>>> For deploying a jar on your cluster, you can use the "bin/flink run
>>>>>>>> <JAR FILE>" command.
>>>>>>>> For starting a Job from an external client you can use the
>>>>>>>> RemoteExecutionEnvironment (you need to know the JobManager address for
>>>>>>>> that). Here is some documentation on that:
>>>>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/cluster_execution.html#remote-environment
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Nov 22, 2014 at 9:06 PM, Flavio Pompermaier <
>>>>>>>> pomperma...@okkam.it> wrote:
>>>>>>>>
>>>>>>>>> That was exactly what I was looking for. In my case it is not a
>>>>>>>>> problem to use the Hadoop version because I work on Hadoop. Don't you think it
>>>>>>>>> could be useful to add a Flink ProgramDriver so that you can use it both
>>>>>>>>> for Hadoop and native Flink jobs?
>>>>>>>>>
>>>>>>>>> Now that I understand how to bundle a bunch of jobs together, my
>>>>>>>>> next objective is to deploy the jar on the cluster (similarly to what
>>>>>>>>> the webclient does) and then start the jobs from my external client (which
>>>>>>>>> in theory just needs to know the jar name and the parameters to pass to
>>>>>>>>> every job it wants to call). Do you have an example of that?
>>>>>>>>> On Nov 22, 2014 6:11 PM, "Kostas Tzoumas" <ktzou...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Are you looking for something like
>>>>>>>>>> https://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/ProgramDriver.html
>>>>>>>>>> ?
>>>>>>>>>>
>>>>>>>>>> You should be able to use the Hadoop ProgramDriver directly, see
>>>>>>>>>> for example here:
>>>>>>>>>> https://github.com/ktzoumas/incubator-flink/blob/tez_support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/examples/ExampleDriver.java
>>>>>>>>>>
>>>>>>>>>> If you don't want to introduce a Hadoop dependency in your
>>>>>>>>>> project, you can just copy-paste ProgramDriver; it does not have any
>>>>>>>>>> dependencies on Hadoop classes. That class just accumulates <String, Class>
>>>>>>>>>> pairs (simplifying a bit) and calls the main method of the corresponding
>>>>>>>>>> class.
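The ProgramDriver pattern described above boils down to a small registry class. Here is a hedged, self-contained sketch of that idea — the class name JobDriver, the job name "example", and the nested ExampleJob are all invented for illustration; a real entry would be a Flink program whose main() builds a plan and calls env.execute():

```java
import java.util.Map;
import java.util.TreeMap;

public class JobDriver {

    // Accumulates <name, main class> pairs, like Hadoop's ProgramDriver.
    private final Map<String, Class<?>> jobs = new TreeMap<>();

    public void addJob(String name, Class<?> mainClass) {
        jobs.put(name, mainClass);
    }

    // Looks up the named job and reflectively invokes its main(String[]).
    public void run(String name, String[] args) throws Exception {
        Class<?> clazz = jobs.get(name);
        if (clazz == null) {
            throw new IllegalArgumentException(
                "Unknown job '" + name + "'. Available: " + jobs.keySet());
        }
        clazz.getMethod("main", String[].class).invoke(null, (Object) args);
    }

    // A made-up example job standing in for a real Flink program.
    public static class ExampleJob {
        public static void main(String[] args) {
            System.out.println("running with " + args.length + " arguments");
        }
    }

    public static void main(String[] args) throws Exception {
        JobDriver driver = new JobDriver();
        driver.addJob("example", ExampleJob.class);
        driver.run("example", new String[] {"foo", "bar"});
    }
}
```

Listing the keys of the map on an unknown name also gives you the "drives you to the correct invocation" behavior of the hadoop-mapreduce-examples jar mentioned later in the thread.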
>>>>>>>>>>
>>>>>>>>>> On Sat, Nov 22, 2014 at 5:34 PM, Stephan Ewen <se...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Not sure I get exactly what this is, but packaging multiple
>>>>>>>>>>> examples in one program is well possible. You can have arbitrary control
>>>>>>>>>>> flow in the main() method.
>>>>>>>>>>>
>>>>>>>>>>> It should be well possible to do something like the Hadoop
>>>>>>>>>>> examples setup...
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 21, 2014 at 7:02 PM, Flavio Pompermaier <
>>>>>>>>>>> pomperma...@okkam.it> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> That was something I used to do with Hadoop and it's
>>>>>>>>>>>> comfortable when testing stuff (so it is not so important).
>>>>>>>>>>>> For an example, see what happens when you run the old "hadoop
>>>>>>>>>>>> jar hadoop-mapreduce-examples.jar" command: it "drives" you to the correct
>>>>>>>>>>>> invocation of that job.
>>>>>>>>>>>> However, the important thing is that I'd like to keep existing
>>>>>>>>>>>> related jobs somewhere (like a repository of jobs), deploy them 
>>>>>>>>>>>> and then be
>>>>>>>>>>>> able to start the one I need from an external program.
>>>>>>>>>>>>
>>>>>>>>>>>> Could this be done with the RemoteExecutor? Or is there any web
>>>>>>>>>>>> service to manage the job execution? That would be very useful.
>>>>>>>>>>>> Is the Client interface the only one that allows something
>>>>>>>>>>>> similar right now?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 21, 2014 at 6:19 PM, Stephan Ewen <se...@apache.org
>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am not sure exactly what you need there. In Flink you can
>>>>>>>>>>>>> write more than one program in the same program ;-) You can define complex
>>>>>>>>>>>>> flows and call execute() at arbitrary intermediate points:
>>>>>>>>>>>>>
>>>>>>>>>>>>> main() {
>>>>>>>>>>>>>   ExecutionEnvironment env = ...;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   env.readSomething().map().join(...).and().so().on();
>>>>>>>>>>>>>   env.execute();
>>>>>>>>>>>>>
>>>>>>>>>>>>>   env.readTheNextThing().doSomething();
>>>>>>>>>>>>>   env.execute();
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> You can also just "save" a program and keep it for later
>>>>>>>>>>>>> execution:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Plan plan = env.createProgramPlan();
>>>>>>>>>>>>>
>>>>>>>>>>>>> At a later point you can start that plan with: new
>>>>>>>>>>>>> RemoteExecutor(master, 6123).execute(plan);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 21, 2014 at 5:49 PM, Flavio Pompermaier <
>>>>>>>>>>>>> pomperma...@okkam.it> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any help on this? :(
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Nov 21, 2014 at 9:33 AM, Flavio Pompermaier <
>>>>>>>>>>>>>> pomperma...@okkam.it> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>> I forgot to ask whether there's a Flink utility that simulates
>>>>>>>>>>>>>>> the Hadoop ProgramDriver class, which acts somewhat like a registry of jobs.
>>>>>>>>>>>>>>> Is there something similar?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Flavio
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
