I'm sorry, but aren't these questions better suited for the Airflow
mailing lists?
On 2/2/2021 12:35 PM, Flavio Pompermaier wrote:
Thank you all for the hints. However, looking at the REST API [1] of
Airflow 2.0, I can't find how to set up my DAG (if this is even the
right concept).
Do I need to first create a Connection? A DAG? A TaskInstance? How do
I specify the two BashOperators?
I was planning to connect to Airflow from Java, so I can't use the
Python API..
[1]
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#section/Overview
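(For reference: as far as I can tell from the stable REST API docs, the API does not create DAGs at all — a DAG is a Python file that must be deployed to the scheduler's DAGs folder, and the REST API is then used to trigger and monitor runs of it. Since it is plain HTTP + JSON, it can be driven from Java just as well as from Python. A minimal Python-stdlib sketch of the trigger call — the `dag_id` and base URL below are placeholders, not anything from this thread:)

```python
import json
import urllib.request


def build_trigger_request(base_url, dag_id, conf=None):
    """Build (but do not send) a request that triggers a DAG run via
    POST /api/v1/dags/{dag_id}/dagRuns (Airflow 2.0 stable REST API)."""
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": conf or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "flink_two_jobs" is a hypothetical dag_id used only for illustration.
req = build_trigger_request("http://localhost:8080", "flink_two_jobs")
# urllib.request.urlopen(req) would send it (authentication omitted here).
```

Sending it is a single `urllib.request.urlopen(req)` call once authentication is added; the equivalent POST is straightforward in any Java HTTP client.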
On Tue, Feb 2, 2021 at 10:53 AM Arvid Heise <ar...@apache.org> wrote:
Hi Flavio,
If you know a bit of Python, it's also trivial to add a new Flink
operator in which you can use the REST API.
In general, I'd consider Airflow to be the best choice for your
problem, especially if it gets more complicated in the future (do
something else if the first job fails).
If you have specific questions, feel free to ask.
Best,
Arvid
On Tue, Feb 2, 2021 at 10:08 AM 姜鑫 <jiangxin...@gmail.com> wrote:
Hi Flavio,
I probably understand what you need. Apache Airflow is a
scheduling framework in which you can define your own dependent
operators; for example, you can define a BashOperator to submit a
Flink job to your local Flink cluster:
```
t1 = BashOperator(
    task_id='flink-wordcount',
    bash_command='./bin/flink run flink/build-target/examples/batch/WordCount.jar',
    ...
)
```
Also, Airflow supports submitting jobs to Kubernetes, and you
can even implement your own operator if a bash command doesn't
meet your needs.
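Building on the snippet above, a minimal DAG-file sketch of the two-job case — run the second Flink job only if the first succeeds. The dag_id, dates, and jar paths are placeholders; treat this as an untested outline rather than a finished pipeline:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="flink_two_jobs",           # placeholder name
    start_date=datetime(2021, 2, 1),
    schedule_interval=None,            # only run when triggered (e.g. via REST)
    catchup=False,
) as dag:
    job1 = BashOperator(
        task_id="flink-job-1",
        bash_command="./bin/flink run /path/to/job1.jar",  # placeholder path
    )
    job2 = BashOperator(
        task_id="flink-job-2",
        bash_command="./bin/flink run /path/to/job2.jar",  # placeholder path
    )

    # job2 starts only after job1 finishes successfully
    # (the default trigger_rule is "all_success").
    job1 >> job2
```

The `>>` operator is how Airflow expresses the dependency, so the "second fires after the first" logic lives in the DAG file itself, not in the REST client.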
Indeed, Flink AI (flink-ai-extended
<https://github.com/alibaba/flink-ai-extended>?) needs an
enhanced version of Airflow, but it is mainly for streaming
scenarios, where the job won't stop. In your case, with batch
jobs only, it doesn't help much. Hope this helps.
Regards,
Xin
On Feb 2, 2021, at 4:30 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
Hi Xin,
let me state first that I have never used Airflow, so I may be
missing some background here.
I just want to externalize job scheduling to an established
framework, and from what I see, Apache Airflow is probably
what I need.
However, I can't find any good blog post or documentation
about how to integrate these two technologies using the REST
APIs of both services.
I saw that Flink AI decided to use a customized/enhanced
version of Airflow [1], but I didn't look into the code to
understand how they use it.
In my use case I just want to schedule two Flink batch jobs
using the REST API of Airflow, where the second one is fired
after the first.
[1]
https://github.com/alibaba/flink-ai-extended/tree/master/flink-ai-flow
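(Since the goal is to drive this from Java over REST, it is worth noting that the stable API also exposes run status, so a client can tell when a triggered run has finished. A hedged stdlib sketch — the endpoint path is from the Airflow 2.0 stable REST API reference; the base URL, dag_id, and run id are placeholders:)

```python
import json
import urllib.request


def dag_run_state_url(base_url, dag_id, dag_run_id):
    """URL for GET /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}; the JSON
    response includes a "state" field such as "queued", "running",
    "success", or "failed"."""
    return f"{base_url}/api/v1/dags/{dag_id}/dagRuns/{dag_run_id}"


url = dag_run_state_url("http://localhost:8080", "flink_two_jobs", "manual_run_1")
# With authentication configured, fetching the state would look like:
# state = json.loads(urllib.request.urlopen(url).read())["state"]
```

The same GET is a one-liner in any Java HTTP client, since only the URL and auth header matter.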
Best,
Flavio
On Tue, Feb 2, 2021 at 2:43 AM 姜鑫 <jiangxin...@gmail.com> wrote:
Hi Flavio,
Could you explain what your exact question is? In my
opinion, it is possible to define two Airflow operators
to submit dependent Flink jobs, as long as the first one
can run to completion.
Regards,
Xin
On Feb 1, 2021, at 6:43 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
Any advice here?
On Wed, Jan 27, 2021 at 9:49 PM Flavio Pompermaier <pomperma...@okkam.it> wrote:
Hello everybody,
is there any suggested way/pointer for scheduling Flink
jobs using Apache Airflow?
What I'd like to achieve is the submission (using
the REST API of Airflow) of two jobs, where the second
one is executed only if the first one succeeds.
Thanks in advance
Flavio