I'm sorry, but aren't these questions better suited for the Airflow mailing lists?

On 2/2/2021 12:35 PM, Flavio Pompermaier wrote:
Thank you all for the hints. However, looking at the REST API [1] of Airflow 2.0 I can't find how to set up my DAG (if this is the right concept). Do I need to first create a Connection? A DAG? A TaskInstance? How do I specify the 2 BashOperators? I was thinking of connecting to Airflow via Java, so I can't use the Python API.

[1] https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#section/Overview
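
For what it's worth, the stable REST API is plain HTTP + JSON, so it can be called from Java just as easily as from Python. A minimal sketch of triggering a run of an already-deployed DAG (the DAG id "flink_jobs" and the credentials are placeholders; basic auth is assumed to be enabled). Note that DAGs themselves are defined as Python files on the scheduler and cannot be created through the REST API:
```
import requests

# Trigger a run of an existing DAG via Airflow's stable REST API.
resp = requests.post(
    "http://localhost:8080/api/v1/dags/flink_jobs/dagRuns",
    json={"conf": {}},
    auth=("admin", "admin"),  # placeholder credentials, basic auth backend
)
resp.raise_for_status()
print(resp.json())  # contains dag_run_id, state, execution_date, ...
```
The same POST request can be issued from any Java HTTP client, so no Python is needed on the caller's side.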

On Tue, Feb 2, 2021 at 10:53 AM Arvid Heise <ar...@apache.org> wrote:

    Hi Flavio,

    If you know a bit of Python, it's also trivial to add a new Flink
    operator that uses the REST API.
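
    For illustration, a rough sketch of such an operator, submitting an
    already-uploaded jar through Flink's REST API (POST /jars/{jar_id}/run);
    the class name, endpoint URL, and jar id are assumptions, not an
    existing Airflow operator:
    ```
    import requests
    from airflow.models.baseoperator import BaseOperator

    class FlinkSubmitOperator(BaseOperator):
        """Hypothetical operator: runs a jar already uploaded to Flink."""

        def __init__(self, flink_url, jar_id, **kwargs):
            super().__init__(**kwargs)
            self.flink_url = flink_url
            self.jar_id = jar_id

        def execute(self, context):
            # Flink answers with the id of the started job.
            resp = requests.post(f"{self.flink_url}/jars/{self.jar_id}/run")
            resp.raise_for_status()
            return resp.json()["jobid"]
    ```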

    In general, I'd consider Airflow to be the best choice for your
    problem, especially if it gets more complicated in the future
    (e.g., doing something else if the first job fails).

    If you have specific questions, feel free to ask.

    Best,

    Arvid

    On Tue, Feb 2, 2021 at 10:08 AM 姜鑫 <jiangxin...@gmail.com> wrote:

        Hi Flavio,

        I probably understand what you need. Apache Airflow is a
        scheduling framework in which you can define your own dependent
        operators, so you can define a BashOperator to submit a Flink
        job to your local Flink cluster. For example:
        ```
        from airflow.operators.bash import BashOperator

        t1 = BashOperator(
            task_id='flink-wordcount',
            bash_command='./bin/flink run flink/build-target/examples/batch/WordCount.jar',
            ...
        )
        ```
        Also, Airflow supports submitting jobs to Kubernetes, and you
        can even implement your own operator if a bash command doesn't
        meet your needs.
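
        For your two dependent jobs, a minimal sketch of a DAG in which
        the second task only runs if the first succeeds (the DAG id,
        jar paths, and schedule are placeholders):
        ```
        from datetime import datetime

        from airflow import DAG
        from airflow.operators.bash import BashOperator

        with DAG(
            dag_id="flink_jobs",
            start_date=datetime(2021, 2, 1),
            schedule_interval=None,  # trigger manually, e.g. via the REST API
        ) as dag:
            job1 = BashOperator(
                task_id="flink-job-1",
                bash_command="./bin/flink run /path/to/job1.jar",
            )
            job2 = BashOperator(
                task_id="flink-job-2",
                bash_command="./bin/flink run /path/to/job2.jar",
            )
            # job2 starts only after job1 completed successfully
            job1 >> job2
        ```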

        Indeed, Flink AI (flink-ai-extended
        <https://github.com/alibaba/flink-ai-extended>?) needs an
        enhanced version of Airflow, but it is mainly for streaming
        scenarios, where the job never stops. In your case, where all
        jobs are batch jobs, it doesn't help much. Hope this helps.

        Regards,
        Xin


        On Tue, Feb 2, 2021 at 4:30 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

        Hi Xin,
        let me state first that I have never used Airflow, so I may be
        missing some background here.
        I just want to externalize the job scheduling to some
        consolidated framework, and from what I see Apache Airflow is
        probably what I need.
        However, I can't find any good blog post or documentation about
        how to integrate these 2 technologies using the REST APIs of
        both services.
        I saw that Flink AI decided to use a customized/enhanced
        version of Airflow [1], but I didn't look into the code to
        understand how they use it.
        In my use case I just want to schedule 2 Flink batch jobs using
        the REST API of Airflow, where the second one is fired after
        the first succeeds.

        [1] https://github.com/alibaba/flink-ai-extended/tree/master/flink-ai-flow

        Best,
        Flavio

        On Tue, Feb 2, 2021 at 2:43 AM 姜鑫 <jiangxin...@gmail.com> wrote:

            Hi Flavio,

            Could you explain what your exact question is? In my
            opinion, it is possible to define two Airflow operators to
            submit dependent Flink jobs, as long as the first one can
            run to completion.

            Regards,
            Xin

            On Mon, Feb 1, 2021 at 6:43 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

            Any advice here?

            On Wed, Jan 27, 2021 at 9:49 PM Flavio Pompermaier <pomperma...@okkam.it> wrote:

                Hello everybody,
                is there any suggested way/pointer for scheduling Flink
                jobs using Apache Airflow?
                What I'd like to achieve is the submission (using the
                REST API of Airflow) of 2 jobs, where the second one
                can be executed only if the first one succeeds.

                Thanks in advance
                Flavio



