Wouldn't Apache Camel be more appropriate for the orchestration aspect,
and then delegate to Beam for the processing?

On Sun, Dec 24, 2023, 8:47 AM data_nerd_666 <dataner...@gmail.com> wrote:

> Thanks Austin & Chad, but my use case is to use Beam for ETL workflow
> control, which seems different from your case. I would like to check
> whether anyone has used Beam for this kind of use case and whether Beam is
> a good choice.
>
> On Sat, Dec 23, 2023 at 12:58 AM Chad Dombrova <chad...@gmail.com> wrote:
>
>> Hi,
>> I'm the guy who gave the Movie Magic talk.  Since it's possible to write
>> stateful transforms with Beam, it is capable of some very sophisticated
>> flow control.   I've not seen a Python framework that combines this with
>> streaming data nearly as well.  That said, there aren't a lot of great
>> working examples out there for transforms that do sophisticated flow
>> control, and I feel like we're always wrestling with differences in
>> behavior between the direct runner and Dataflow.  There was a thread about
>> polling patterns [1] on this list that never really got a satisfying
>> resolution.  Likewise, there was a thread about using an SDF with an
>> unbounded source [2] that also didn't get fully resolved.
>>
>> [1] https://lists.apache.org/thread/nsxs49vjokcc5wkvdvbvsqwzq682s7qw
>> [2] https://lists.apache.org/thread/n3xgml0z8fok7101q79rsmdgp06lofnb
>>
>>
>>
>> On Sun, Dec 17, 2023 at 3:53 PM Austin Bennett <aus...@apache.org> wrote:
>>
>>> https://beamsummit.org/sessions/event-driven-movie-magic/
>>>
>>> ^^ the question made me think of that use case.  Though, unclear how
>>> close it is to what you're thinking about.
>>>
>>> Cheers -
>>>
>>> On Fri, Dec 15, 2023 at 7:01 AM Byron Ellis via user <
>>> user@beam.apache.org> wrote:
>>>
>>>> As Jan says, theoretically possible? Sure. That particular set of
>>>> operations? Overkill. If you don't have it already set up I'd say even
>>>> something like Airflow is overkill here. If all you need to do is "launch
>>>> job and wait" when a file arrives... that's a small script and not
>>>> something that particularly requires a distributed data processing system.
>>>>
>>>> On Fri, Dec 15, 2023 at 4:58 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Apache Beam describes itself as "Apache Beam is an open-source,
>>>>> unified programming model for batch and streaming data processing
>>>>> pipelines, ...". As such, it is possible to use it to express essentially
>>>>> arbitrary logic and run it as a streaming pipeline. A streaming pipeline
>>>>> processes input data and produces output data and/or actions. Given these
>>>>> assumptions, it is technically feasible to use Apache Beam for
>>>>> orchestrating other workflows; the problem is that it will very likely
>>>>> not be efficient. Apache Beam does a lot of heavy lifting because it is
>>>>> designed to process large volumes of data in a scalable way, which is
>>>>> probably not what one would need for workflow orchestration. So, my two
>>>>> cents would be that although it _could_ be done, it probably
>>>>> _should not_ be done.
>>>>>
>>>>> Best,
>>>>>
>>>>>  Jan
>>>>> On 12/15/23 13:39, Mikhail Khludnev wrote:
>>>>>
>>>>> Hello,
>>>>> I think this page
>>>>> https://beam.apache.org/documentation/ml/orchestration/ might answer
>>>>> your question.
>>>>> Frankly speaking: GCP Workflows and Apache Airflow.
>>>>> But Beam itself is a data-stream/flow or batch processor; not a
>>>>> workflow engine (IMHO).
>>>>>
>>>>> On Fri, Dec 15, 2023 at 3:13 PM data_nerd_666 <dataner...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I know it is technically possible, but my case may be a little
>>>>>> special. Say I have 3 steps for my control flow (ETL workflow):
>>>>>> Step 1. upstream file watching
>>>>>> Step 2. call some external service to run one job, e.g. run a
>>>>>> notebook, run a python script
>>>>>> Step 3. notify downstream workflow
>>>>>> Can I use Apache Beam to build a DAG with 3 nodes and run it as
>>>>>> either a Flink or a Spark job? It might be a little weird, but I just
>>>>>> want to learn from the community whether this is the right way to use
>>>>>> Apache Beam, and whether anyone has done this before. Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 15, 2023 at 10:28 AM Byron Ellis via user <
>>>>>> user@beam.apache.org> wrote:
>>>>>>
>>>>>>> It’s technically possible but the closest thing I can think of would
>>>>>>> be triggering things based on things like file watching.
>>>>>>>
>>>>>>> On Thu, Dec 14, 2023 at 2:46 PM data_nerd_666 <dataner...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Not using Beam as a time-based scheduler, but just using it to control
>>>>>>>> the execution order of an ETL workflow DAG, because Beam's abstraction
>>>>>>>> is also a DAG.
>>>>>>>> I know it is a little weird; I just want to confirm with the
>>>>>>>> community: has anyone used Beam like this before?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Dec 14, 2023 at 10:59 PM Jan Lukavský <je...@seznam.cz>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> can you give an example of what you mean for better understanding? Do
>>>>>>>>> you mean using Beam as a scheduler of other ETL workflows?
>>>>>>>>>
>>>>>>>>>   Jan
>>>>>>>>>
>>>>>>>>> On 12/14/23 13:17, data_nerd_666 wrote:
>>>>>>>>> > Hi all,
>>>>>>>>> >
>>>>>>>>> > I am new to Apache Beam, and am very excited to find Beam in the
>>>>>>>>> > Apache community. I see lots of use cases of using Apache Beam for
>>>>>>>>> > data flow (processing large amounts of batch/streaming data). I am
>>>>>>>>> > just wondering whether I can use Apache Beam for control flow (ETL
>>>>>>>>> > workflow). I don't mean the Spark/Flink job in the ETL workflow, I
>>>>>>>>> > mean the ETL workflow itself, because an ETL workflow is also a DAG,
>>>>>>>>> > which is very similar to the abstraction of Apache Beam.
>>>>>>>>> > Unfortunately, I didn't find such use cases on the internet, so I'd
>>>>>>>>> > like to ask the Beam community to confirm whether I can use Apache
>>>>>>>>> > Beam for control flow (ETL workflow). If yes, please let me know
>>>>>>>>> > some success stories of this. Thanks
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>> --
>>>>> Sincerely yours
>>>>> Mikhail Khludnev
>>>>>
>>>>>
