https://beamsummit.org/sessions/event-driven-movie-magic/
^^ the question made me think of that use case. Though, it's unclear how close it is to what you're thinking about.

Cheers -

On Fri, Dec 15, 2023 at 7:01 AM Byron Ellis via user <user@beam.apache.org> wrote:

> As Jan says, theoretically possible? Sure. That particular set of operations? Overkill. If you don't have it already set up, I'd say even something like Airflow is overkill here. If all you need to do is "launch job and wait" when a file arrives... that's a small script and not something that particularly requires a distributed data processing system.
>
> On Fri, Dec 15, 2023 at 4:58 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Hi,
>>
>> Apache Beam describes itself as "Apache Beam is an open-source, unified programming model for batch and streaming data processing pipelines, ...". As such, it is possible to use it to express essentially arbitrary logic and run it as a streaming pipeline. A streaming pipeline processes input data and produces output data and/or actions. Given these assumptions, it is technically feasible to use Apache Beam for orchestrating other workflows; the problem is that it will very likely not be efficient. Apache Beam does a lot of heavy lifting because it is designed to process large volumes of data in a scalable way, which is probably not what one would need for workflow orchestration. So, my two cents would be that although it _could_ be done, it probably _should not_ be done.
>>
>> Best,
>>
>> Jan
>>
>> On 12/15/23 13:39, Mikhail Khludnev wrote:
>>
>> Hello,
>> I think this page https://beam.apache.org/documentation/ml/orchestration/ might answer your question.
>> Frankly speaking: GCP Workflows and Apache Airflow.
>> But Beam itself is a data-stream/flow or batch processor, not a workflow engine (IMHO).
>>
>> On Fri, Dec 15, 2023 at 3:13 PM data_nerd_666 <dataner...@gmail.com> wrote:
>>
>>> I know it is technically possible, but my case may be a little special.
>>> Say I have 3 steps for my control flow (ETL workflow):
>>> Step 1. Upstream file watching
>>> Step 2. Call some external service to run one job, e.g. run a notebook or a Python script
>>> Step 3. Notify the downstream workflow
>>> Can I use Apache Beam to build a DAG with 3 nodes and run it as either a Flink or Spark job? It might be a little weird, but I just want to learn from the community whether this is the right way to use Apache Beam, and whether anyone has done this before. Thanks
>>>
>>> On Fri, Dec 15, 2023 at 10:28 AM Byron Ellis via user <user@beam.apache.org> wrote:
>>>
>>>> It’s technically possible, but the closest thing I can think of would be triggering things based on things like file watching.
>>>>
>>>> On Thu, Dec 14, 2023 at 2:46 PM data_nerd_666 <dataner...@gmail.com> wrote:
>>>>
>>>>> I'm not using Beam as a time-based scheduler, just using it to control the execution order of an ETL workflow DAG, because Beam's abstraction is also a DAG.
>>>>> I know it is a little weird; I just want to confirm with the community: has anyone used Beam like this before?
>>>>>
>>>>> On Thu, Dec 14, 2023 at 10:59 PM Jan Lukavský <je...@seznam.cz> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Can you give an example of what you mean, for better understanding? Do you mean using Beam as a scheduler of other ETL workflows?
>>>>>>
>>>>>> Jan
>>>>>>
>>>>>> On 12/14/23 13:17, data_nerd_666 wrote:
>>>>>> > Hi all,
>>>>>> >
>>>>>> > I am new to Apache Beam, and am very excited to find Beam in the Apache community. I see lots of use cases of Apache Beam for data flow (processing large amounts of batch/streaming data). I am just wondering whether I can use Apache Beam for control flow (ETL workflow). I don't mean the Spark/Flink job in the ETL workflow, I mean the ETL workflow itself. An ETL workflow is also a DAG, very similar to the abstraction of Apache Beam, but unfortunately I didn't find such use cases on the internet. So I'd like to ask this question in the Beam community to confirm whether I can use Apache Beam for control flow (ETL workflow). If yes, please let me know some success stories of this. Thanks
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
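For comparison with Byron's point that "launch job and wait" when a file arrives is a small script, here is a minimal sketch of the three steps from the original question in plain Python, with no Beam involved. It assumes the upstream system drops files into a local directory, that the external job can be launched as a local command, and that the downstream workflow watches for a marker file. The names `wait_for_file`, `run_job`, `notify_downstream`, `orchestrate` and the `.done` marker convention are hypothetical, chosen only for illustration.

```python
import fnmatch
import subprocess
import time
from pathlib import Path

# Hypothetical helpers; the ".done" marker file is an assumed downstream convention.


def wait_for_file(watch_dir, pattern, poll_seconds=5.0, timeout_seconds=3600.0):
    """Step 1: poll watch_dir until a file whose name matches pattern appears.

    Returns the path of the first match (sorted by name), or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        matches = sorted(
            p for p in Path(watch_dir).iterdir()
            if p.is_file() and fnmatch.fnmatch(p.name, pattern)
        )
        if matches:
            return matches[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no file matching {pattern!r} in {watch_dir}")
        time.sleep(poll_seconds)


def run_job(command):
    """Step 2: launch the external job and block until it finishes.

    check=True raises CalledProcessError on a non-zero exit code, so a
    failed job never reaches the notification step.
    """
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


def notify_downstream(done_dir, input_path):
    """Step 3: signal downstream by writing an (assumed) marker file."""
    Path(done_dir).mkdir(parents=True, exist_ok=True)
    marker = Path(done_dir) / (Path(input_path).name + ".done")
    marker.write_text("ok\n")
    return marker


def orchestrate(watch_dir, pattern, command, done_dir, **poll_kwargs):
    """Run the three steps in order; each starts only after the previous succeeds."""
    input_path = wait_for_file(watch_dir, pattern, **poll_kwargs)
    # Pass the file that arrived to the job as its last command-line argument.
    output = run_job(list(command) + [str(input_path)])
    marker = notify_downstream(done_dir, input_path)
    return input_path, output, marker
```

Each step starts only after the previous one succeeds, which is the whole ordering guarantee a 3-node DAG like this needs. Retries, run history, and alerting are what an orchestrator such as Airflow or GCP Workflows, both mentioned in the thread, would add on top.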