Wouldn't Apache Camel be more appropriate for the orchestration aspect? Then delegate to Beam for the processing?
On Sun, Dec 24, 2023, 8:47 AM data_nerd_666 <dataner...@gmail.com> wrote:
> Thanks Austin & Chad, but my use case is to use Beam for ETL workflow control, which seems different from your case. I would like to check whether anyone has used Beam for this kind of use case and whether Beam is a good choice.
>
> On Sat, Dec 23, 2023 at 12:58 AM Chad Dombrova <chad...@gmail.com> wrote:
>
>> Hi,
>> I'm the guy who gave the Movie Magic talk. Since it's possible to write stateful transforms with Beam, it is capable of some very sophisticated flow control. I've not seen a Python framework that combines this with streaming data nearly as well. That said, there aren't a lot of great working examples out there for transforms that do sophisticated flow control, and I feel like we're always wrestling with differences in behavior between the direct runner and Dataflow. There was a thread about polling patterns [1] on this list that never really got a satisfying resolution. Likewise, there was a thread about using an SDF with an unbounded source [2] that also didn't get fully resolved.
>>
>> [1] https://lists.apache.org/thread/nsxs49vjokcc5wkvdvbvsqwzq682s7qw
>> [2] https://lists.apache.org/thread/n3xgml0z8fok7101q79rsmdgp06lofnb
>>
>> On Sun, Dec 17, 2023 at 3:53 PM Austin Bennett <aus...@apache.org> wrote:
>>
>>> https://beamsummit.org/sessions/event-driven-movie-magic/
>>>
>>> ^^ The question made me think of that use case, though it's unclear how close it is to what you're thinking about.
>>>
>>> Cheers -
>>>
>>> On Fri, Dec 15, 2023 at 7:01 AM Byron Ellis via user <user@beam.apache.org> wrote:
>>>
>>>> As Jan says, theoretically possible? Sure. That particular set of operations? Overkill. If you don't have it already set up, I'd say even something like Airflow is overkill here. If all you need to do is "launch job and wait" when a file arrives...
>>>> that's a small script and not something that particularly requires a distributed data processing system.
>>>>
>>>> On Fri, Dec 15, 2023 at 4:58 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Apache Beam describes itself as "Apache Beam is an open-source, unified programming model for batch and streaming data processing pipelines, ...". As such, it can be used to express essentially arbitrary logic and run it as a streaming pipeline. A streaming pipeline processes input data and produces output data and/or actions. Given these assumptions, it is technically feasible to use Apache Beam for orchestrating other workflows; the problem is that it will very likely not be efficient. Apache Beam does a lot of heavy lifting because it is designed to process large volumes of data in a scalable way, which is probably not what one needs for workflow orchestration. So my two cents would be that although it _could_ be done, it probably _should not_ be done.
>>>>>
>>>>> Best,
>>>>>
>>>>> Jan
>>>>>
>>>>> On 12/15/23 13:39, Mikhail Khludnev wrote:
>>>>>
>>>>> Hello,
>>>>> I think this page https://beam.apache.org/documentation/ml/orchestration/ might answer your question.
>>>>> Frankly speaking: GCP Workflows and Apache Airflow.
>>>>> But Beam itself is a data-stream/flow or batch processor, not a workflow engine (IMHO).
>>>>>
>>>>> On Fri, Dec 15, 2023 at 3:13 PM data_nerd_666 <dataner...@gmail.com> wrote:
>>>>>
>>>>>> I know it is technically possible, but my case may be a little special. Say I have 3 steps in my control flow (ETL workflow):
>>>>>> Step 1. Upstream file watching
>>>>>> Step 2. Call some external service to run a job, e.g. run a notebook or a Python script
>>>>>> Step 3. Notify the downstream workflow
>>>>>> Can I use Apache Beam to build a DAG with these 3 nodes and run it as either a Flink or a Spark job? It might be a little weird, but I just want to learn from the community whether this is the right way to use Apache Beam, and whether anyone has done this before. Thanks
>>>>>>
>>>>>> On Fri, Dec 15, 2023 at 10:28 AM Byron Ellis via user <user@beam.apache.org> wrote:
>>>>>>
>>>>>>> It's technically possible, but the closest thing I can think of would be triggering things based on something like file watching.
>>>>>>>
>>>>>>> On Thu, Dec 14, 2023 at 2:46 PM data_nerd_666 <dataner...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I'm not using Beam as a time-based scheduler, just to control the execution order of an ETL workflow DAG, because Beam's abstraction is also a DAG.
>>>>>>>> I know it is a little weird; I just want to confirm with the community: has anyone used Beam like this before?
>>>>>>>>
>>>>>>>> On Thu, Dec 14, 2023 at 10:59 PM Jan Lukavský <je...@seznam.cz> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Can you give an example of what you mean, for better understanding? Do you mean using Beam as a scheduler of other ETL workflows?
>>>>>>>>>
>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>> On 12/14/23 13:17, data_nerd_666 wrote:
>>>>>>>>> > Hi all,
>>>>>>>>> >
>>>>>>>>> > I am new to Apache Beam, and am very excited to find Beam in the Apache community. I see lots of use cases of Apache Beam for data flow (processing large amounts of batch/streaming data). I am just wondering whether I can use Apache Beam for control flow (ETL workflow). I don't mean the Spark/Flink job in the ETL workflow; I mean the ETL workflow itself.
>>>>>>>>> > An ETL workflow is also a DAG, which is very similar to the abstraction of Apache Beam, but unfortunately I didn't find any such use cases on the internet. So I'd like to ask this question in the Beam community to confirm whether I can use Apache Beam for control flow (ETL workflow). If yes, please point me to some success stories. Thanks
>>>>>>>>> >
>>>>>
>>>>> --
>>>>> Sincerely yours
>>>>> Mikhail Khludnev
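To make Byron's point concrete, here is a minimal sketch of the three-step control flow described in the thread (watch for an upstream file, run an external job and wait, notify downstream) as the kind of "small script" he means, using only the Python standard library. It assumes the job can be launched as a local command and that writing a marker file counts as "notifying downstream"; `wait_for_file`, `run_job`, `notify_downstream` and `main` are hypothetical names, not part of Beam, Airflow, or any other tool.

```python
import subprocess
import time
from pathlib import Path


def wait_for_file(path: Path, poll_seconds: float, timeout: float) -> bool:
    """Step 1: poll until the upstream file exists; False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists():
            return True
        time.sleep(poll_seconds)
    return path.exists()  # one last check right at the deadline


def run_job(cmd: list[str]) -> int:
    """Step 2: launch the external job and block until it exits."""
    return subprocess.run(cmd, check=False).returncode


def notify_downstream(marker: Path, returncode: int) -> None:
    """Step 3: signal downstream; here (an assumption) by writing a marker file."""
    marker.write_text(f"exit={returncode}\n")


def main(input_file: Path, marker: Path, cmd: list[str],
         poll_seconds: float = 5.0, timeout: float = 3600.0) -> int:
    """Run steps 1-3 in order and return the job's exit code."""
    if not wait_for_file(input_file, poll_seconds, timeout):
        # Upstream never delivered: fail loudly and leave downstream untouched.
        raise TimeoutError(f"{input_file} did not arrive within {timeout}s")
    returncode = run_job(cmd)
    notify_downstream(marker, returncode)
    return returncode
```

Invoked from cron or a systemd timer as, say, `main(Path("incoming/data.csv"), Path("incoming/data.done"), ["python", "etl_job.py"])` (hypothetical paths), this covers the whole workflow without a distributed runner. Plain polling is used instead of OS file-change notifications only to keep the sketch portable.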
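The original question is mainly about controlling the execution order of an ETL workflow DAG. That ordering on its own does not need a data-processing DAG: a topological sort is enough. The sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to run each task only after its upstream tasks have finished. `run_dag` and the task names are hypothetical; unlike a real orchestrator such as Airflow, this has no retries, scheduling, persistence, or parallelism.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_dag(tasks: dict[str, Callable[[], None]],
            upstream: dict[str, set[str]]) -> list[str]:
    """Run every task after all of its upstream tasks; return the execution order."""
    sorter = TopologicalSorter()
    for name in tasks:
        # add() registers the node together with its predecessors.
        sorter.add(name, *upstream.get(name, set()))
    executed = []
    for name in sorter.static_order():  # raises graphlib.CycleError on a cycle
        tasks[name]()  # KeyError if a dependency names an undefined task
        executed.append(name)
    return executed


if __name__ == "__main__":
    log = []
    tasks = {
        "notify_downstream": lambda: log.append("notify"),
        "run_job": lambda: log.append("job"),
        "watch_file": lambda: log.append("watch"),
    }
    upstream = {"run_job": {"watch_file"}, "notify_downstream": {"run_job"}}
    print(run_dag(tasks, upstream))  # ['watch_file', 'run_job', 'notify_downstream']
```

For the linear three-step chain from the thread there is exactly one valid order; with branches, independent tasks may run in any order consistent with their dependencies.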