Re: [DISCUSS] Using shared memory for inter-task communication

2019-11-27 Thread Maxime Beauchemin
If memory is shared across tasks, they are by definition not idempotent, which can be troublesome. What if you have a chain of 3 tasks and the last one failed while operating on the memory that came from task number 2? The whole chain may have to be re-executed, which to me sounds like it's really

Re: [DISCUSS] Using shared memory for inter-task communication

2019-11-27 Thread Jarek Potiuk
Yeah. I was thinking about the new landing page/website where we specifically have a section "Use cases" and we can describe some actual examples (and counter-examples specifically) :). J. On Wed, Nov 27, 2019 at 11:43 AM Bolke de Bruin wrote: > From our website :-) > > "Airflow *is not* a data s

Re: [DISCUSS] Using shared memory for inter-task communication

2019-11-27 Thread Bolke de Bruin
From our website :-) "Airflow *is not* a data streaming solution. Tasks do not move data from one to the other (though tasks can exchange metadata!). Airflow is not in the Spark Streaming or Storm space, it is more comparable to Ooz

Re: [DISCUSS] Using shared memory for inter-task communication

2019-11-27 Thread Jarek Potiuk
Listening to all those comments, that reaffirms the gut feelings I had. Even if I like the idea of optimisations, I think it makes sense to say "it's not an Airflow-domain problem really". I think now that XCom is good at what it is for and introducing a "generic" data passing mechanism goes way beyond wh

Re: [DISCUSS] Using shared memory for inter-task communication

2019-11-27 Thread Tomasz Urbaszek
I agree with Bolke, Airflow is not a data processing tool. Also it should not become one as we already have some awesome solutions like Apache Storm, Flink or Beam. Tomek On Wed, Nov 27, 2019 at 10:24 AM Bolke de Bruin wrote: > My 2 cents: > > I don’t think this makes sense at all as it goes a

Re: [DISCUSS] Using shared memory for inter-task communication

2019-11-27 Thread Bolke de Bruin
My 2 cents: I don’t think this makes sense at all as it goes against the core of Airflow: Airflow does not do data processing by itself. So the only thing you should share between tasks is metadata and that you do through XCom. We can redesign com if you want but it is also the only viable option

Re: [DISCUSS] Using shared memory for inter-task communication

2019-11-27 Thread Alex Guziel
Agreed on running before we can crawl. The logical way to do this now is to group it as one big task with more resources. With respect to affinity on the same machine, that's basically what it is. I guess this hinges on how well your solution can handle workloads with different resource requirements.

Re: [DISCUSS] Using shared memory for inter-task communication

2019-11-26 Thread Soma S Dhavala
TL;DR I am sharing my thoughts here about supporting in-memory data passing. Not necessarily directly linked to an Airflow-specific implementation, but Airflow still does play its role, i.e., chaining computational nodes. In the data-flow graph parlance, Airflow operators are fundamentally computatio

Re: [DISCUSS] Using shared memory for inter-task communication

2019-11-26 Thread James Meickle
I think this idea is running before we can even crawl. Before it makes any sense to implement this in Airflow, I think it needs three other things: - A reliable, well-designed component for passing data between tasks first (not XCom!); where shared memory is an _implementation_ of data passing - A

Re: [DISCUSS] Using shared memory for inter-task communication

2019-11-26 Thread Jarek Potiuk
when only one POD finishes its work for example -> when only one *container* finishes its work for example On Tue, Nov 26, 2019 at 12:41 PM Jarek Potiuk wrote: > Another thought. > > It looks like a "Sub-task operator". Kind of a special "Operator" to > handle such case (for example Docker-Compo

Re: [DISCUSS] Using shared memory for inter-task communication

2019-11-26 Thread Jarek Potiuk
Another thought. It looks like a "Sub-task operator". Kind of a special "Operator" to handle such a case (for example Docker-Compose or Kubernetes-POD driven). Currently we can trigger other tasks at the same time (parallel) or sequentially (with dependency for finishing/failing the task). But we c

Re: [DISCUSS] Using shared memory for inter-task communication

2019-11-26 Thread Ash Berlin-Taylor
My gut reaction to this is no, not as a general purpose thing. It would only work for LocalExecutor reliably - it's never going to work with Kube executor, and almost never work in Celery. Also the cases where the data is small enough to fit in memory but not so large you need to put it in S3/h

[DISCUSS] Using shared memory for inter-task communication

2019-11-26 Thread Jarek Potiuk
*TL;DR; Discuss whether shared memory data sharing for some tasks is an interesting feature for future Airflow.* I had a few discussions recently with several Airflow users (including on Slack [1] and in person at the Warsaw Airflow meetup) about using shared memory for inter-task communication. Airf