Thanks, Yingbo, for starting this, and thanks to everyone for joining the
discussion. Great point about sharding; this would be really useful for
large-scale clusters.
I imagine that at the first stage we can reuse the existing logic and make the
smart sensor a special kind of operator (maybe even make scheduler treat
On 2019/03/06 14:31:57, Yingbo Wang wrote:
> hi,
>
> I would like to open an AIP for Airflow sensor optimization.
>
>
> *Motivation*:
>
> Low efficiency in Airflow Sensor Implementation
>
> Sensors are a special kind of operator that will keep running until a
> certain criterion is met. Examp
There are two dimensions for evaluating how many resources sensors consume in
Airflow: the number of sensors and the duration of each sensor task. The
batch/smart sensor idea is proposed for the first one, and rescheduling
is for the second one. For an Airflow cluster running a large number of sensor
ta
Rescheduling is of massive use for a DAG where we are waiting for a weekly S3
file delivery from a third party supplier with _massive_ variance in the
delivery time. It'll appear at some point between Thursday AM and Sunday
evening. Not having an executor slot tied up with the S3KeySensor is gre
The sensor-service idea seems to open the door to making sensors a pubsub-type
deal where possible. For example, in Hive, you can keep an in-memory
registry of what partitions to sense for, and tail the audit log to see
when they are populated, instead of polling.
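
To make the registry idea above concrete, here is a minimal sketch in plain Python. It is not Airflow or Hive code: `PartitionRegistry`, its methods, and the partition strings are hypothetical names invented for illustration, and the "audit log" is simply assumed to be an iterable of partition names that something else has already parsed.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


class PartitionRegistry:
    """Hypothetical in-memory registry of partitions that sensors wait on."""

    def __init__(self) -> None:
        # partition name -> callbacks to fire once that partition shows up
        self._waiters: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def register(self, partition: str, callback: Callable[[str], None]) -> None:
        """Record that some sensor is waiting for `partition`."""
        self._waiters[partition].append(callback)

    def pending(self) -> List[str]:
        """Partitions that still have at least one waiter."""
        return sorted(self._waiters)

    def consume(self, events: Iterable[str]) -> int:
        """Feed partition names seen in the (assumed) audit log.

        Fires and removes the waiters for each matching partition and
        returns how many callbacks were fired. Nothing is polled.
        """
        fired = 0
        for partition in events:
            for callback in self._waiters.pop(partition, []):
                callback(partition)
                fired += 1
        return fired


registry = PartitionRegistry()
satisfied: List[str] = []
registry.register("db.table/ds=2019-03-06", satisfied.append)
registry.register("db.table/ds=2019-03-07", satisfied.append)

# Only one of the two events matches a registered partition.
fired = registry.consume(["db.other/ds=2019-03-06", "db.table/ds=2019-03-06"])
```

The point of the sketch is that work is done only when an event arrives, so the cost scales with the audit-log volume rather than with the number of waiting sensors times the poke frequency.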
On Wed, Mar 6, 2019 at 1:51 PM Alex Gu
Smart sensor seems like a good idea, but I wonder how much performance will
be improved in practice. And of course, one must think about sharding and
such.
I'm not sure how helpful rescheduling sensors is, since it seemingly adds
scheduler and DB load, which are already a bottleneck.
On Wed, M
I would still like to get some feedback on the batch sensor/smart sensor
idea after viewing the sensor rescheduling PR, since the reschedule mode
does not reduce the number of worker processes for sensors. The batch sensor
idea is proposed for this purpose and should work well with reschedule
mode.
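
As a rough illustration of the batch idea (one process checking many sensor criteria instead of one worker process per sensor), here is a plain-Python sketch. It is only an assumption about what such a loop could look like, not the AIP's design: `run_batch_sensor`, `ready_after`, and the task ids are hypothetical, and each criterion is modelled as a zero-argument function returning a bool.

```python
import time
from typing import Callable, Dict, Set


def run_batch_sensor(
    checks: Dict[str, Callable[[], bool]],
    poke_interval: float = 0.0,
    max_rounds: int = 10,
) -> Set[str]:
    """Poke every pending check from a single loop.

    Returns the ids whose criterion was met within `max_rounds` rounds.
    """
    pending = dict(checks)
    done: Set[str] = set()
    for _ in range(max_rounds):
        # Iterate over a copy of the keys so entries can be removed safely.
        for task_id in list(pending):
            if pending[task_id]():
                done.add(task_id)
                del pending[task_id]
        if not pending:
            break
        time.sleep(poke_interval)
    return done


calls = {"a": 0, "b": 0}


def ready_after(name: str, n: int) -> Callable[[], bool]:
    """Hypothetical criterion that becomes true on its n-th poke."""
    def check() -> bool:
        calls[name] += 1
        return calls[name] >= n
    return check


done = run_batch_sensor(
    {"a": ready_after("a", 1), "b": ready_after("b", 3), "c": lambda: False},
    max_rounds=5,
)
```

Here three criteria share one loop; a satisfied criterion is dropped from the pending set and is not poked again, while the others keep being checked until the round limit.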
Wow, great work from Seelmann! Thanks, Fokko, for letting us know about it.
We are super happy to have this feature.
On Wed, Mar 6, 2019 at 11:24 AM Driesprong, Fokko
wrote:
> Thanks for bringing this up. I've added a comment on the Wiki:
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17+Ai
Thanks for bringing this up. I've added a comment on the Wiki:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17+Airflow+sensor+optimization
Have you looked into the work by Seelmann? Recently he introduced the
ability to reschedule sensors. When rescheduling, the slot will be given
back
hi,
I would like to open an AIP for Airflow sensor optimization.
*Motivation*:
Low efficiency in Airflow Sensor Implementation
Sensors are a special kind of operator that will keep running until a
certain criterion is met. Examples include a specific file landing in HDFS
or S3, a partition app