Hi everyone,

We are currently experimenting with long-running sensor tasks in reschedule 
mode. Some of these sensors run for more than 10 hours, rescheduling every 
5 minutes. Many of these tasks fail without any task log being stored.
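
To make the setup concrete, here is a minimal sketch of how our sensors are 
configured (the DAG, sensor class, and poke condition are illustrative 
stand-ins rather than our actual code, and I'm assuming Airflow 1.10-style 
import paths):

    from datetime import datetime
    from airflow import DAG
    from airflow.sensors.base_sensor_operator import BaseSensorOperator

    class ExampleSensor(BaseSensorOperator):
        def poke(self, context):
            # Our real sensors poll an external system and return True
            # once it is ready; returning False here just keeps the
            # sensor waiting and rescheduling.
            return False

    with DAG("example_dag", start_date=datetime(2019, 12, 1),
             schedule_interval="@daily") as dag:
        wait = ExampleSensor(
            task_id="wait_for_data",
            mode="reschedule",     # free the worker slot between pokes
            poke_interval=300,     # poke every 5 minutes
            timeout=12 * 60 * 60,  # allow runs longer than 10 hours
        )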

Looking at the scheduler logs, I see many messages like the following (this 
particular instance failed after 11 minutes):

> Executor reports task instance <TaskInstance: ***.*** 2019-12-14 
> 00:00:00+00:00 [queued]> finished (success) although the task says its 
> queued. Was the task killed externally?

We are using Airflow with the Celery executor, and Redis as both the broker 
and the result backend (I hope I have the terminology right). Some googling 
suggests that we should not use Redis as the result_backend, but rather a 
database. I'm happy to make this change, but I'd really like to understand 
how Redis in that role could cause such errors. Can someone explain what the 
result_backend actually does, and why using Redis for it might be causing 
problems?
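
For reference, the relevant part of our airflow.cfg looks roughly like this 
(hostnames and credentials are placeholders), with the change I understand is 
being suggested shown as a comment:

    [celery]
    broker_url = redis://redis:6379/0
    result_backend = redis://redis:6379/0

    # Suggested change, if I understand correctly: point the result
    # backend at a SQL database instead of Redis, e.g.
    # result_backend = db+postgresql://airflow:***@postgres/airflow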

The documentation also advises setting a visibility_timeout longer than the 
longest-running task when using Celery. Does this apply to rescheduling 
sensors as well, i.e. does "longest running task" mean a single poke (a few 
seconds) or the sensor's total runtime (10+ hours)? I also have trouble 
understanding what this setting actually does. Can someone explain?
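
In case it matters, my understanding is that this setting lives in the broker 
transport options section of airflow.cfg; if the full sensor runtime counts, 
I would raise it to something like this (value illustrative):

    [celery_broker_transport_options]
    # 12 hours, in seconds
    visibility_timeout = 43200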

Are there any other configuration or setup issues that could cause this 
behaviour?

Thanks for your help,

        Björn
