Hello Airflow Community, Due to some unique circumstances we are running all of our backfills through the Rundeck, it is been working pretty great; however, we’re running into some issues.
When a backfill is run via the airflow cli, a record for the job is created in airflow's `job` table with a `job_type` of "BackfillJob". We use this record to track the status of backfills that have been kicked off. Of particular usefulness is the "latest_heartbeat" field which allows us to detect when a backfill process dies before completing. Unfortunately the only descriptive information in that job record is the `dag_id` that it was run for. If we kicked off multiple backfills for the same dag_id ( but say with different dates/regex ) we can't tell which record in the job table maps to the particular cli command we ran. For instance if we ran these two commands: airflow backfill --task_regex task_one --start_date 2020-01-01 --end_date 2020-01-01 foobar airflow backfill --task_regex task_two --start_date 2020-02-01 --end_date 2020-02-01 foobar and we query the job table for active backfills running for our dag “foobar” sql select * from job where job_type = ‘BackfillJob' and dag_id = ‘foobar' and state = ‘running' and latest_heartbeat >= NOW() - interval '3 minutes' We would get the following results: id | dag_id | state | job_type | start_date | end_date | latest_heartbeat | executor_class | hostname | unixname 123 | foobar | running | BackfillJob | 2020-03-31 17:29:04.142715+00 | [null] | 2020-03-31 21:42:56.47259+00 | CeleryExecutor | 4ff5e55dc67b | airflow 124 | foobar | running | BackfillJob | 2020-03-31 19:49:08.111486+00 | [null] | 2020-03-31 21:42:57.63451+00 | CeleryExecutor | 4ff5e55dc67b | airflow but we have no way of knowing which one maps to which cli command that was run. We've poured over the database for what might be linked to the job ID and come up empty. We know the worker itself knows the ID because we traced queries being run from the worker against the airflow database and saw this query coming from the worker where we ran the cli command: SELECT job.id <http://job.id/> AS job_id, job.dag_id AS job_dag_id, job.state AS job_state, job.job_type AS job_job_type, job.start_date AS job_start_date, job.end_date AS job_end_date, job.latest_heartbeat AS job_latest_heartbeat, job.executor_class AS job_executor_class, job.hostname AS job_hostname, job.unixname AS job_unixname FROM job WHERE job.id <http://job.id/> = 123 LIMIT 1 Please help us understand if it’s possible to discover the job id for a backfill command that was run? Or maybe if it’s possible to capture the job id at the moment when we kick off a new backfill… Thanks for helping, Victor Ignatenkov