theister edited a comment on issue #18501:
URL: https://github.com/apache/airflow/issues/18501#issuecomment-967255792


   We came across the same problem, but on `2.0.2`.
   
   We recently migrated from 1.10.15, and we also regularly do backfills by 
clearing history via the UI and letting Airflow's catchup re-run them, 
which puts hundreds of DAG runs into the `running` state at the same time.
   While reprocessing the data eventually succeeded, it completely starved 
out all other DAGs in the meantime, printing many lines of
   `DAG XYZ already has 10 active runs, not queuing any tasks for run 
2021-06-30 00:00:00+00:00`
   to the scheduler logs. These were printed for the next 10 execution 
dates for which no tasks had been scheduled yet, which matches our 
scheduler setting of `max_dagruns_per_loop_to_schedule=10`.
   
   After digging into the code a little, my understanding is that if the 
`max_active_runs` limit is hit, the 2.0.2 scheduler prints the above message 
but returns from `_schedule_dag_run()` without actually updating the 
`last_scheduling_decision` timestamp of the DagRun (see 
https://github.com/apache/airflow/blob/2.0.2/airflow/jobs/scheduler_job.py#L1776),
 which as far as I can tell only happens in `update_state()`.
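
   To illustrate, here is a simplified sketch of that early-return path 
(a condensed paraphrase, not the verbatim 2.0.2 source):

   ```python
   # Condensed paraphrase of SchedulerJob._schedule_dag_run() in 2.0.2;
   # see the linked source for the real code.
   def _schedule_dag_run(self, dag_run, currently_active_runs, session):
       dag = dag_run.get_dag()

       if dag.max_active_runs and len(currently_active_runs) >= dag.max_active_runs:
           self.log.info(
               "DAG %s already has %d active runs, not queuing any tasks for run %s",
               dag.dag_id, len(currently_active_runs), dag_run.execution_date,
           )
           # Early return: dag_run.update_state() is never reached, so
           # dag_run.last_scheduling_decision keeps its old (or NULL) value.
           return 0

       # Only this path advances last_scheduling_decision (inside update_state()).
       schedulable_tis, callback_to_run = dag_run.update_state(
           session=session, execute_callbacks=False
       )
       ...
   ```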
   
   Since the `DagRun.next_dagruns_to_examine()` method returns the next DagRuns 
to check sorted ascending by `last_scheduling_decision`, the skipped runs keep 
their stale timestamp and therefore keep being examined first, which effectively 
blocks any other DAG runs from being scheduled for as long as the DAG's 
`max_active_runs` limit is being hit.
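
   For reference, a condensed sketch of that ordering in 
`DagRun.next_dagruns_to_examine()` (again a paraphrase of the 2.0.x query, 
not the verbatim source):

   ```python
   # Paraphrased: running, non-backfill DagRuns are examined
   # oldest-scheduling-decision first, with NULLs (never examined) first.
   query = (
       session.query(DagRun)
       .filter(
           DagRun.state == State.RUNNING,
           DagRun.run_type != DagRunType.BACKFILL_JOB,
       )
       .order_by(
           nulls_first(DagRun.last_scheduling_decision, session=session),
           DagRun.execution_date,
       )
       .limit(max_number)  # bounded by max_dagruns_per_loop_to_schedule
   )
   # A run whose last_scheduling_decision never advances keeps sorting to
   # the front of this list, crowding out every other DAG's runs.
   ```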
   
   @leonsmith did you manage to find out if the issue is still present on 2.2.0?

