Hello,

I am attempting to execute a workload using the KubernetesExecutor in an AWS 
EKS cluster. After a certain number of tasks start up, the pods take longer 
and longer to move from the Pending phase to the Running phase. The issue 
appears to be related to mounting the volumes that host the dags and logs 
folders: we start to see “FailedMount” events as the number of tasks 
increases.

The dags and logs folders are mounted using PersistentVolumes and 
PersistentVolumeClaims backed by AWS EFS file systems.

I have set up the PersistentVolumes in two ways, both with the same result:


  1.  Using the EFS CSI driver
  2.  Using a hostPath, with the drives mounted on the underlying EC2 instance.
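
For context, the claims are attached to the task pods roughly like the 
sketch below, written against the kubernetes Python client models. The 
claim names and mount paths are illustrative, not our exact values.

  from kubernetes.client import models as k8s

  # PVC-backed volumes for the shared dags and logs folders
  dags_volume = k8s.V1Volume(
      name="dags",
      persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(
          claim_name="airflow-dags"),
  )
  logs_volume = k8s.V1Volume(
      name="logs",
      persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(
          claim_name="airflow-logs"),
  )

  # Mount points inside the task containers
  dags_mount = k8s.V1VolumeMount(name="dags", mount_path="/opt/airflow/dags",
                                 read_only=True)
  logs_mount = k8s.V1VolumeMount(name="logs", mount_path="/opt/airflow/logs",
                                 read_only=False)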

As the workload scales up, the percentage of pods in the Pending phase (vs. 
the Running phase) continues to grow. Eventually, pods spawned by the 
KubernetesPodOperator start to fail because they remain in the Pending phase 
for too long.
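
For reference, the operators in question look roughly like the sketch below 
(Airflow 2 provider import shown; the task id, namespace, image, and command 
are illustrative, and the volume objects are the ones from the sketch 
above). My understanding is that the failure is the operator’s 
startup_timeout_seconds (120 by default) expiring while the pod is still 
Pending.

  from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
      KubernetesPodOperator,
  )

  # Defined inside a DAG context; names and values are illustrative.
  run_step = KubernetesPodOperator(
      task_id="run_step",
      name="run-step",
      namespace="airflow",
      image="my-registry/worker:latest",
      cmds=["python", "-m", "my_job"],
      volumes=[dags_volume, logs_volume],
      volume_mounts=[dags_mount, logs_mount],
      startup_timeout_seconds=120,  # default; pods exceed this while Pending
      get_logs=True,
  )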

I’ve worked with AWS support and they don’t believe that the issue is related 
to the EFS drives. From the evidence I can see, I tend to agree.

Has anyone seen anything similar to this? Has anybody been able to successfully 
scale up Airflow on a K8S cluster?

Thanks,

Jim Majure | Principal Machine Learning Engineer
aurishealth.com | 150 Shoreline Dr. | Redwood City, CA 94065
(515) 829-0667
