I've noticed the disks on my Mesos slaves filling up when I run tasks that generate large amounts of data (~2-5GB) in their sandbox directories. The tasks are short-lived, and I can see that the Mesos GC process is trying to delete their sandboxes but failing. Here are some logs:
------------------------------
W0906 02:56:00.256515 1434 gc.cpp:139] Failed to delete '/var/lib/mesos-slave/slaves/20140724-124232-33820844-5050-18432-4/frameworks/20140724-201017-50598060-5050-16139-0314/executors/0c5ce2f8-3564-11e4-a014-22000a48a50a/runs/d3e1342f-89d8-4930-a9ed-35e33920779c': Directory not empty
W0906 02:56:00.258396 1434 gc.cpp:139] Failed to delete '/var/lib/mesos-slave/slaves/20140724-124232-33820844-5050-18432-4/frameworks/20140724-201017-50598060-5050-16139-0314': Directory not empty
W0906 02:56:00.259904 1434 gc.cpp:139] Failed to delete '/var/lib/mesos-slave/slaves/20140724-124232-33820844-5050-18432-4/frameworks/20140724-201017-50598060-5050-16139-0314/executors/0c5ce2f8-3564-11e4-a014-22000a48a50a': Directory not empty
------------------------------

If I remove the directory mentioned manually, it works fine.

Is this a known issue, or should I do some more debugging? I haven't yet tried to reproduce it under specific conditions.

As a side note, should Mesos have some kind of retry mechanism for GC? Also, will GC still run for an executor if the slave restarts after the executor terminates but before the GC process runs?

Tom.