I've noticed the disks on my Mesos slaves filling up when running tasks that
generate large amounts of data (~2-5GB) in their sandbox directories. The
tasks don't last very long, and I can see that the Mesos GC process is
trying to delete the sandbox directories, but failing. Here are some logs:

------------------------------

W0906 02:56:00.256515  1434 gc.cpp:139] Failed to delete
'/var/lib/mesos-slave/slaves/20140724-124232-33820844-5050-18432-4/frameworks/20140724-201017-50598060-5050-16139-0314/executors/0c5ce2f8-3564-11e4-a014-22000a48a50a/runs/d3e1342f-89d8-4930-a9ed-35e33920779c':
Directory not empty
W0906 02:56:00.258396  1434 gc.cpp:139] Failed to delete
'/var/lib/mesos-slave/slaves/20140724-124232-33820844-5050-18432-4/frameworks/20140724-201017-50598060-5050-16139-0314':
Directory not empty
W0906 02:56:00.259904  1434 gc.cpp:139] Failed to delete
'/var/lib/mesos-slave/slaves/20140724-124232-33820844-5050-18432-4/frameworks/20140724-201017-50598060-5050-16139-0314/executors/0c5ce2f8-3564-11e4-a014-22000a48a50a':
Directory not empty

------------------------------

If I try to remove the directory mentioned manually, it works fine. Is
this a known issue, or should I do a little more debugging? I haven't yet
tried to reproduce it under specific conditions.

As a side note, should Mesos perhaps have some kind of retry mechanism for
GC? Also, will GC still run for an executor if the slave restarts after the
executor terminates but before the GC process runs?
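
To make the retry idea concrete, here is a rough sketch in Python, not the
actual Mesos C++ code. The names remove_with_retry, attempts and delay are
made up for illustration, and it assumes the failure is a race in which
something is still writing into the sandbox while it is being deleted (I
have not confirmed that this is the cause).

```python
import errno
import os
import shutil
import time


def remove_with_retry(path, attempts=3, delay=1.0):
    """Remove a directory tree, retrying if it is repopulated mid-delete.

    Returns True once the path is gone, False if every attempt failed.
    """
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # Already gone (for example, removed by hand): nothing left to do.
            return True
        except OSError as e:
            # ENOTEMPTY is the "Directory not empty" error seen in the logs;
            # any other error is treated as a real failure and re-raised.
            if e.errno != errno.ENOTEMPTY:
                raise
            if attempt < attempts - 1:
                # Hypothetical back-off before the next attempt.
                time.sleep(delay)
    return False
```

A caller could log a warning only when this returns False, rather than on
the first failed attempt.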

Tom.
