Good question. There are likely mounts, yes. They should be unmounted cleanly, though perhaps not in all cases, so maybe we need to retry deleting things in the GC process.
Do you know if the mesos-slave will re-schedule something for GC if it fails deletion?

--
Tom Arnfeld
Senior Developer // DueDil

On Wednesday, Jul 8, 2015 at 7:19 pm, Vinod Kone <vinodk...@gmail.com>, wrote:

> Are there any special files (mounts etc) in your slave directory? The logic Mesos uses to delete a directory is likely different from the shell utility 'rm'.
>
> On Wed, Jul 8, 2015 at 11:09 AM, Tom Arnfeld <t...@duedil.com> wrote:
>
>> In this instance there were three old slave directories, and there are three log lines in the mesos-slave.INFO file;
>>
>> I0708 11:24:52.023453 2425 slave.cpp:3499] Garbage collecting old slave 20150515-105200-84152492-5050-9915-S46
>> I0708 11:24:52.023923 2425 slave.cpp:3499] Garbage collecting old slave 20150217-184553-67375276-5050-18563-S74
>> I0708 11:24:52.023921 2428 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150515-105200-84152492-5050-9915-S46' for gc 6.99999972599407days in the future
>> I0708 11:24:52.054704 2425 slave.cpp:3499] Garbage collecting old slave 20150515-105200-84152492-5050-9915-S22
>> I0708 11:24:52.054723 2424 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150217-184553-67375276-5050-18563-S74' for gc 6.99999937182815days in the future
>> I0708 11:24:52.067934 2425 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150515-105200-84152492-5050-9915-S22' for gc 6.99999922252444days in the future
>>
>> This happens right after the recovery process finishes after the slave boots up.
>>
>> I've looked at another slave that's currently at 99% disk capacity and the slave has been up since 27th May 2015, it also has the "Garbage collecting old slave" log lines just after boot for ~6 days. Looking a little deeper in to this slave logs; this looks like an interesting error;
>>
>> W0527 17:35:08.935755 1749 gc.cpp:139] Failed to delete '/mnt/mesos/mesos-slave/slaves/20150217-184553-67375276-5050-18563-S72': Directory not empty
>>
>> I think I actually discussed this with BenH a while back, we're running 0.21.0 on this cluster.
>>
>> Anyone else seen this before? Using the standard `rm` unix tool clears out the directories fine currently, running as the same user as the slave (root).
>>
>> --
>> Tom Arnfeld
>> Senior Developer // DueDil
>>
>> On Wed, Jul 8, 2015 at 7:00 PM, Vinod Kone <vinodk...@gmail.com> wrote:
>>
>>> On Wed, Jul 8, 2015 at 10:54 AM, Tom Arnfeld <t...@duedil.com> wrote:
>>>
>>>> When this happens the old slave directories appear not to be tracked by the mesos GC process, and stay around indefinitely. Over time if enough full slave restarts happen (say, due to reconfiguration) the disks can be completely filled and the mesos slave won't do anything about it.
>>>
>>> This shouldn't happen. Old slave directories should be gc'ed by the slave based on their last modification time. Do you see any log lines with "Garbage collecting old slave" ?
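
For anyone wanting to check the mounts question raised in this thread, here is a minimal Python sketch that lists mount points at or below a given directory. It is not how Mesos performs GC; it only assumes a Linux-style mount table (the format of /proc/mounts), and the function name `mounts_under` is made up for illustration.

```python
import os


def mounts_under(root, mounts_text):
    """Return mount points at or below `root`, deepest first.

    `mounts_text` is text in /proc/mounts format, where the second
    whitespace-separated field of each line is the mount point.
    """
    root = os.path.normpath(root)
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # /proc/mounts writes a space inside a path as the escape \040.
        mount_point = os.path.normpath(fields[1].replace("\\040", " "))
        # commonpath compares whole path components, so a sibling such as
        # ".../slaves-other" is not mistaken for a child of ".../slaves".
        if os.path.commonpath([root, mount_point]) == root:
            found.append(mount_point)
    # Deepest first: nested mounts would have to be unmounted before parents.
    return sorted(found, key=lambda p: p.count(os.sep), reverse=True)
```

On a Linux slave this could be called with the contents of /proc/mounts and the slaves directory from the log lines, for example `mounts_under("/mnt/mesos/mesos-slave/slaves", open("/proc/mounts").read())`. Anything it returns is a candidate for the "Directory not empty" failure.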