Re: Cleaning out old mesos-slave sandbox directories

Tom Arnfeld Wed, 08 Jul 2015 11:22:01 -0700

Good question. There are likely mounts, yup. They should be getting unmounted 
cleanly, but perhaps not in all cases, so maybe we need to retry deleting 
things in the gc process.





Do you know if the mesos-slave will re-schedule something for GC if it fails 
deletion?



--


Tom Arnfeld

Senior Developer // DueDil






On Wednesday, Jul 8, 2015 at 7:19 pm, Vinod Kone <vinodk...@gmail.com> wrote:
Are there any special files (mounts etc) in your slave directory? The logic 
Mesos uses to delete a directory is likely different from the shell utility 
'rm'.
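
One way to check for mounts under the slave directory is to read the kernel's mount table. This is only a rough sketch, assuming a Linux host where /proc/mounts is readable; the function names `mounts_under` and `read_proc_mounts` are made up for illustration and are not part of Mesos.

```python
import posixpath


def mounts_under(root, mounts_text):
    """Return mount points from /proc/mounts-style text that lie at or below root."""
    root = posixpath.normpath(root)
    # Compare against "root/" so that /mnt/mesos does not match /mnt/mesos-other.
    prefix = root.rstrip("/") + "/"
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # The second field is the mount point; spaces are escaped as \040.
        mount_point = posixpath.normpath(fields[1].replace("\\040", " "))
        if mount_point == root or mount_point.startswith(prefix):
            found.append(mount_point)
    return found


def read_proc_mounts(path="/proc/mounts"):
    """Read the mount table (Linux only)."""
    with open(path) as f:
        return f.read()
```

For example, `mounts_under("/mnt/mesos/mesos-slave/slaves", read_proc_mounts())` would list anything still mounted inside the old slave directories, which is one possible reason a delete could fail.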

On Wed, Jul 8, 2015 at 11:09 AM, Tom Arnfeld <t...@duedil.com> wrote:

In this instance there were three old slave directories, and there are three 
"Garbage collecting old slave" log lines in the mesos-slave.INFO file:





I0708 11:24:52.023453  2425 slave.cpp:3499] Garbage collecting old slave 20150515-105200-84152492-5050-9915-S46

I0708 11:24:52.023923  2425 slave.cpp:3499] Garbage collecting old slave 20150217-184553-67375276-5050-18563-S74

I0708 11:24:52.023921  2428 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150515-105200-84152492-5050-9915-S46' for gc 6.99999972599407days in the future

I0708 11:24:52.054704  2425 slave.cpp:3499] Garbage collecting old slave 20150515-105200-84152492-5050-9915-S22

I0708 11:24:52.054723  2424 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150217-184553-67375276-5050-18563-S74' for gc 6.99999937182815days in the future

I0708 11:24:52.067934  2425 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150515-105200-84152492-5050-9915-S22' for gc 6.99999922252444days in the future




This happens right after the recovery process finishes when the slave boots 
up. I've looked at another slave that's currently at 99% disk capacity and has 
been up since 27th May 2015. It also has the "Garbage collecting old slave" 
log lines just after boot for ~6 days. Looking a little deeper into this 
slave's logs, this looks like an interesting error:





W0527 17:35:08.935755  1749 gc.cpp:139] Failed to delete '/mnt/mesos/mesos-slave/slaves/20150217-184553-67375276-5050-18563-S72': Directory not empty
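
To find every occurrence of this warning across a slave log, something like the following could work. It is a minimal sketch that assumes each log entry sits on a single line in the actual log file (the wrapping in this email comes from the mail client) and that the message format matches the line quoted above; `failed_gc_deletions` is a hypothetical helper name.

```python
import re

# Matches the gc.cpp warning quoted above; the source line number may differ
# between Mesos versions, so it is not hard-coded.
FAILED_DELETE = re.compile(
    r"gc\.cpp:\d+\] Failed to delete '(?P<path>[^']+)': (?P<reason>.*)$"
)


def failed_gc_deletions(log_text):
    """Return (path, reason) pairs for every failed gc deletion in the log text."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_DELETE.search(line)
        if match:
            results.append((match.group("path"), match.group("reason").strip()))
    return results
```

Feeding it the contents of the slave's WARNING or INFO log gives a list of directories that gc tried and failed to remove, which can then be compared with what is still on disk.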




I think I actually discussed this with BenH a while back. We're running 0.21.0 
on this cluster.




Has anyone else seen this before? Using the standard `rm` unix tool currently 
clears out the directories fine when run as the same user as the slave (root).






--


Tom Arnfeld

Senior Developer // DueDil







On Wed, Jul 8, 2015 at 7:00 PM, Vinod Kone <vinodk...@gmail.com> wrote:





On Wed, Jul 8, 2015 at 10:54 AM, Tom Arnfeld <t...@duedil.com> wrote:

When this happens the old slave directories appear not to be tracked by the 
mesos GC process, and stay around indefinitely. Over time if enough full slave 
restarts happen (say, due to reconfiguration) the disks can be completely 
filled and the mesos slave won't do anything about it.







This shouldn't happen. Old slave directories should be gc'ed by the slave based 
on their last modification time. Do you see any log lines with "Garbage 
collecting old slave"?
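
To see which old slave directories have outlived the expected gc window, their modification times can be checked by hand. The sketch below does not reproduce the slave's own gc logic; it only lists subdirectories whose mtime is older than a chosen threshold. The name `dirs_older_than` and the 7-day figure in the usage note are assumptions for illustration.

```python
import os
import time


def dirs_older_than(parent, max_age_days, now=None):
    """Return (path, age_in_days) for subdirectories of parent older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = []
    for entry in os.scandir(parent):
        # Only look at real directories, not symlinks to directories.
        if entry.is_dir(follow_symlinks=False):
            mtime = entry.stat(follow_symlinks=False).st_mtime
            if mtime < cutoff:
                old.append((entry.path, (now - mtime) / 86400))
    # Oldest first.
    return sorted(old, key=lambda item: item[1], reverse=True)
```

Calling `dirs_older_than("/mnt/mesos/mesos-slave/slaves", 7)` on an affected slave would show which directories are still present past roughly the 7 days mentioned in the "Scheduling ... for gc" log lines.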
