Hi all,

We need to prepare for temporary shut-downs of a part of our ceph cluster. I 
have 2 questions:

1) What is the recommended procedure to temporarily shut down a ceph fs quickly?
2) How to avoid MON store log spam overflow (on octopus 15.2.17)?

To 1: Currently, I'm thinking about the following (a rough command sketch 
follows the list):

- fs fail <fs_name>
- shut down all MDS daemons
- shut down all OSDs in that sub-cluster
- shut down MGRs and MONs in that sub-cluster
- power servers down
- mark out OSDs manually (the number will exceed the MON limit for auto-out)

- power up
- wait a bit
- do I need to mark the OSDs in again, or will they rejoin automatically after 
a manual out and restart (maybe just temporarily increase the MON limit at the 
end of the procedure above)?
- fs set <fs_name> joinable true
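
In commands, this would be roughly the sketch below (the fs name, OSD ids and 
the systemd target names are placeholders/assumptions for how we run the 
daemons, not a tested recipe):

  # stop client access and fail all MDS ranks of the fs
  ceph fs fail <fs_name>
  # on the hosts of the sub-cluster: stop the daemons
  systemctl stop ceph-mds.target
  systemctl stop ceph-osd.target
  systemctl stop ceph-mgr.target ceph-mon.target
  # power down, then mark the down OSDs out manually
  ceph osd out <osd_id> [<osd_id> ...]

  # after power-up and daemon start:
  # mark the OSDs in again (if needed, see the question above)
  ceph osd in <osd_id> [<osd_id> ...]
  # allow MDS daemons to take over ranks again
  ceph fs set <fs_name> joinable true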

Is this a safe procedure? The documentation calls this the procedure for 
"Taking the cluster down rapidly for deletion or disaster recovery", but 
neither of the two is our intent. We need a fast *reversible* procedure, 
because an "fs set down true" simply takes too long.
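
For comparison, what I understand to be the documented graceful way, where the 
MDS ranks are wound down first (this is the part that takes too long for us):

  ceph fs set <fs_name> down true
  # ... and after the maintenance:
  ceph fs set <fs_name> down false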

There will be ceph fs clients that remain up. The desired behaviour is that 
client IO stalls until the fs comes back up and then just continues as if 
nothing had happened.
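
After bringing the fs back, I would verify this with something like the 
following (the MDS daemon name is a placeholder):

  ceph fs status <fs_name>
  # check that the client sessions are back on the active MDS
  ceph tell mds.<daemon_name> client ls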

To 2: We will have a sub-cluster down for an extended period of time. There 
have been cases where such a situation killed MONs due to an excessive amount 
of non-essential log messages accumulating in the MON store. Is this still a 
problem with 15.2.17, and what can I do to mitigate it?
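
What I have in mind so far, under the assumption that the store growth comes 
from cluster log messages that the daemons send to the MONs, is roughly:

  # stop daemons from sending cluster log messages to the MONs
  ceph config set global clog_to_monitors false
  # compact the MON stores once the sub-cluster is back up
  ceph tell mon.<mon_id> compact
  # or let them compact on every (re)start
  ceph config set mon mon_compact_on_start true

Would something like this help, or is there a better knob for this?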

Thanks for any hints/corrections/confirmations!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14