Re: [ceph-users] OSD node reinstallation

2018-10-29 Thread David Turner
Set noout, reinstall the OS without wiping the OSDs (including any journal partitions, and maintaining any dmcrypt keys if you have encryption), install ceph, make sure the ceph.conf file is correct, start the OSDs, and unset noout once they're back up and in. All of the data the OSD needs to start is on
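
A minimal sketch of that sequence on the node being reinstalled, assuming systemd-managed OSDs and that the installer leaves the OSD data and journal devices untouched:

    ceph osd set noout                 # keep the cluster from rebalancing while the node is down
    systemctl stop ceph-osd.target     # stop this node's OSDs, then reinstall the OS
    # ... reinstall the OS, install the ceph packages, restore /etc/ceph/ceph.conf and keyrings ...
    ceph-volume lvm activate --all     # or ceph-disk activate-all on older / non-LVM deployments
    ceph osd unset noout               # once the OSDs are back up and in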

Re: [ceph-users] reducing min_size on erasure coded pool may allow recovery ?

2018-10-29 Thread David Turner
min_size should be at least k+1 for EC. There are times to use k for emergencies like you had. I would suggest setting it back to 3 once you're back to healthy. As far as why you needed to reduce min_size, my guess would be that recovery would have happened as long as k copies were up. Were the PG's
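
A minimal sketch of the temporary change and the revert, assuming the k=2,m=2 pool is named ecpool (the pool name here is hypothetical):

    ceph osd pool set ecpool min_size 2   # k, only as an emergency to let the incomplete PGs recover
    # ... wait for recovery to finish ...
    ceph osd pool set ecpool min_size 3   # back to k+1 once the cluster is healthy again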

[ceph-users] reducing min_size on erasure coded pool may allow recovery ?

2018-10-29 Thread Chad W Seys
Hi all, Recently our cluster lost a drive and a node (3 drives) at the same time. Our erasure coded pools are all k2m2, so if all is working correctly no data is lost. However, there were 4 PGs that stayed "incomplete" until I finally took the suggestion in 'ceph health detail' to reduce

[ceph-users] OSD node reinstallation

2018-10-29 Thread Luiz Gustavo Tonello
Hi list, I have a situation where I need to reinstall the O.S. on a single node in my OSD cluster. This node has 4 OSDs configured, each one has ~4 TB used. The way I'm thinking of proceeding is to put each OSD down (one at a time), stop the OSD, reinstall the O.S., and finally add the OSDs again.

Re: [ceph-users] Ceph cluster uses substantially more disk space after rebalancing

2018-10-29 Thread Виталий Филиппов
Is there a way to force OSDs to remove old data? Hi, after I recreated one OSD and increased the pg count of my erasure-coded (2+1) pool (which was way too low, only 100 for 9 OSDs), the cluster started to eat additional disk space. First I thought that was caused by the moved PGs using

[ceph-users] Ceph cluster uses substantially more disk space after rebalancing

2018-10-29 Thread Виталий Филиппов
Hi, after I recreated one OSD and increased the pg count of my erasure-coded (2+1) pool (which was way too low, only 100 for 9 OSDs), the cluster started to eat additional disk space. First I thought that was caused by the moved PGs using additional space during unfinished backfills. I pinned most of
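
A sketch of the pg count increase described above, assuming the pool is named ecpool (hypothetical) and the new target is 256 PGs; pgp_num has to be raised as well before the data actually starts moving:

    ceph osd pool set ecpool pg_num 256
    ceph osd pool set ecpool pgp_num 256
    ceph df          # compare raw used vs. pool usage while the backfill runs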

[ceph-users] ceph-deploy with a specified osd ID

2018-10-29 Thread Jin Mao
Gents, My cluster had a gap in the OSD sequence numbers at a certain point. Basically, because of a missing "osd auth del/rm" in a previous disk replacement task for osd.17, a new osd.34 was created. It did not really bother me until recently, when I tried to replace all the smaller disks with bigger disks.
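
A sketch of a removal sequence that frees an old ID for reuse, using osd.17 from the post as the example:

    ceph osd out 17
    ceph osd crush remove osd.17
    ceph auth del osd.17     # the "osd auth del" step the post mentions as missed
    ceph osd rm 17           # without this the ID stays in the osdmap and a fresh ID (osd.34) gets allocated
    # on Luminous and later, "ceph osd purge 17 --yes-i-really-mean-it" combines the last three steps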

Re: [ceph-users] librados3

2018-10-29 Thread Jason Dillaman
On Mon, Oct 29, 2018 at 7:48 AM Wido den Hollander wrote: > On 10/29/18 12:42 PM, kefu chai wrote: > > + ceph-user for more inputs in hope to get more inputs from librados > > and librbd 's C++ interfaces. > > > > On Wed, Oct 24, 2018 at 1:34 AM Jason Dillaman wrote: > >> > >> On Tue, Oct 23,

Re: [ceph-users] librados3

2018-10-29 Thread Wido den Hollander
On 10/29/18 12:42 PM, kefu chai wrote: > + ceph-user for more inputs in hope to get more inputs from librados > and librbd 's C++ interfaces. > > On Wed, Oct 24, 2018 at 1:34 AM Jason Dillaman wrote: >> >> On Tue, Oct 23, 2018 at 11:38 AM kefu chai wrote: >>> >>> we plan to introduce some

Re: [ceph-users] librados3

2018-10-29 Thread kefu chai
+ ceph-user for more input, in the hope of getting more feedback on librados and librbd's C++ interfaces. On Wed, Oct 24, 2018 at 1:34 AM Jason Dillaman wrote: > > On Tue, Oct 23, 2018 at 11:38 AM kefu chai wrote: > > > > we plan to introduce some non-backward-compatible changes[0] in > > librados in

Re: [ceph-users] ceph-mds failure replaying journal

2018-10-29 Thread Yan, Zheng
cephfs is recoverable. Just set mds_wipe_sessions to 1. After the mds recovers, set it back to 0 and flush the journal (ceph daemon mds.x flush journal) On Mon, Oct 29, 2018 at 7:13 PM Jon Morby (Fido) wrote: > I've experimented and whilst the downgrade looks to be working, you end up > with errors
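
A sketch of that recovery sequence; mds.x stands in for the actual MDS name, and the option could equally be set under [mds] in ceph.conf:

    ceph config set mds mds_wipe_sessions true
    systemctl restart ceph-mds@x          # let the MDS replay and become active again
    ceph config set mds mds_wipe_sessions false
    ceph daemon mds.x flush journal       # the flush Zheng asks for once the MDS has recovered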

Re: [ceph-users] ceph-mds failure replaying journal

2018-10-29 Thread Jon Morby (Fido)
I've experimented and whilst the downgrade looks to be working, you end up with errors regarding unsupported feature "mimic" amongst others 2018-10-29 10:51:20.652047 7f6f1b9f5080 -1 ERROR: on disk data includes unsupported features: compat={},rocompat={},incompat={10=mimic ondisk layou so I

Re: [ceph-users] ceph-mds failure replaying journal

2018-10-29 Thread Yan, Zheng
please try again with debug_mds=10 and send the log to me. Regards, Yan, Zheng On Mon, Oct 29, 2018 at 6:30 PM Jon Morby (Fido) wrote: > fyi, downgrading to 13.2.1 doesn't seem to have fixed the issue either :( > > --- end dump of recent events --- > 2018-10-29 10:27:50.440 7feb58b43700 -1 *** Caught
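
A sketch of raising the MDS debug level for the next replay attempt (mds.x is a placeholder for the real daemon name; the resulting log is ceph-mds.x.log under /var/log/ceph/ by default):

    ceph config set mds debug_mds 10      # or: ceph tell mds.x injectargs '--debug_mds 10' while it is running
    systemctl restart ceph-mds@x          # reproduce the replay crash with verbose logging enabled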

Re: [ceph-users] ceph-mds failure replaying journal

2018-10-29 Thread Jon Morby (Fido)
fyi, downgrading to 13.2.1 doesn't seem to have fixed the issue either :( --- end dump of recent events --- 2018-10-29 10:27:50.440 7feb58b43700 -1 *** Caught signal (Aborted) ** in thread 7feb58b43700 thread_name:md_log_replay ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77)

Re: [ceph-users] ceph-mds failure replaying journal

2018-10-29 Thread Yan, Zheng
On Mon, Oct 29, 2018 at 5:25 PM Jon Morby (Fido) wrote: > Hi > > Ideally we'd like to undo the whole accidental upgrade to 13.x and ensure > that ceph-deploy doesn't do another major release upgrade without a lot of > warnings > > Either way, I'm currently getting errors that 13.2.1 isn't

Re: [ceph-users] ceph-mds failure replaying journal

2018-10-29 Thread Jon Morby (Fido)
Hi Ideally we'd like to undo the whole accidental upgrade to 13.x and ensure that ceph-deploy doesn't do another major release upgrade without a lot of warnings Either way, I'm currently getting errors that 13.2.1 isn't available / shaman is offline / etc What's the best / recommended way

Re: [ceph-users] ceph-mds failure replaying journal

2018-10-29 Thread Yan, Zheng
We backported a wrong patch to 13.2.2. Downgrade ceph to 13.2.1, then run 'ceph mds repaired fido_fs:1'. Sorry for the trouble. Yan, Zheng On Mon, Oct 29, 2018 at 7:48 AM Jon Morby wrote: > > We accidentally found ourselves upgraded from 12.2.8 to 13.2.2 after a > ceph-deploy install went
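
A sketch of the suggested fix, assuming Debian/Ubuntu packages and the fido_fs filesystem name from the thread; the pinned package version strings are hypothetical and depend on your repository:

    systemctl stop ceph-mds.target                    # stop the crashing MDS daemons
    apt install --allow-downgrades ceph-mds=13.2.1-1bionic ceph-base=13.2.1-1bionic   # whichever 13.2.1 build your repo carries
    systemctl start ceph-mds.target
    ceph mds repaired fido_fs:1                       # mark rank 1 repaired so it is picked up again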