[ceph-users] Using centralized management configuration drops some unrecognized config options

2019-05-14 Thread EDH - Manuel Rios Fernandez
Hi, we're moving our config to the centralized management configuration with "ceph config set" and a minimal ceph.conf on all nodes. Several Ceph options are not allowed. Why? ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable) ceph config set osd
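
For readers following along, a minimal sketch of the centralized-config workflow described above (the option names are common examples, not necessarily the ones the poster tried):

    # Assumed example: keep options in the monitors' config database instead of ceph.conf
    ceph config set osd osd_max_backfills 2        # set an option for all OSDs
    ceph config get osd.0 osd_max_backfills        # check what one daemon resolves
    ceph config dump                               # list everything held centrally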

Re: [ceph-users] Lost OSD from PCIe error, recovered, to restore OSD process

2019-05-14 Thread Bob R
Does 'ceph-volume lvm list' show it? If so, you can try to activate it with 'ceph-volume lvm activate 122 74b01ec2--124d--427d--9812--e437f90261d4' Bob On Tue, May 14, 2019 at 7:35 AM Tarek Zegar wrote: > Someone nuked an OSD that had 1-replica PGs. They accidentally did echo 1 > >
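
A hedged sketch of the check Bob suggests (OSD id and fsid exactly as quoted in the thread):

    ceph-volume lvm list       # does the OSD's LV still show up with its metadata?
    # If it does, re-activate it by OSD id and OSD fsid:
    ceph-volume lvm activate 122 74b01ec2--124d--427d--9812--e437f90261d4
    # Or bring up every OSD ceph-volume can still find:
    ceph-volume lvm activate --all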

Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO [EXT]

2019-05-14 Thread Tarek Zegar
https://github.com/ceph/ceph-ansible/issues/3961 <--- created ticket Thanks Tarek From: Matthew Vernon To: Tarek Zegar , solarflo...@gmail.com Cc: ceph-users@lists.ceph.com Date: 05/14/2019 04:41 AM Subject: [EXTERNAL] Re: [ceph-users] Rolling upgrade fails with flag

[ceph-users] ceph -s finds 4 pools but ceph osd lspools says no pools, which is the expected answer

2019-05-14 Thread Rainer Krienke
Hello, on a freshly set up Ceph cluster I see a strange difference between the number of pools reported by ceph -s and what I know should be there: no pools at all. I set up a fresh Nautilus cluster with 144 OSDs on 9 hosts. Just to play around I created a pool named rbd with $ ceph
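
A few commands that are useful for comparing the two views (a sketch; pool names are whatever exists on the cluster):

    ceph -s                     # status summary that reported the 4 pools
    ceph osd lspools            # pool list straight from the OSD map
    ceph osd pool ls detail     # per-pool detail, if anything really exists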

[ceph-users] Health Cron Script

2019-05-14 Thread Georgios Dimitrakakis
Hello, I am wondering if there are people out there who still use "old-fashioned" cron scripts to check Ceph's health, monitor it, and receive email alerts. If there are, do you mind sharing your implementation? Probably something similar to this:
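
In the spirit of the question, a minimal sketch of such a cron script (the recipient address is a placeholder and a working local mail command is assumed):

    #!/bin/sh
    # Hypothetical cron health check: mail the output whenever the cluster is not HEALTH_OK.
    STATUS=$(ceph health detail 2>&1)
    case "$STATUS" in
      HEALTH_OK*) ;;   # nothing to report
      *) printf '%s\n' "$STATUS" | mail -s "Ceph health alert on $(hostname)" admin@example.com ;;
    esac

Run from cron, e.g. every 10 minutes: */10 * * * * /usr/local/bin/ceph-health-check.sh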

Re: [ceph-users] ceph nautilus deep-scrub health error

2019-05-14 Thread Brett Chancellor
You can increase your scrub intervals: osd_deep_scrub_interval and osd_scrub_max_interval. On Tue, May 14, 2019 at 7:00 AM EDH - Manuel Rios Fernandez < mrios...@easydatahost.com> wrote: > Hi Muthu > > > > We found the same issue near 2000 pgs not deep-scrubbed in time. > > > > We’re manually force
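
A sketch of what adjusting those two intervals could look like with the centralized config on Nautilus (the values are illustrative, not recommendations):

    # Assumed example values: deep scrub every 4 weeks, regular scrub forced at least every 2 weeks
    ceph config set osd osd_deep_scrub_interval 2419200
    ceph config set osd osd_scrub_max_interval  1209600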

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Konstantin Shalygin
peering does not seem to be blocked anymore. But still there is no recovery going on. Is there anything else we can try? Try to reduce min_size for the problem pool as 'health detail' suggested: `ceph osd pool set ec31 min_size 2`. k
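
A hedged sketch of the suggested workaround, keeping note of the original value so it can be restored once recovery completes:

    ceph osd pool get ec31 min_size     # note the current value before changing it
    ceph osd pool set ec31 min_size 2   # as suggested, to let the incomplete PGs peer
    # ...after recovery, set min_size back to the value noted above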

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Dan van der Ster
On Tue, May 14, 2019 at 5:13 PM Kevin Flöh wrote: > > ok, so now we see at least a difference in the recovery state: > > "recovery_state": [ > { > "name": "Started/Primary/Peering/Incomplete", > "enter_time": "2019-05-14 14:15:15.650517", >

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
ok, so now we see at least a difference in the recovery state: "recovery_state": [ { "name": "Started/Primary/Peering/Incomplete", "enter_time": "2019-05-14 14:15:15.650517", "comment": "not enough complete instances of this PG" }, {

[ceph-users] Lost OSD from PCIe error, recovered, to restore OSD process

2019-05-14 Thread Tarek Zegar
Someone nuked an OSD that had 1-replica PGs. They accidentally did echo 1 > /sys/block/nvme0n1/device/device/remove We got it back by doing echo 1 > /sys/bus/pci/rescan However, it re-enumerated as a different drive number (guess we didn't have udev rules). They restored the LVM volume
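
For clarity, the sequence described above reconstructed as a sketch (device paths as quoted; the last step is an assumption about how the OSD could be brought back once its LVM volume is visible again):

    echo 1 > /sys/block/nvme0n1/device/device/remove   # the accidental removal
    echo 1 > /sys/bus/pci/rescan                       # brings the device back, possibly under a new name
    ceph-volume lvm activate --all                     # assumed follow-up: re-activate any OSD whose LV reappeared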

Re: [ceph-users] ceph nautilus deep-scrub health error

2019-05-14 Thread EDH - Manuel Rios Fernandez
Hi Muthu We found the same issue, nearly 2000 PGs not deep-scrubbed in time. We’re manually force scrubbing with: ceph health detail | grep -i not | awk '{print $2}' | while read i; do ceph pg deep-scrub ${i}; done It launches around 20-30 PGs to be deep-scrubbed. I think you can
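
A slightly tightened variant of the same one-liner (assuming the Nautilus "pg X not deep-scrubbed since ..." wording in ceph health detail):

    ceph health detail | awk '/not deep-scrubbed since/ {print $2}' | \
      while read pg; do ceph pg deep-scrub "$pg"; done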

[ceph-users] ceph nautilus deep-scrub health error

2019-05-14 Thread nokia ceph
Hi Team, After upgrading from Luminous to Nautilus, we see a "654 pgs not deep-scrubbed in time" error in ceph status. How can we disable this warning? In our setup we disable deep-scrubbing for performance reasons. Thanks, Muthu
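
Two hedged options for silencing this (the ratio option is an assumption about the Nautilus warning threshold; nodeep-scrub stops deep scrubs cluster-wide rather than just the message):

    ceph config set global mon_warn_pg_not_deep_scrubbed_ratio 0   # assumed: disables the "not deep-scrubbed in time" warning
    ceph osd set nodeep-scrub                                      # alternatively, skip deep scrubbing entirely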

Re: [ceph-users] ceph mimic and samba vfs_ceph

2019-05-14 Thread Ansgar Jazdzewski
Hi, I was able to compile Samba 4.10.2 using the Mimic header files and it works fine so far. Now we are looking forward to doing some real load tests. Have a nice one, Ansgar On Fri., 10 May 2019 at 13:33, Ansgar Jazdzewski wrote: > > thanks, > > i will try to "backport" this to ubuntu 16.04

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Dan van der Ster
On Tue, May 14, 2019 at 10:59 AM Kevin Flöh wrote: > > > On 14.05.19 10:08 AM, Dan van der Ster wrote: > > On Tue, May 14, 2019 at 10:02 AM Kevin Flöh wrote: > > On 13.05.19 10:51 PM, Lionel Bouton wrote: > > On 13/05/2019 at 16:20, Kevin Flöh wrote: > > Dear ceph experts, > > [...] We

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
On 14.05.19 10:08 AM, Dan van der Ster wrote: On Tue, May 14, 2019 at 10:02 AM Kevin Flöh wrote: On 13.05.19 10:51 PM, Lionel Bouton wrote: On 13/05/2019 at 16:20, Kevin Flöh wrote: Dear ceph experts, [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...] Here

Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO [EXT]

2019-05-14 Thread Matthew Vernon
On 14/05/2019 00:36, Tarek Zegar wrote: > It's not just mimic to nautilus > I confirmed with luminous to mimic > > They are checking for clean pgs with flags set, they should unset flags, > then check. Set flags again, move on to next osd I think I'm inclined to agree that "norebalance" is
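
For reference, a sketch of how those flags are set and cleared by hand around each batch of OSD restarts (whether the playbook should clear them between hosts is exactly what the thread debates):

    ceph osd set noout          # keep OSDs from being marked out during the restart
    ceph osd set norebalance    # keep data from shuffling while daemons bounce
    # ... upgrade / restart the OSDs on one host ...
    ceph osd unset norebalance
    ceph osd unset noout
    ceph -s                     # wait for PGs to return to active+clean before the next host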

Re: [ceph-users] Ceph MGR CRASH : balancer module

2019-05-14 Thread EDH - Manuel Rios Fernandez
We can confirm that the balancer module works smoothly in 14.2.1. We’re balancing with bytes and pg. Now all OSDs are 100% balanced. From: ceph-users On behalf of xie.xing...@zte.com.cn Sent: Tuesday, 14 May 2019 9:53 To: tze...@us.ibm.com CC: ceph-users@lists.ceph.com Subject:
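
A sketch of the corresponding balancer commands on 14.2.1 (the mode shown is an example; the mail does not say which one they use):

    ceph balancer status
    ceph balancer eval              # current distribution score
    ceph balancer mode upmap        # or crush-compat
    ceph balancer on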

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Dan van der Ster
On Tue, May 14, 2019 at 10:02 AM Kevin Flöh wrote: > > On 13.05.19 10:51 PM, Lionel Bouton wrote: > > On 13/05/2019 at 16:20, Kevin Flöh wrote: > >> Dear ceph experts, > >> > >> [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...] > >> Here is what happened: One osd

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
On 13.05.19 11:21 PM, Dan van der Ster wrote: Presumably the 2 OSDs you marked as lost were hosting those incomplete PGs? It would be useful to double-check that: check with `ceph pg query` and `ceph pg dump`. (If so, this is why the ignore history les thing isn't helping; you don't have
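
A sketch of the double-check Dan describes (the PG id is a placeholder):

    ceph pg dump_stuck inactive                       # list the PGs that are stuck/incomplete
    ceph pg 1.2ab query | grep -E '"(up|acting)"'     # placeholder PG id: were the lost OSDs in up/acting?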

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
On 13.05.19 10:51 PM, Lionel Bouton wrote: On 13/05/2019 at 16:20, Kevin Flöh wrote: Dear ceph experts, [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...] Here is what happened: One osd daemon could not be started and therefore we decided to mark the osd as lost

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-14 Thread Stefan Kooman
Quoting Frank Schilder (fr...@dtu.dk): If at all possible I would: upgrade to 13.2.5 (there have been quite a few MDS fixes since 13.2.2) and use more recent kernels on the clients. The settings below for [mds] might help with trimming (you might already have changed mds_log_max_segments to 128
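
The one setting the mail names, expressed as a sketch (either in the [mds] section of ceph.conf or via the config database):

    ceph config set mds mds_log_max_segments 128    # the value mentioned above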

Re: [ceph-users] Ceph MGR CRASH : balancer module

2019-05-14 Thread xie.xingguo
Should be fixed by https://github.com/ceph/ceph/pull/27225 You can simply upgrade to v14.2.1 to get rid of it, or you can do 'ceph balancer off' to temporarily disable automatic balancing... Original Mail From: TarekZegar To: ceph-users@lists.ceph.com ; Date: 2019-05-14 01:53 Subject

Re: [ceph-users] Slow requests from bluestore osds

2019-05-14 Thread Stefan Kooman
Quoting Marc Schöchlin (m...@256bit.org): > Our new setup is now: > (12.2.10 on Ubuntu 16.04) > > [osd] > osd deep scrub interval = 2592000 > osd scrub begin hour = 19 > osd scrub end hour = 6 > osd scrub load threshold = 6 > osd scrub sleep = 0.3 > osd snap trim sleep = 0.4 > pg max concurrent
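
Those [osd] settings can also be applied to running OSDs without a restart, e.g. (a sketch using two of the quoted values):

    ceph tell osd.* injectargs '--osd_scrub_sleep 0.3 --osd_scrub_load_threshold 6'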