[ceph-users] Luminous OSD can not be up

2019-05-21 Thread huxia...@horebdata.cn
Hi, Folks, I just encountered an OSD being down that cannot come up again. Attached below are the log messages. Can anyone tell what is wrong with the OSD, and what should I do? Thanks in advance, Samuel *** # tail -500
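A minimal diagnostic sketch for a down OSD on a systemd-managed Luminous node; the OSD id (12) and log path are assumptions based on the upstream defaults:
systemctl status ceph-osd@12
journalctl -u ceph-osd@12 --since "1 hour ago"
tail -500 /var/log/ceph/ceph-osd.12.log
# after reviewing the log, try a clean restart
systemctl restart ceph-osd@12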

Re: [ceph-users] Nautilus, k+m erasure coding a profile vs size+min_size

2019-05-21 Thread Christian Wuerdig
The simple answer is because k+1 is the default min_size for EC pools. min_size is the minimum number of failure domains that must still be available for the pool to accept writes. If you set min_size to k then you have entered dangerous territory: if you lose another failure domain (OSD or
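For readers checking this on their own clusters, a sketch with a hypothetical k=4, m=2 EC pool named ecpool:
ceph osd pool get ecpool size        # reports k+m = 6
ceph osd pool get ecpool min_size    # reports k+1 = 5 by default
# dropping min_size to k is possible but risky, as discussed above
ceph osd pool set ecpool min_size 4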

[ceph-users] Failed Disk simulation question

2019-05-21 Thread Alex Litvak
Hello cephers, I know that a similar question was posted 5 years ago. However, the answer was inconclusive for me. I installed a new Nautilus 14.2.1 cluster and started pre-production testing. I followed the Red Hat document and simulated a soft disk failure by # echo 1 >
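The truncated command is presumably the usual sysfs trick for dropping a disk from the kernel's view; a sketch, with the device name (sdc) and SCSI host number purely as assumptions:
# make the kernel forget the disk, simulating a sudden failure
echo 1 > /sys/block/sdc/device/delete
# watch how the cluster reacts
ceph osd tree
ceph -s
# rescan the SCSI host to bring the disk back afterwards
echo "- - -" > /sys/class/scsi_host/host0/scan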

[ceph-users] CephFS object mapping.

2019-05-21 Thread Robert LeBlanc
I'm at a new job working with Ceph again and am excited to be back in the community! I can't find any documentation to support this, so please help me understand if I got this right. I've got a Jewel cluster with CephFS and we have an inconsistent PG. All copies of the object are zero size, but the
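A sketch of the file-to-object mapping being described, assuming a mount at /mnt/cephfs and a data pool named cephfs_data (both hypothetical); CephFS names its backing objects <inode-in-hex>.<stripe-index>:
ino=$(stat -c %i /mnt/cephfs/path/to/file)
hexino=$(printf '%x' "$ino")
# list the RADOS objects that back this file
rados -p cephfs_data ls | grep "^${hexino}\."
# inspect the first stripe object
rados -p cephfs_data stat "${hexino}.00000000"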

Re: [ceph-users] Large OMAP Objects in default.rgw.log pool

2019-05-21 Thread mr. non non
Hi, Thank you so much for sharing your case. 2 weeks ago, one of my users purged old swift objects manually with a custom script but didn't use the object expiry feature. This might be the case. I will leave the health_warn message if it has no impact. Regards, Arnondh

Re: [ceph-users] Major ceph disaster

2019-05-21 Thread Wido den Hollander
On 5/21/19 4:48 PM, Kevin Flöh wrote: > Hi, > > we gave up on the incomplete pgs since we do not have enough complete > shards to restore them. What is the procedure to get rid of these pgs? > You need to start with marking the OSDs as 'lost' and then you can force_create_pg to get the PGs
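A sketch of that procedure with hypothetical IDs (OSD 7, PG 2.1f); both commands discard whatever data the PG still references, so they only make sense once the data is written off:
ceph osd lost 7 --yes-i-really-mean-it
ceph osd force-create-pg 2.1f --yes-i-really-mean-it
# confirm the PG is recreated empty and goes active+clean
ceph pg 2.1f query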

Re: [ceph-users] Major ceph disaster

2019-05-21 Thread Kevin Flöh
Hi, we gave up on the incomplete pgs since we do not have enough complete shards to restore them. What is the procedure to get rid of these pgs? regards, Kevin On 20.05.19 9:22 vorm., Kevin Flöh wrote: Hi Frederic, we do not have access to the original OSDs. We exported the remaining

Re: [ceph-users] Nautilus, k+m erasure coding a profile vs size+min_size

2019-05-21 Thread Igor Podlesny
On Tue, 21 May 2019 at 19:32, Yoann Moulin wrote: > > >> I am doing some tests with Nautilus and cephfs on erasure coding pool. [...] > > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034867.html > > Oh thanks, I missed that thread, makes sense. I agree with some comment that > it

[ceph-users] Rados Gateway 13.2.4 keystone related issue for multipart copy

2019-05-21 Thread susernamb
Setup: Ceph version: 13.2.4 OpenStack release: Rocky We have a Rados GW setup with keystone integration. Integration seems to be working fine, apart from a strange issue with multipart copy operations. Test: Using the test program at

Re: [ceph-users] Nautilus, k+m erasure coding a profile vs size+min_size

2019-05-21 Thread Yoann Moulin
>> I am doing some tests with Nautilus and cephfs on erasure coding pool. >> >> I noticed something strange between k+m in my erasure profile and >> size+min_size in the pool created: >> >>> test@icadmin004:~$ ceph osd erasure-code-profile get ecpool-4-2 >>> crush-device-class= >>>

Re: [ceph-users] How to fix this? session lost, hunting for new mon, session established, io error

2019-05-21 Thread Marc Roos
I am still stuck with this situation and do not want to restart (reset) this host. I tried bringing down the eth interface connected to the client network for a while, but after bringing it up I am getting the same messages. -Original Message- From: Marc Roos Sent: Tuesday 21 May 2019 11:42

Re: [ceph-users] Nautilus, k+m erasure coding a profile vs size+min_size

2019-05-21 Thread Eugen Block
Hi, this question comes up regularly and is being discussed right now: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034867.html Regards, Eugen Quoting Yoann Moulin: Dear all, I am doing some tests with Nautilus and cephfs on an erasure coding pool. I noticed something

Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-21 Thread Charles Alva
Got it. Thanks for the explanation, Jason! Kind regards, Charles Alva Sent from Gmail Mobile On Tue, May 21, 2019 at 5:16 PM Jason Dillaman wrote: > On Tue, May 21, 2019 at 12:03 PM Charles Alva > wrote: > > > > Hi Jason, > > > > Should we disable fstrim services inside VM which runs on top

[ceph-users] Nautilus, k+m erasure coding a profile vs size+min_size

2019-05-21 Thread Yoann Moulin
Dear all, I am doing some tests with Nautilus and cephfs on erasure coding pool. I noticed something strange between k+m in my erasure profile and size+min_size in the pool created: > test@icadmin004:~$ ceph osd erasure-code-profile get ecpool-4-2 > crush-device-class= >
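A sketch of the behaviour being asked about, using a hypothetical k=4, m=2 profile and pool name; the resulting pool gets size = k+m and min_size = k+1:
ceph osd erasure-code-profile set ecpool-4-2 k=4 m=2 crush-failure-domain=host
ceph osd pool create ecpool 128 128 erasure ecpool-4-2
ceph osd pool get ecpool size       # 6
ceph osd pool get ecpool min_size   # 5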

Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-21 Thread Jason Dillaman
On Tue, May 21, 2019 at 12:03 PM Charles Alva wrote: > > Hi Jason, > > Should we disable fstrim services inside VM which runs on top of RBD? It has the potential to be a thundering herd issue if you have lots of VMs all issuing discards at the same time and your RBD images do not have
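The truncated sentence presumably refers to the object-map feature; a sketch of checking and enabling it on a hypothetical image rbd/vm-disk-1, so librbd can skip discards against unallocated objects (object-map also needs exclusive-lock):
rbd info rbd/vm-disk-1 | grep features
rbd feature enable rbd/vm-disk-1 object-map fast-diff
rbd object-map rebuild rbd/vm-disk-1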

Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-21 Thread Charles Alva
Hi Jason, Should we disable fstrim services inside VMs which run on top of RBD? I recall Ubuntu has a weekly fstrim cron job enabled by default, while we have to enable the fstrim service manually on Debian and CentOS. Kind regards, Charles Alva Sent from Gmail Mobile On Tue, May 21, 2019, 4:49
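A sketch of checking and staggering the trim job inside a guest, assuming stock systemd unit names:
systemctl list-timers fstrim.timer      # Ubuntu enables this weekly by default
systemctl enable --now fstrim.timer     # Debian/CentOS: enable it explicitly
# or run it manually during a quiet window instead of the shared weekly slot
fstrim -av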

Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-21 Thread Jason Dillaman
On Tue, May 21, 2019 at 11:28 AM Marc Schöchlin wrote: > > Hello Jason, > > Am 20.05.19 um 23:49 schrieb Jason Dillaman: > > On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote: > > Hello cephers, > > we have a few systems which utilize a rbd-bd map/mount to get access to a rbd > volume. >

[ceph-users] How to fix this? session lost, hunting for new mon, session established, io error

2019-05-21 Thread Marc Roos
I have this on a cephfs client; I had ceph-common on 12.2.11 and upgraded to 12.2.12 while having this error. They write here [0] that you need to upgrade the kernel and that it is fixed in 12.2.2 [@~]# uname -a Linux mail03 3.10.0-957.5.1.el7.x86_64 [Tue May 21 11:23:26 2019] libceph: mon2

Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-21 Thread Marc Schöchlin
Hello Jason, Am 20.05.19 um 23:49 schrieb Jason Dillaman: > On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote: >> Hello cephers, >> >> we have a few systems which utilize a rbd-bd map/mount to get access to a >> rbd volume. >> (This problem seems to be related to "[ceph-users] Slow requests

Re: [ceph-users] ansible 2.8 for Nautilus

2019-05-21 Thread Torben Hørup
epel-testing has an ansible 2.8 package /Torben On 21.05.2019 03:14, solarflow99 wrote: > Does anyone know the necessary steps to install ansible 2.8 in rhel7? I'm > assuming most people are doing it with pip? > > ___ > ceph-users mailing list >
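Two sketches of getting ansible 2.8 onto RHEL 7, one via epel-testing and one via pip; the version pin is an assumption:
yum install --enablerepo=epel-testing ansible
# or, with pip (ideally inside a virtualenv)
pip install 'ansible>=2.8,<2.9'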

[ceph-users] Cephfs client evicted, how to unmount the filesystem on the client?

2019-05-21 Thread Marc Roos
[@ceph]# ps -aux | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 12527 0.0 0.0 123520 932 pts/1 D+ 09:26 0:00 umount /home/mail-archive root 14549 0.2 0.0 0 0 ? D 09:29 0:09 [kworker/0:0] root 23350 0.0
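When the mount is wedged in D state after an eviction, a lazy or forced unmount is the usual last resort before a reboot; a sketch, with the path taken from the ps output above (processes stuck in uninterruptible sleep may still not let go):
umount -f /home/mail-archive     # force unmount
umount -l /home/mail-archive     # lazy detach as a fallback
# for ceph-fuse mounts, terminating the ceph-fuse process also releases the mountpoint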

Re: [ceph-users] Is a not active mds doing something?

2019-05-21 Thread Marc Roos
I have not configured anything for the mds except this [mds] # 100k+ files in 2 folders mds bal fragment size max = 12 # maybe for nfs-ganesha problems? # http://docs.ceph.com/docs/master/cephfs/eviction/ mds_session_blacklist_on_timeout = false mds_session_blacklist_on_evict = false

Re: [ceph-users] Large OMAP Objects in default.rgw.log pool

2019-05-21 Thread Magnus Grönlund
On Tue 21 May 2019 at 02:12, mr. non non wrote: > Has anyone had this issue before? From my research, many people have issues > with rgw.index, which are related to a small number of index shards (too > many objects per index shard). > I also checked this thread >
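A sketch of narrowing down which object triggered the warning, assuming the default cluster log location and the default.rgw.log pool named in the subject:
ceph health detail | grep -i 'large omap'
grep 'Large omap object' /var/log/ceph/ceph.log
# once the object name is known, count its omap keys
rados -p default.rgw.log listomapkeys <object-name> | wc -l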

Re: [ceph-users] Is a not active mds doing something?

2019-05-21 Thread Eugen Block
Hi Marc, have you configured the other MDS to be standby-replay for the active MDS? I have three MDS servers, one is active, the second is active-standby and the third just standby. If the active fails, the second takes over within seconds. This is what I have in my ceph.conf: [mds.]
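For reference, a sketch of the pre-Nautilus ceph.conf style hinted at above (the daemon name mds.b and rank 0 are assumptions), plus the Nautilus per-filesystem flag that replaces it:
[mds.b]
mds_standby_replay = true
mds_standby_for_rank = 0
# Nautilus and later:
ceph fs set cephfs allow_standby_replay true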

Re: [ceph-users] Default min_size value for EC pools

2019-05-21 Thread Frank Schilder
Hi Paul, maybe we misunderstood each other here or I'm misunderstanding something. My HA comment was not about PGs becoming active/inactive or data loss. As far as I understand the discussions, the OSD flapping itself may be caused by the 2-member HA group, because the OSDs keep marking each

[ceph-users] Is a not active mds doing something?

2019-05-21 Thread Marc Roos
Should a non-active MDS be doing something??? When I restarted the non-active mds.c, my client io on the fs_data pool disappeared. services: mon: 3 daemons, quorum a,b,c mgr: c(active), standbys: a, b mds: cephfs-1/1/1 up {0=a=up:active}, 1 up:standby osd: 32 osds: 32 up,

Re: [ceph-users] cephfs causing high load on vm, taking down 15 min later another cephfs vm

2019-05-21 Thread Marc Roos
No, but even if there were, I never had any issues when running multiple scrubs. -Original Message- From: EDH - Manuel Rios Fernandez [mailto:mrios...@easydatahost.com] Sent: Tuesday 21 May 2019 10:03 To: Marc Roos; 'ceph-users' Subject: RE: [ceph-users] cephfs causing high load on vm, taking

Re: [ceph-users] cephfs causing high load on vm, taking down 15 min later another cephfs vm

2019-05-21 Thread EDH - Manuel Rios Fernandez
Hi Marc, Is there any scrub / deep-scrub running on the affected OSDs? Best Regards, Manuel -Original Message- From: ceph-users On behalf of Marc Roos Sent: Tuesday, 21 May 2019 10:01 To: ceph-users ; Marc Roos Subject: Re: [ceph-users] cephfs causing high load on vm, taking
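A quick sketch of confirming whether any scrubs are running; nothing is assumed beyond a Luminous-or-later CLI:
ceph -s                            # active scrubs show up in the status output
ceph pg dump | grep -c scrubbing   # count PGs currently scrubbing or deep-scrubbing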

Re: [ceph-users] cephfs causing high load on vm, taking down 15 min later another cephfs vm

2019-05-21 Thread Marc Roos
I have evicted all client connections and still have high load on the OSDs, and ceph osd pool stats still shows client activity: pool fs_data id 20 client io 565KiB/s rd, 120op/s rd, 0op/s wr -Original Message- From: Marc Roos Sent: Tuesday 21 May 2019 9:51 To:
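A sketch of listing and evicting CephFS sessions and re-checking pool activity, assuming a single active MDS with rank 0; the client id is a placeholder:
ceph tell mds.0 client ls
ceph tell mds.0 client evict id=<client-id>
# verify whether the client io on the pool really stops
ceph osd pool stats fs_data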

Re: [ceph-users] cephfs causing high load on vm, taking down 15 min later another cephfs vm

2019-05-21 Thread Marc Roos
I got this again today. I cannot unmount the filesystem and it looks like some OSDs are at 100% CPU utilization. -Original Message- From: Marc Roos Sent: Monday 20 May 2019 12:42 To: ceph-users Subject: [ceph-users] cephfs causing high load on vm, taking down 15 min

Re: [ceph-users] Default min_size value for EC pools

2019-05-21 Thread Paul Emmerich
No, there is no split-brain problem even with size/min_size 2/1. A PG will not go active if it might not have the latest data, i.e. if other OSDs that might have seen writes are currently offline. That's what the history_ignore_les_bounds option effectively does: it tells ceph to take a PG