[ceph-users] Re: slow operation observed for _collection_list

2021-11-11 Thread Boris Behrens
Hi, are you sure this can be "solved" via offline compaction? I had a crashed OSD yesterday which was added to the cluster a couple of hours ago, and it was still in the process of syncing in. @Igor, did you manage to fix the problem or find a workaround? On Thu, 11 Nov 2021 at 09:23,
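
For reference, offline compaction is normally run against a stopped OSD; a minimal sketch, assuming a systemd deployment and using OSD id 12 and the default data path as placeholders:

    systemctl stop ceph-osd@12
    # compact the OSD's RocksDB while the daemon is down
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact
    systemctl start ceph-osd@12

An online alternative is "ceph daemon osd.12 compact", which avoids taking the OSD down but adds load while the compaction runs.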

[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-11 Thread Boris Behrens
Now I finally know what kind of data is stored in RocksDB; I didn't find it in the documentation. This sounds like a horrible SPoF. How can you recover from it? Purge the OSD, wipe the disk and re-add it? An all-flash cluster is sadly not an option for our S3, as it is just too large and we just bo
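
For context, a sketch of how an OSD with an external RocksDB/WAL device is typically created (device names are placeholders, not from this thread); the blast radius of a failed DB device is that device plus every OSD whose block.db lives on it:

    # data on an HDD, RocksDB (and, implicitly, the WAL) on an NVMe partition
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

If only --block.db is given, the WAL is colocated with the DB on that same device.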

[ceph-users] Re: allocate_bluefs_freespace failed to allocate

2021-11-11 Thread mhnx
I have 10 nodes and I use CephFS, RBD and RGW clients, and all of my clients are 14.2.16 Nautilus. My clients, MONs and OSDs are on the same servers. I have constant usage: 50-300 MiB/s rd, 15-30k op/s rd --- 100-300 MiB/s wr, 1-4 op/s wr. With the allocator issue it's highly possible to get slow ops an

[ceph-users] Re: allocate_bluefs_freespace failed to allocate

2021-11-11 Thread Konstantin Shalygin
Hi, Just try to upgrade to the latest Nautilus. Many things with the allocator and collections were fixed in the last Nautilus releases. k > On 11 Nov 2021, at 13:15, mhnx wrote: > > I have 10 nodes and I use CephFS, RBD and RGW clients and all of my > clients are 14.2.16 Nautilus. > My clients, MONs, OSDs are on
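
To see which releases the daemons are actually running before and after such an upgrade, the version breakdown can be checked like this (a sketch, output omitted):

    ceph versions        # per-daemon-type version counts
    ceph osd versions    # version counts for the OSDs only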

[ceph-users] Re: snaptrim blocks io on ceph pacific even on fast NVMEs

2021-11-11 Thread Christoph Adomeit
I have 15 NVMe drives of 6.4 TB and 4200 PGs. This has historical reasons: the same cluster once had 100 spinning drives, and in those ancient times it was not possible to shrink the number of PGs in an existing cluster. Do you think it is a good idea for snaptrim performance to reduce the number of P
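
A hedged sketch of reducing a pool's PG count, plus the snaptrim throttle options that usually come up in this context; pool name and values are placeholders, not recommendations:

    # PG merging has been possible since Nautilus
    ceph osd pool set mypool pg_num 2048
    # and/or slow snaptrim down instead of merging PGs
    ceph config set osd osd_snap_trim_sleep 1.0
    ceph config set osd osd_max_trimming_pgs 1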

[ceph-users] Re: snaptrim blocks io on ceph pacific even on fast NVMEs

2021-11-11 Thread Eugen Block
Since the PGs will be split when you add new OSDs, you don't need to merge them beforehand; your PG-per-OSD ratio will improve (if you have too many PGs/OSD at the moment). After they have been remapped and the rebalancing is done, you can check what the PGs/OSD ratio is and then decide whe
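
To check the PG-per-OSD ratio once the rebalancing is done, something like this works (a sketch; the exact columns vary by release):

    ceph osd df tree                  # the PGS column shows PGs per OSD
    ceph osd pool autoscale-status    # per-pool pg_num, if the pg_autoscaler module is enabled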

[ceph-users] Pacific: parallel PG reads?

2021-11-11 Thread Zakhar Kirpichenko
Hi, I'm still trying to combat really bad read performance from HDD-backed replicated pools, which is under 100 MB/s most of the time with 1 thread and QD=1. I don't quite understand why the reads are that slow, i.e. much slower than a single HDD, but do understand that Ceph clients read a PG from
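
For measuring this kind of single-threaded, QD=1 sequential read, a fio job against an RBD image is a common approach; a sketch with placeholder pool/image/client names (requires fio built with the rbd engine):

    fio --name=seqread --ioengine=rbd --clientname=admin \
        --pool=testpool --rbdname=testimage \
        --rw=read --bs=4M --iodepth=1 --numjobs=1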

[ceph-users] Re: Pacific: parallel PG reads?

2021-11-11 Thread Janne Johansson
On Thu, 11 Nov 2021 at 13:54, Zakhar Kirpichenko wrote: > I'm still trying to combat really bad read performance from HDD-backed > replicated pools, which is under 100 MB/s most of the time with 1 thread > and QD=1. I don't quite understand why the reads are that slow, i.e. much (doing a single-

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-11 Thread Sage Weil
Hi Manuel, Before giving up and putting in an off switch, I'd like to understand why it is taking as long as it is for the PGs to go active. Would you consider enabling debug_osd=10 and debug_ms=1 on your OSDs, and debug_mon=10 + debug_ms=1 on the mons, and reproducing this (without the patch app
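
The debug levels asked for here can be set centrally through the config database and reverted afterwards; a sketch:

    ceph config set osd debug_osd 10
    ceph config set osd debug_ms 1
    ceph config set mon debug_mon 10
    ceph config set mon debug_ms 1
    # revert once the logs have been captured
    ceph config rm osd debug_osd
    ceph config rm osd debug_ms
    ceph config rm mon debug_mon
    ceph config rm mon debug_ms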

[ceph-users] Re: Pacific: parallel PG reads?

2021-11-11 Thread Zakhar Kirpichenko
Hi, On Thu, Nov 11, 2021 at 3:26 PM Janne Johansson wrote: > On Thu, 11 Nov 2021 at 13:54, Zakhar Kirpichenko wrote: > > I'm still trying to combat really bad read performance from HDD-backed > > replicated pools, which is under 100 MB/s most of the time with 1 thread > > and QD=1. I don't q

[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-11 Thread Сергей Процун
Yeah. Wipe the disk, but do not remove it from the CRUSH map, as that would result in rebalancing. Then recreate the OSD and let it rejoin the cluster. Thu, 11 Nov 2021, 11:05, Boris Behrens wrote: > Now I finally know what kind of data is stored in RocksDB. I didn't > find it in the docum
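
A sketch of recreating an OSD in place so that it keeps its id and CRUSH position (OSD id and device are placeholders); "destroy" keeps the id and the CRUSH entry, unlike "purge":

    ceph osd destroy 12 --yes-i-really-mean-it
    ceph-volume lvm zap /dev/sdb --destroy
    ceph-volume lvm create --bluestore --data /dev/sdb --osd-id 12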

[ceph-users] Re: Pacific: parallel PG reads?

2021-11-11 Thread Patrick Donnelly
On Thu, Nov 11, 2021 at 7:55 AM Zakhar Kirpichenko wrote: > > Hi, > > I'm still trying to combat really bad read performance from HDD-backed > replicated pools, which is under 100 MB/s most of the time with 1 thread > and QD=1. I don't quite understand why the reads are that slow, i.e. much > slow

[ceph-users] Re: Pacific: parallel PG reads?

2021-11-11 Thread 胡 玮文
Hi Zakhar, If you are using RBD, you may be interested in the striping feature. It works like RAID 0 and can read from multiple objects at once for sequential read requests. https://docs.ceph.com/en/latest/man/8/rbd/#striping Weiwen Hu Sent from Mail for Windows
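
For reference, striping is chosen at image creation time; a sketch with placeholder names and the kind of values mentioned later in this thread:

    # stripe_unit x stripe_count spreads sequential I/O across several objects
    rbd create testpool/striped-image --size 100G \
        --object-size 4M --stripe-unit 2M --stripe-count 2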

[ceph-users] Re: Pacific: parallel PG reads?

2021-11-11 Thread Zakhar Kirpichenko
Hi, This is a good suggestion. Unfortunately, I've already tried striping the RBD images, and it didn't have much of an effect: I striped an image with stripe count 2 and stripe size 2 MB, and it performed almost exactly the same as a non-striped image. Z On Thu, Nov 11, 2021 at 8:24 PM 胡 玮文 wrote

[ceph-users] Re: Pacific: parallel PG reads?

2021-11-11 Thread Zakhar Kirpichenko
Hi, This is a valid point. Unfortunately, under some conditions, such as one client reading with 1 thread and a low queue depth, the read performance leaves much to be desired. I'm wondering if there's a way to improve the performance without switching to much faster storage drives. Z On Thu, Nov 1
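
One client-side knob that sometimes helps sequential reads without new hardware is librbd readahead; a sketch of a ceph.conf client section with example values (not tuning advice; readahead only applies to librbd with the cache enabled):

    [client]
    rbd readahead max bytes = 4194304
    rbd readahead trigger requests = 5
    rbd readahead disable after bytes = 0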

[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-11 Thread Anthony D'Atri
>> it in the documentation. >> This sounds like a horrible SPoF. How can you recover from it? Purge the >> OSD, wipe the disk and re-add it? >> An all-flash cluster is sadly not an option for our S3, as it is just too >> large and we just bought around 60x 8 TB disks (in the last couple of >> months).

[ceph-users] Re: High cephfs MDS latency and CPU load

2021-11-11 Thread Patrick Donnelly
Thanks Andras, I've left a comment on the ticket. Let's continue the discussion there. On Mon, Nov 8, 2021 at 12:00 PM Andras Pataki wrote: > > Currently this issue creates a painful problem for us - having gone from > a fairly smoothly working cephfs setup to one that comes to a screeching > ha

[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-11 Thread Mark Nelson
On 11/11/21 1:09 PM, Anthony D'Atri wrote: it in the documentation. This sounds like a horrible SPoF. How can you recover from it? Purge the OSD, wipe the disk and re-add it? An all-flash cluster is sadly not an option for our S3, as it is just too large and we just bought around 60x 8 TB disks (in t

[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-11 Thread Anthony D'Atri
> > It's absolutely important to think about the use case. For most RGW cases I > generally agree with you. For something like HPC scratch storage you might > have the opposite case where 3DWPD might be at the edge of what's tolerable. > Many years ago I worked for a supercomputing institu

[ceph-users] Re: Pacific: parallel PG reads?

2021-11-11 Thread Gregory Farnum
On Thu, Nov 11, 2021 at 4:54 AM Zakhar Kirpichenko wrote: > > Hi, > > I'm still trying to combat really bad read performance from HDD-backed > replicated pools, which is under 100 MB/s most of the time with 1 thread > and QD=1. I don't quite understand why the reads are that slow, i.e. much > slow

[ceph-users] IO500 testing on CephFS 14.2.22

2021-11-11 Thread huxia...@horebdata.cn
Dear Cephers, I am running IO500 tests on a 3-node CephFS cluster with version Nautilus 14.2.22, and during the tests we observed a lot of OSDs going down and PGs being remapped. Looking at the OSD logs, we see the following: How can we avoid such instability or issues? Any suggestions or comments are highly
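
When OSDs are marked down under heavy benchmark load even though the daemons are still alive, two common mitigations are flapping protection and a longer heartbeat grace; a sketch with example values (placeholders, not tuning advice):

    ceph osd set nodown      # keep alive-but-slow OSDs from being marked down during the run
    ceph config set osd osd_heartbeat_grace 40    # default is 20 seconds
    ceph osd unset nodown    # remove the flag once the benchmark is finished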