[ceph-users] Re: Ceph OSD reported Slow operations

2024-01-28 Thread Zakhar Kirpichenko
Hi, you have 67 TB of raw space available. With a replication factor of 3, which is what you seem to be using, that is ~22 TB of usable space under ideal conditions. The MAX AVAIL column shows the available space, taking into account the raw space, the replication factor and the CRUSH map, before the
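As a back-of-the-envelope check (a rough sketch only; the real MAX AVAIL accounting also considers the full ratio and how evenly the CRUSH map fills the OSDs):

    # 67 TB of raw space still free, size=3 replication: roughly a third is usable
    echo $(( 67 / 3 ))    # ~22 TB usable under ideal conditions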

[ceph-users] Re: Ceph OSD reported Slow operations

2024-01-28 Thread Anthony D'Atri
> Just a continuation of this mail, could you help me out to understand the ceph df output. PFA the screenshot with this mail. No idea what PFA means, but attachments usually don’t make it through on mailing lists. Paste text instead. > 1. Raw storage is 180 TB The sum of OSD total
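To see where that raw figure comes from, the per-OSD sizes can be listed directly; their sum (the TOTAL line) should match the raw capacity that ceph df reports:

    # per-OSD size, utilization and PG count, grouped by the CRUSH tree
    ceph osd df tree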

[ceph-users] Re: Ceph OSD reported Slow operations

2024-01-28 Thread V A Prabha
Hi, just a continuation of this mail, could you help me out to understand the ceph df output. PFA the screenshot with this mail. 1. Raw storage is 180 TB 2. Stored Value is 37 TB 3. Used Value is 112 TB 4. Available Value is 67 TB 5. Pool Max Available Value is 16 TB Though the Available
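As a rough consistency check of those figures (assuming all pools use size=3 replication; the small difference is allocation and metadata overhead):

    # STORED x replication factor should land close to USED
    echo $(( 37 * 3 ))    # 111 TB, close to the reported 112 TB USED
    # pool MAX AVAIL (16 TB) is lower than 67/3 because it also respects
    # the full ratio and the most-utilized OSDs in the CRUSH map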

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-08 Thread Zakhar Kirpichenko
Take hints from this: "544 pgs not deep-scrubbed in time". Your OSDs are unable to scrub their data in time, likely because they cannot cope with the client + scrubbing I/O. I.e. there's too much data on too few and too slow spindles. You can play with osd_deep_scrub_interval and increase the
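A minimal sketch of that tuning, assuming a release where 'ceph config' is available (on older clusters the same option goes into ceph.conf or is injected with 'ceph tell'); the two-week value below is only an example:

    # default deep-scrub interval is one week (604800 seconds)
    ceph config get osd osd_deep_scrub_interval
    # example only: stretch it to two weeks so deep scrubs can catch up
    ceph config set osd osd_deep_scrub_interval 1209600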

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-08 Thread prabhav
Hi Eugen, please find the details below: root@meghdootctr1:/var/log/ceph# ceph -s cluster: id: c59da971-57d1-43bd-b2b7-865d392412a5 health: HEALTH_WARN nodeep-scrub flag(s) set 544 pgs not deep-scrubbed in time services: mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d) mgr:

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-06 Thread Zakhar Kirpichenko
Only client I/O, cluster recovery I/O and/or data scrubbing I/O make the cluster "busy". If you have removed client workloads and the cluster is healthy, it should be mostly idle. Simply having data sitting in the cluster but not being accessed or modified doesn't make the cluster do any work,

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-06 Thread V A Prabha
Please clarify my query. I had 700+ volumes (220 applications) running on 36 OSDs when it reported the slow operations. Due to an emergency, we migrated 200+ VMs to another virtualization environment, so we have shut down all the related VMs in our OpenStack production setup running with Ceph. We

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-04 Thread Zakhar Kirpichenko
You have an IOPS budget, i.e. how much I/O your spinners can deliver. Space utilization doesn't affect it much. You can try disabling write (not read!) cache on your HDDs with sdparm (for example, sdparm -c WCE /dev/bla); in my experience this allows HDDs to deliver 50-100% more write IOPS. If
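A sketch of checking and clearing the volatile write cache on a single drive; /dev/sdX is a placeholder, and the gain should be verified with your own benchmarks since some HBA/RAID setups manage the drive cache themselves:

    # show the current Write Cache Enable (WCE) bit
    sdparm --get WCE /dev/sdX
    # clear it, i.e. disable the drive's volatile write cache
    sdparm --clear WCE /dev/sdX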

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-04 Thread V A Prabha
Now, in this situation, how can I stabilize my production setup, as you have mentioned the cluster is very busy? Is there any configuration parameter tuning that will help, or is the only option to reduce the applications running on the cluster? Though if I have available storage of 1.6 TB free in each

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-04 Thread V A Prabha
Thanks for your prompt reply, that clears my doubt. 4 of the OSDs in 2 different nodes go down daily with multipath failure errors. All four paths go into failure mode, which takes the OSD down. My query is: as the ceph cluster is overloaded with IOPS, do the multipaths

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-03 Thread Janne Johansson
On Thu, 2 Nov 2023 at 23:46, V A Prabha wrote: > Is it possible to move the OSDs safely (marking the OSDs out, moving the content to other OSDs, removing them and mapping them fresh to other nodes which are less loaded)? > As the client feels that using 3 replicas and holding this much spare

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-02 Thread V A Prabha
Is it possible to move the OSDs safely (marking the OSDs out, moving the content to other OSDs, removing them, and mapping them fresh to other nodes which are less loaded), as I have very critical production workloads (government applications)? Please guide me on the safest means to stabilize the
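For reference, the usual drain-and-remove flow looks roughly like the sketch below; osd.12 is a placeholder, and on a cluster this loaded it is safer to do one OSD at a time and let recovery finish before touching the next:

    # mark the OSD out so its PGs migrate to the remaining OSDs
    ceph osd out osd.12
    # watch recovery and proceed only once the cluster is healthy again
    ceph -s
    # confirm nothing still depends on it, then remove it for good
    ceph osd safe-to-destroy osd.12
    ceph osd purge 12 --yes-i-really-mean-it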

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-02 Thread Zakhar Kirpichenko
> 1. The calculated IOPS is for the rw operation, right? Total drive IOPS, read or write. Depending on the exact drive models, it may be lower or higher than 200. I took the average for a smaller-sized 7.2k rpm SAS drive. Modern drives usually deliver lower read IOPS and higher write IOPS. > 2.

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-02 Thread V A Prabha
Thanks for your prompt reply. But the queries are: 1. The calculated IOPS is for the rw operation, right? 2. Cluster is very busy? Is there any misconfiguration or missing tuning parameter that makes the cluster busy? 3. Nodes are not balanced? Do you mean to say that the count of OSDs in each server

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-02 Thread Zakhar Kirpichenko
Sure, it's 36 OSDs at 200 IOPS each (tops, likely lower); I assume size=3 replication, so 1/3 of the total performance, and some 30%-ish OSD overhead. (36 x 200) * 1/3 * 0.7 = 1680. That's how many IOPS you can realistically expect from your cluster. You get more than that, but the cluster is very
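The same estimate as a one-liner; the ~200 IOPS per spindle and the 30% overhead are the rough assumptions from the message above, not measured values:

    # 36 spinners x ~200 IOPS, 1/3 for size=3 replication, ~30% OSD overhead
    echo $(( 36 * 200 / 3 * 70 / 100 ))    # ~1680 client IOPS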

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-01 Thread V A Prabha
Can you please elaborate on what you identified and on that statement? On November 2, 2023 at 9:40 AM Zakhar Kirpichenko wrote: > I'm afraid you're simply hitting the I/O limits of your disks. > /Z > On Thu, 2 Nov 2023 at 03:40, V A Prabha <prab...@cdac.in>

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-01 Thread Zakhar Kirpichenko
I'm afraid you're simply hitting the I/O limits of your disks. /Z On Thu, 2 Nov 2023 at 03:40, V A Prabha wrote: > Hi Eugen, please find the details below: root@meghdootctr1:/var/log/ceph# ceph -s cluster: id: c59da971-57d1-43bd-b2b7-865d392412a5 health: HEALTH_WARN

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-01 Thread V A Prabha
Hi Eugen, please find the details below: root@meghdootctr1:/var/log/ceph# ceph -s cluster: id: c59da971-57d1-43bd-b2b7-865d392412a5 health: HEALTH_WARN nodeep-scrub flag(s) set 544 pgs not deep-scrubbed in time services: mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d)

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-01 Thread Eugen Block
Hi, for starters please add more cluster details like 'ceph status', 'ceph versions', 'ceph osd df tree'. Increasing the network to 10G was the right thing to do; you don't get far with 1G under real cluster load. How are the OSDs configured (HDD only, SSD only, or HDD with rocksdb on SSD)? How is