Dear all,

We currently run a small Ceph cluster on 2 machines and we wonder what theoretical maximum bandwidth and IOPS we can achieve through RBD with our setup.

Here are the environment details:

- The Ceph release is Octopus 15.2.1 running on CentOS 8. Both machines have 180GB RAM, 72 cores, and 40 * 1.8TB SSDs each.
- For networking, we deployed two isolated 100Gb/s networks for front and back connectivity.
- Since all disks have the same performance, we created 1 OSD per SSD using BlueStore (default setup with LVM), for a total of 80 OSDs (40 OSDs per machine).
- On top of that we have a single 2x replicated RBD pool with 2048 PGs, to reach a global average of 50 PGs per OSD (our experiments with 100 PGs/OSD didn't provide any performance improvement, only extra CPU consumption).
- We kept the default settings for all RBD images we created for benchmarks (4MB object size, 4MB stripe width, 1 stripe).
- The CRUSH map and replication rules are very simple (2 hosts, 40 OSDs per host with the same device class and weight).
- All tuning settings (cache sizing, op threads, BlueStore, RocksDB options, etc.) are the defaults provided with the Octopus release.
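For reference, the PG-per-OSD figure above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, and it assumes PG replicas are spread evenly across all 80 OSDs.

```python
def pg_replicas_per_osd(pg_num, replica_count, osd_count):
    """Average number of PG replicas hosted by each OSD."""
    return pg_num * replica_count / osd_count

# Current pool: 2048 PGs, 2x replication, 80 OSDs.
current = pg_replicas_per_osd(2048, 2, 80)
# Doubling pg_num gives roughly the 100 PGs/OSD case we also tried.
doubled = pg_replicas_per_osd(4096, 2, 80)

print(f"current: {current:.1f} PGs/OSD, doubled: {doubled:.1f} PGs/OSD")
```

With our numbers this gives about 51 PGs per OSD, which is the "50 PGs per OSD" average quoted above.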

Here are the best values observed so far using both rados bench and fio with many different setups (varying the number of clients, threads, and RBD images, block sizes from 4k to 4m, random/sequential access, iodepth, etc.):

- Read BW: 24GB/s (looks like we reached the maximum network capacity of both machines here)
- Read IOPS: 600k
- Write BW: 7 GB/s
- Write IOPS: 100k
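The network ceiling behind the read figure can be estimated as follows. This is a rough sketch with stated assumptions: one 100Gb/s front link per machine, decimal units (1 GB/s = 8 Gb/s), no protocol overhead, and all client reads crossing the front network. The variable and function names are illustrative only.

```python
LINK_GBIT_PER_S = 100   # front network speed per machine
MACHINES = 2
OBSERVED_READ_GBYTE_PER_S = 24

def gbit_to_gbyte(gbit_per_s):
    """Convert Gb/s to GB/s (decimal units, no protocol overhead)."""
    return gbit_per_s / 8

# Aggregate front-network ceiling across both machines.
front_ceiling = MACHINES * gbit_to_gbyte(LINK_GBIT_PER_S)
read_utilisation = OBSERVED_READ_GBYTE_PER_S / front_ceiling

print(f"ceiling: {front_ceiling:.1f} GB/s, utilisation: {read_utilisation:.0%}")
```

Under those assumptions the ceiling is 25 GB/s, so the 24 GB/s read result sits at about 96% of the front-network capacity.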

Those are simply the maximum numbers obtained regardless of latency, as we first want to stress the infrastructure to see the maximum throughput and IOPS we can achieve. Latency measurements will come after.

We also have the feeling that the 2x replication of the RBD pool is costly with only 2 nodes in the cluster, dividing maximum speeds by more than 2. This will probably have much less impact when scaling up the cluster with new nodes.
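To put the replication cost in per-disk terms, here is a small back-of-the-envelope sketch. It assumes every client write is stored once per replica, that reads are served by the primary OSD only, and that load is spread evenly over the 80 OSDs. It ignores BlueStore WAL and metadata overhead, so real device load will be higher. The helper name is hypothetical.

```python
def per_osd_backend_load(client_rate, replica_count, osd_count):
    """Average backend load per OSD for a given client-side rate."""
    return client_rate * replica_count / osd_count

OSDS = 80

# Writes hit every replica (2x pool); reads are assumed primary-only.
write_mb_per_osd = per_osd_backend_load(7000, 2, OSDS)       # 7 GB/s client writes, in MB/s
write_iops_per_osd = per_osd_backend_load(100_000, 2, OSDS)  # 100k client write IOPS
read_iops_per_osd = per_osd_backend_load(600_000, 1, OSDS)   # 600k client read IOPS

print(f"write: {write_mb_per_osd:.0f} MB/s and {write_iops_per_osd:.0f} IOPS per OSD")
print(f"read:  {read_iops_per_osd:.0f} IOPS per OSD")
```

With our best results this works out to roughly 175 MB/s and 2500 write IOPS per SSD, and 7500 read IOPS per SSD.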

We also noticed that at some point during recovery operations (e.g. rebalancing PGs after a new OSD was added to the pool) the total read/write throughput and IOPS climb to several GB/s and millions of IOPS, so we wonder whether we can achieve any better with a legitimate RBD client load.

Would you be willing to share numbers from your own setups, or do you have any hints for potential improvements?

Thanks.

Regards,

--
Vincent Kherbache
R&D Director
Titan Datacenter
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
