Hi all,

We are planning a new pool to store our dataset on CephFS. The data are
mostly read-only (though not guaranteed) and consist of many small files.
Each node in our cluster has 1 * 1 TB SSD and 2 * 6 TB HDD, and we will
deploy about 10 such nodes. Our goal is the highest possible read throughput.

If we simply use a replicated pool of size 3 on the SSDs, we should get the
best performance, but that leaves only 1/3 of the raw SSD capacity usable.
And EC pools are, as far as I understand, not a good fit for this kind of
small-object read workload.

Now I’m evaluating a mixed SSD and HDD replication strategy. Ideally, I want
3 replicas of the data, each on a different host (failure domain): one on
SSD and the other two on HDD, with every read request normally directed to
the SSD. So, as long as every SSD OSD is up, I’d expect the same read
throughput as an all-SSD deployment.

I’ve read the documentation and done some tests. Here is the CRUSH rule I’m
testing with:

rule mixed_replicated_rule {
        id 3
        type replicated
        min_size 1
        max_size 10
        step take default class ssd
        step chooseleaf firstn 1 type host
        step emit
        step take default class hdd
        step chooseleaf firstn -1 type host
        step emit
}
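
For reference, this is roughly how I’ve been dry-running the rule with
crushtool before injecting it, so I can see which OSDs it would pick without
touching the live map (the file names are just what I used locally):

# grab and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# add the rule above to crushmap.txt, then recompile it
crushtool -c crushmap.txt -o crushmap.new

# dry-run the rule with 4 replicas and show the resulting OSD sets
crushtool --test -i crushmap.new --rule 3 --num-rep 4 --show-mappings

# make sure no PG would end up with an incomplete mapping
crushtool --test -i crushmap.new --rule 3 --num-rep 4 --show-bad-mappings

# inject the new map once the mappings look sane
ceph osd setcrushmap -i crushmap.new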

Now I have the following conclusions, but I’m not entirely sure about them:
* The first OSD produced by CRUSH becomes the primary OSD (at least as long
as I don’t change the “primary affinity”). So the rule above is guaranteed to
map an SSD OSD as the primary of each PG, and every read request will be
served from the SSD as long as it is up.
* It is currently not possible to force the SSD OSD and the HDD OSDs to be
chosen from different hosts. So if I want the data to stay available even
when 2 hosts fail, I need to pick 1 SSD and 3 HDD OSDs, which means setting
the replication size to 4, instead of the ideal value 3, on the pool using
the above CRUSH rule (see the commands sketched below).
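
In case it helps, this is roughly how I planned to apply and check that (the
pool name cephfs_data is just a placeholder for whatever the CephFS data
pool ends up being called):

# point the pool at the rule and bump the size to 4 (1 SSD + 3 HDD copies)
ceph osd pool set cephfs_data crush_rule mixed_replicated_rule
ceph osd pool set cephfs_data size 4

# see which OSD ids carry the ssd device class
ceph osd df tree

# list the PGs of the pool with their up/acting sets and acting primary,
# to confirm the primary is always one of the SSD OSDs
ceph pg ls-by-pool cephfs_data

# or spot-check a single PG, e.g.:
ceph pg map 5.1f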

Are these statements correct? How has this kind of setup worked in your
experience? Thanks.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
