[ceph-users] Re: NVMe's

2020-09-24 Thread Mark Nelson
Thanks for the info!  Interesting numbers.  Probably not 60K client IOPs/OSD then, but the tp_osd_tp threads were probably working pretty hard under the combined client/recovery workload. Mark On 9/24/20 2:49 PM, Martin Verges wrote: Hello, It was some time ago but as far as I remember

[ceph-users] Re: NVMe's

2020-09-24 Thread Martin Verges
Hello, It was some time ago but as far as I remember (and found in the chat log), it was during backfill/recovery and high client workload, on an Intel Xeon Silver 4110, 2.10GHz, 8C/16T CPU. I found a screenshot in my chat history showing 775% and 722% CPU usage in htop for 2 OSDs (the server has 2

[ceph-users] Re: NVMe's

2020-09-24 Thread vitalif
Yeah, but you should divide the sysstat figure for each disk by 5, which is Ceph's write amplification (WA). 60k/5 = 12k external IOPS, pretty realistic. > I did not see 10 cores, but 7 cores per osd over a long period on pm1725a > disks with around 60k > IO/s according to sysstat of each disk.
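A back-of-envelope sketch of the division being described: device-level write IOPS reported by sysstat/iostat, divided by the write amplification factor, gives the client-visible IOPS. The WA factor of 5 is the figure quoted in this message, not a universal constant.

    # Device IOPS seen on disk vs. client IOPS after Ceph's write amplification.
    device_iops=60000   # per-disk write IOPS reported by sysstat/iostat
    write_amp=5         # write amplification factor assumed in this thread
    echo $(( device_iops / write_amp ))   # -> 12000 client-visible IOPS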

[ceph-users] Re: NVMe's

2020-09-24 Thread Mark Nelson
Mind if I ask what size of IOs those were, what kind of IOs (reads/writes/sequential/random?), and what kind of cores? Mark On 9/24/20 1:43 PM, Martin Verges wrote: I did not see 10 cores, but 7 cores per osd over a long period on pm1725a disks with around 60k IO/s according to sysstat of

[ceph-users] Re: NVMe's

2020-09-24 Thread Martin Verges
I did not see 10 cores, but 7 cores per osd over a long period on pm1725a disks with around 60k IO/s according to sysstat of each disk. -- Martin Verges Managing director Mobile: +49 174 9335695 E-Mail: martin.ver...@croit.io Chat: https://t.me/MartinVerges croit GmbH, Freseniusstr. 31h, 81247

[ceph-users] Re: NVMe's

2020-09-24 Thread Mark Nelson
On 9/24/20 11:46 AM, vita...@yourcmc.ru wrote: OK, I'll retry my tests a few more times. But I've never seen an OSD utilize 10 cores, so... I won't believe it until I see it myself on my machine. :-)) It's better to see evidence with your own eyes, of course! I tried a fresh OSD on a

[ceph-users] Re: NVMe's

2020-09-24 Thread vitalif
OK, I'll retry my tests a few more times. But I've never seen an OSD utilize 10 cores, so... I won't believe it until I see it myself on my machine. :-)) I tried a fresh OSD on a block ramdisk ("brd"), for example. It was eating 658% CPU and pushing only 4138 write iops...
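For anyone wanting to reproduce the "OSD on a ramdisk" experiment, a minimal sketch of creating a brd block ramdisk follows; the module parameters (a single 8 GiB /dev/ram0) are illustrative assumptions, not values taken from this thread.

    # Create one kernel block ramdisk to take the physical disk out of the picture.
    # rd_size is in KiB; 8388608 KiB = 8 GiB, an arbitrary illustrative size.
    sudo modprobe brd rd_nr=1 rd_size=8388608
    lsblk /dev/ram0   # verify the ramdisk exists
    # An OSD can then be deployed on /dev/ram0 like on any other block device,
    # and per-daemon CPU watched in htop while fio runs against it.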

[ceph-users] Re: NVMe's

2020-09-23 Thread Anthony D'Atri
>> With today’s networking, _maybe_ a super-dense NVMe box needs 100Gb/s where >> a less-dense probably is fine with 25Gb/s. And of course PCI lanes. >> >>

[ceph-users] Re: NVMe's

2020-09-23 Thread Maged Mokhtar
On 23/09/2020 17:58, vita...@yourcmc.ru wrote: I have no idea how you get 66k write iops with one OSD ) I've just repeated a test by creating a test pool on one NVMe OSD with 8 PGs (all pinned to the same OSD with pg-upmap). Then I ran 4x fio randwrite q128 over 4 RBD images. I got 17k

[ceph-users] Re: NVMe's

2020-09-23 Thread Mark Nelson
On 9/23/20 2:21 PM, Alexander E. Patrakov wrote: On Wed, Sep 23, 2020 at 8:12 PM Anthony D'Atri wrote: With today’s networking, _maybe_ a super-dense NVMe box needs 100Gb/s where a less-dense probably is fine with 25Gb/s. And of course PCI lanes.

[ceph-users] Re: NVMe's

2020-09-23 Thread Alexander E. Patrakov
On Wed, Sep 23, 2020 at 8:12 PM Anthony D'Atri wrote: > With today’s networking, _maybe_ a super-dense NVMe box needs 100Gb/s where a > less-dense probably is fine with 25Gb/s. And of course PCI lanes. > >
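A rough bandwidth calculation makes the link-sizing point concrete. The ~3 GB/s per-drive figure below is an assumption for a typical PCIe 3.0 x4 NVMe device, not a number from this thread; real drives vary.

    # Aggregate raw drive bandwidth vs. NIC speed for a 10-drive node.
    drive_gbytes_per_s=3          # assumed sequential throughput per NVMe drive
    drives=10
    link_gbit_per_s=25
    total_gbit=$(( drive_gbytes_per_s * 8 * drives ))   # GB/s -> Gb/s, times drive count
    echo "drives: ${total_gbit} Gb/s vs NIC: ${link_gbit_per_s} Gb/s"
    # -> 240 Gb/s of raw drive bandwidth behind a 25 Gb/s NIC: large-IO workloads
    #    become network bound long before the drives are saturated.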

[ceph-users] Re: NVMe's

2020-09-23 Thread Brent Kennedy
Thanks for the feedback everyone! It seems we have more to look into regarding NVMe enterprise storage solutions. The workload doesn’t demand NVMe performance, so SSD seems to be the most cost-effective way to handle this. The performance discussion is very interesting! Regards, Brent

[ceph-users] Re: NVMe's

2020-09-23 Thread Mark Nelson
On 9/23/20 12:18 PM, Mark Nelson wrote: On 9/23/20 10:58 AM, vita...@yourcmc.ru wrote: I have no idea how you get 66k write iops with one OSD ) I've just repeated a test by creating a test pool on one NVMe OSD with 8 PGs (all pinned to the same OSD with pg-upmap). Then I ran 4x fio randwrite

[ceph-users] Re: NVMe's

2020-09-23 Thread Mark Nelson
On 9/23/20 10:58 AM, vita...@yourcmc.ru wrote: I have no idea how you get 66k write iops with one OSD ) I've just repeated a test by creating a test pool on one NVMe OSD with 8 PGs (all pinned to the same OSD with pg-upmap). Then I ran 4x fio randwrite q128 over 4 RBD images. I got 17k iops.

[ceph-users] Re: NVMe's

2020-09-23 Thread vitalif
I have no idea how you get 66k write iops with one OSD ) I've just repeated a test by creating a test pool on one NVMe OSD with 8 PGs (all pinned to the same OSD with pg-upmap). Then I ran 4x fio randwrite q128 over 4 RBD images. I got 17k iops. OK, in fact that's not the worst result for
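A sketch of what one of those four fio jobs might look like with fio's rbd engine follows; the pool name 'bench' and image name 'img1' are hypothetical placeholders, and the pg-upmap pinning is only noted in a comment because the exact PG and OSD ids depend on the cluster.

    # One randwrite q128 job against an RBD image (the test above ran four in parallel).
    # 'bench' and 'img1' are made-up names; PGs can be pinned to one OSD beforehand
    # with 'ceph osd pg-upmap-items <pgid> <from-osd> <to-osd>'.
    fio --name=rbd-randwrite \
        --ioengine=rbd --clientname=admin --pool=bench --rbdname=img1 \
        --rw=randwrite --bs=4k --iodepth=128 --numjobs=1 \
        --direct=1 --time_based --runtime=60 --group_reporting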

[ceph-users] Re: NVMe's

2020-09-23 Thread tri
I don't think you need a bucket under host for the two LVs. It's unnecessary. September 23, 2020 6:45 AM, "George Shuklin" wrote: > On 23/09/2020 10:54, Marc Roos wrote: >>> That depends on your expected load, no? I have already read here numerous times >> that OSDs cannot keep up with NVMes,

[ceph-users] Re: NVMe's

2020-09-23 Thread Anthony D'Atri
Apologies for not consolidating these replies. My MUA is not my friend today. > With 10 NVMe drives per node, I'm guessing that a single EPYC 7451 is > going to be CPU bound for small IO workloads (2.4c/4.8t per OSD), but > will be network bound for large IO workloads unless you are sticking >

[ceph-users] Re: NVMe's

2020-09-23 Thread Anthony D'Atri
> How they did it? You can create partitions / LVs by hand and build OSDs on them, or you can use ceph-volume lvm batch --osds-per-device > I have an idea to create a new bucket type under host, and put two LV from > each ceph osd VG into that new bucket. Rules are the same (different host),
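A minimal sketch of both approaches mentioned above follows; the device, VG and LV names are placeholders, not taken from the thread.

    # Option 1: let ceph-volume split each NVMe device into two OSDs.
    ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1 /dev/nvme1n1

    # Option 2: carve the LVs by hand and build one OSD per LV.
    vgcreate ceph-nvme0 /dev/nvme0n1
    lvcreate -l 50%VG -n osd-a ceph-nvme0
    lvcreate -l 100%FREE -n osd-b ceph-nvme0
    ceph-volume lvm create --data ceph-nvme0/osd-a
    ceph-volume lvm create --data ceph-nvme0/osd-b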

[ceph-users] Re: NVMe's

2020-09-23 Thread Mark Nelson
On 9/23/20 8:23 AM, George Shuklin wrote: I've just finished doing our own benchmarking, and I can say that what you want to do is very unbalanced and CPU-bound. 1. Ceph consumes a LOT of CPU. My peak value was around 500% CPU per ceph-osd at top performance (see the recent thread on 'ceph

[ceph-users] Re: NVMe's

2020-09-23 Thread Marc Roos
://www.storagereview.com/review/hgst-4tb-deskstar-nas-hdd-review -Original Message- Subject: Re: [ceph-users] Re: NVMe's On 9/23/20 8:05 AM, Marc Roos wrote: >> I'm curious if you've tried octopus+ yet? > Why don't you publish results of your test cluster? You cannot expect > all new

[ceph-users] Re: NVMe's

2020-09-23 Thread George Shuklin
I've just finished doing our own benchmarking, and I can say that what you want to do is very unbalanced and CPU-bound. 1. Ceph consumes a LOT of CPU. My peak value was around 500% CPU per ceph-osd at top performance (see the recent thread on 'ceph on brd') with more realistic numbers
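One way (among several) to collect per-OSD CPU figures like the 500% quoted above; the tool choice here is an assumption, not how the poster measured it.

    # Per-process CPU for all ceph-osd daemons, sampled every 5 seconds;
    # 500% in top/htop terms means roughly five cores kept busy by one daemon.
    pidstat -u -p "$(pgrep -d, ceph-osd)" 5
    # Per-disk IOPS and utilization, as referenced elsewhere in the thread via sysstat:
    iostat -x 5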

[ceph-users] Re: NVMe's

2020-09-23 Thread vitalif
> https://docs.google.com/spreadsheets/d/1e5eTeHdZnSizoY6AUjH0knb4jTCW7KMU4RoryLX9EHQ/edit?usp=sharing I see that in your tests Octopus delivers more than twice the IOPS with 1 OSD. Can I ask what my problem is, then? :-) I have a 4-node Ceph cluster with 14 NVMe drives and fast CPUs

[ceph-users] Re: NVMe's

2020-09-23 Thread Marc Roos
> I'm curious if you've tried octopus+ yet?  Why don't you publish results of your test cluster? You cannot expect all new users to buy 4 servers with 40 disks and then see whether the performance is OK. Get a basic cluster, start publishing results, and document changes to the test cluster.

[ceph-users] Re: NVMe's

2020-09-23 Thread Mark Nelson
On 9/23/20 5:41 AM, George Shuklin wrote: I've just finished doing our own benchmarking, and I can say that what you want to do is very unbalanced and CPU-bound. 1. Ceph consumes a LOT of CPU. My peak value was around 500% CPU per ceph-osd at top performance (see the recent thread on 'ceph

[ceph-users] Re: NVMe's

2020-09-23 Thread vitalif
Sounds like you just want to create 2 OSDs per drive? It's OK, everyone does that :) I tested Ceph with 2 OSDs per SATA SSD when comparing it to my Vitastor, Micron also tested Ceph with 2 OSDs per SSD in their PDF and so on. > On 23/09/2020 10:54, Marc Roos wrote: > >>> Depends on your

[ceph-users] Re: NVMe's

2020-09-23 Thread Stefan Kooman
On 2020-09-23 07:39, Brent Kennedy wrote: > We currently run an SSD cluster and HDD clusters and are looking at possibly > creating a cluster for NVMe storage. For spinners and SSDs, it seemed the > max recommended per OSD host server was 16 OSDs (I know it depends on the > CPUs and RAM, like 1
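On the RAM side, the per-OSD memory budget in current Ceph releases is governed mainly by osd_memory_target (BlueStore cache autotuning) rather than a flat 2 GB rule of thumb; a brief sketch, with a purely illustrative value, follows.

    # Inspect and adjust the per-OSD memory target; 4 GiB is only an example,
    # not a recommendation from this thread.
    ceph config get osd osd_memory_target
    ceph config set osd osd_memory_target 4294967296   # 4 GiB per OSD daemon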

[ceph-users] Re: NVMe's

2020-09-23 Thread George Shuklin
On 23/09/2020 10:54, Marc Roos wrote: That depends on your expected load, no? I have already read here numerous times that OSDs cannot keep up with NVMes, which is why people put 2 OSDs on a single NVMe. So on a busy node you would probably run out of cores? (But better verify this with someone who

[ceph-users] Re: NVMe's

2020-09-23 Thread vitalif
Hi > We currently run an SSD cluster and HDD clusters and are looking at possibly > creating a cluster for NVMe storage. For spinners and SSDs, it seemed the > max recommended per OSD host server was 16 OSDs (I know it depends on the > CPUs and RAM, like 1 CPU core and 2 GB memory). What do you

[ceph-users] Re: NVMe's

2020-09-23 Thread André Gemünd
Hi Brent, > 1. If we do a jbod setup, the servers can hold 48 NVMes, if the servers > were bought with 48 cores and 100+ GB of RAM, would this make sense? Do you seriously mean 48 NVMes per server? How would you even come remotely close to supporting them with connection (to board) and network
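To make the concern concrete, a back-of-envelope calculation using the per-OSD CPU figures reported elsewhere in this thread (roughly 5-7 cores per busy NVMe OSD) is below; it is a rough sketch, not a sizing rule.

    # CPU demand of a 48-NVMe JBOD node at the per-OSD usage reported in this thread.
    osds=48
    cores_per_osd_low=5
    cores_per_osd_high=7
    echo "peak demand: $(( osds * cores_per_osd_low ))-$(( osds * cores_per_osd_high )) cores vs 48 available"
    # -> 240-336 cores of demand against 48 physical cores: such a box would be
    #    heavily CPU bound long before the drives or the network are saturated.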

[ceph-users] Re: NVMe's

2020-09-23 Thread Marc Roos
That depends on your expected load, no? I have already read here numerous times that OSDs cannot keep up with NVMes, which is why people put 2 OSDs on a single NVMe. So on a busy node you would probably run out of cores? (But better verify this with someone who has an NVMe cluster ;))