[ceph-users] Re: Viability of NVMeOF/TCP for VMWare

2024-07-01 Thread Maged Mokhtar
On 28/06/2024 17:59, Frédéric Nass wrote: We came to the same conclusions as Alexander when we studied replacing Ceph's iSCSI implementation with Ceph's NFS-Ganesha implementation: HA was not working. During failovers, vmkernel would fail with messages like this: 2023-01-14T09:39:27.200Z

[ceph-users] Re: Ceph RBD, MySQL write IOPs - what is possible?

2024-06-09 Thread Maged Mokhtar
With good hardware and correct configuration, an all-flash cluster should give approx 1-2K write iops per thread (0.5-1 ms latency) and approx 2-5K read iops per thread (0.2-0.5 ms latency). This depends on the quality of drives and cpu frequency, but is independent of the number of drives or cores.
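
A quick way to check these per-thread numbers is a single-threaded, queue-depth-1 fio run with the rbd engine; pool and image names below are placeholders. The average completion latency fio reports is the per-op latency, and single-thread iops is roughly 1000 / latency_in_ms.

$ fio --name=lat-test --ioengine=rbd --pool=rbd --rbdname=testimg \
      --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 --direct=1 \
      --time_based --runtime=30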

[ceph-users] Re: How to handle incomplete data after rbd import-diff failure?

2024-05-01 Thread Maged Mokhtar
On 01/05/2024 16:12, Satoru Takeuchi wrote: I confirmed that incomplete data is left on `rbd import-diff` failure. I guess that this data is part of the snapshot. Could someone answer me the following questions? Q1. Is it safe to use the RBD image (e.g. client I/O and snapshot management) even
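
For context, a minimal sketch of the incremental copy flow the question is about (pool, image and snapshot names are invented for illustration); the thread discusses what is left behind when the import-diff step fails part-way:

$ rbd snap create rbd/src@snap2
$ rbd export-diff --from-snap snap1 rbd/src@snap2 snap1_to_snap2.diff
$ rbd import-diff snap1_to_snap2.diff backup/dst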

[ceph-users] Re: Best practice and expected benefits of using separate WAL and DB devices with Bluestore

2024-04-23 Thread Maged Mokhtar
On 19/04/2024 11:02, Niklaus Hofer wrote: Dear all We have an HDD ceph cluster that could do with some more IOPS. One solution we are considering is installing NVMe SSDs into the storage nodes and using them as WAL- and/or DB devices for the Bluestore OSDs. However, we have some questions
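
As a rough illustration of the layout being considered (device paths are examples only), an HDD OSD with its DB on an NVMe partition can be created with ceph-volume; when only --block.db is given, the WAL is co-located on the same fast device:

$ ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1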

[ceph-users] Re: Performance improvement suggestion

2024-03-04 Thread Maged Mokhtar
On 04/03/2024 15:37, Frank Schilder wrote: Fast write enabled would mean that the primary OSD sends #size copies to the entire active set (including itself) in parallel and sends an ACK to the client as soon as min_size ACKs have been received from the peers (including itself). In this way,
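
A toy sketch, not Ceph code, of the acknowledgement rule being proposed: the write is fanned out to all "size" replicas in parallel, and the client is acked as soon as "min_size" completions have returned, with the remaining replicas finishing in the background.

import asyncio, random

SIZE, MIN_SIZE = 3, 2

async def replica_write(osd_id: int, data: bytes) -> int:
    # stand-in for the network round trip + commit on one replica
    await asyncio.sleep(random.uniform(0.001, 0.010))
    return osd_id

async def fast_write(data: bytes) -> None:
    # send #size copies to the whole active set in parallel
    tasks = [asyncio.create_task(replica_write(i, data)) for i in range(SIZE)]
    acked = 0
    for fut in asyncio.as_completed(tasks):
        await fut
        acked += 1
        if acked == MIN_SIZE:
            print("ack client")   # ack once min_size replicas have committed
            break
    await asyncio.gather(*tasks)  # the rest still complete in the background

asyncio.run(fast_write(b"payload"))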

[ceph-users] Re: Performance improvement suggestion

2024-03-04 Thread Maged Mokhtar
On 04/03/2024 13:35, Marc wrote: Fast write enabled would mean that the primary OSD sends #size copies to the entire active set (including itself) in parallel and sends an ACK to the client as soon as min_size ACKs have been received from the peers (including itself). In this way, one can

[ceph-users] Re: Ceph & iSCSI

2024-02-29 Thread Maged Mokhtar
On 29/02/2024 11:05, Dmitry Melekhov wrote: 27.02.2024 13:39, Maged Mokhtar пишет: You can look at PetaSAN project www.petasan.org We support iSCSI on Ceph /maged Ubuntu 20.04 in 2024? We are using Ceph 17 (Quincy) which has upstream support for 20.04 LTS (focal). There are recent

[ceph-users] Re: Ceph & iSCSI

2024-02-27 Thread Maged Mokhtar
You can look at PetaSAN project www.petasan.org We support iSCSI on Ceph /maged On 27/02/2024 05:22, Michael Worsham wrote: I was reading on the Ceph site that iSCSI is no longer under active development since November 2022. Why is that? https://docs.ceph.com/en/latest/rbd/iscsi-overview/

[ceph-users] Re: PSA: Long Standing Debian/Ubuntu build performance issue (fixed, backports in progress)

2024-02-09 Thread Maged Mokhtar
Hi Mark, Thanks a lot for highlighting this issue... I have 2 questions: 1) In the patch comments: /"but we fail to populate this setting down when building external projects. this is important when it comes to the projects which is critical to the performance. RocksDB is one of them."/ Do

[ceph-users] Re: XFS on top of RBD, overhead

2024-02-02 Thread Maged Mokhtar
On 02/02/2024 16:41, Ruben Vestergaard wrote: Hi group, Today I conducted a small experiment to test an assumption of mine, namely that Ceph incurs a substantial network overhead when doing many small files. One RBD was created, and on top of that an XFS containing 1.6 M files, each with

[ceph-users] Re: Performance impact of Heterogeneous environment

2024-01-17 Thread Maged Mokhtar
Very informative article you did, Mark. IMHO if you find yourself with a very high per-OSD core count, it may be logical to just pack/add more nvmes per host; you'd be getting the best price per performance and capacity. /Maged On 17/01/2024 22:00, Mark Nelson wrote: It's a little tricky.  In

[ceph-users] Re: librbd 4k read/write?

2023-08-13 Thread Maged Mokhtar
On 12/08/2023 13:04, Marc wrote: To allow for faster linear reads and writes, please create a file, /etc/udev/rules.d/80-rbd.rules, with the following contents (assuming that the VM sees the RBD as /dev/sda): KERNEL=="sda", ENV{DEVTYPE}=="disk", ACTION=="add|change",
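
The rule body is cut off in the archive; a hedged guess at its shape (the attribute and value are illustrative, not the original text) is a rule that raises the block-device readahead for the RBD-backed disk:

KERNEL=="sda", ENV{DEVTYPE}=="disk", ACTION=="add|change", ATTR{queue/read_ahead_kb}="4096"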

[ceph-users] Re: librbd 4k read/write?

2023-08-11 Thread Maged Mokhtar
On 10/08/2023 22:04, Zakhar Kirpichenko wrote: Hi, You can use the following formula to roughly calculate the IOPS you can get from a cluster: (Drive_IOPS * Number_of_Drives * 0.75) / Cluster_Size. For example, for 60 10K rpm SAS drives each capable of 200 4K IOPS and a replicated pool with
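
As a worked example of that formula, assuming the truncated sentence refers to a replicated pool of size 3:

(200 IOPS * 60 drives * 0.75) / 3 replicas = 3,000 client write IOPS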

[ceph-users] Re: Ceph iSCSI GW is too slow when compared with Raw RBD performance

2023-06-23 Thread Maged Mokhtar
On 23/06/2023 04:18, Work Ceph wrote: Hello guys, We have a Ceph cluster that runs just fine with Ceph Octopus; we use RBD for some workloads, RadosGW (via S3) for others, and iSCSI for some Windows clients. We started noticing some unexpected performance issues with iSCSI. I mean, an SSD

[ceph-users] Re: Ceph iSCSI GW not working with VMware VMFS and Windows Clustered Storage Volumes (CSV)

2023-06-21 Thread Maged Mokhtar
to be supported. Can I export an RBD image via iSCSI gateway using only one portal via GwCli? @Maged Mokhtar, I am not sure I follow. Do you guys have an iSCSI implementation that we can use to somehow replace the default iSCSI server in the default Ceph iSCSI Gateway? I didn't quite understand

[ceph-users] Re: Ceph iSCSI GW not working with VMware VMFS and Windows Clustered Storage Volumes (CSV)

2023-06-19 Thread Maged Mokhtar
Windows Clustered Shared Volumes and Failover Clustering require support for clustered persistent reservations by the block device to coordinate access by multiple hosts. The default iSCSI implementation in Ceph does not support this; you can use the iSCSI implementation in PetaSAN

[ceph-users] Re: architecture help (iscsi, rbd, backups?)

2023-04-28 Thread Maged Mokhtar
Hello Angelo You can try PetaSAN www.petasan.org We support scale-out iSCSI with Ceph and it is actively developed. /Maged On 27/04/2023 23:05, Angelo Höngens wrote: Hey guys and girls, I'm working on a project to build storage for one of our departments, and I want to ask you guys and girls

[ceph-users] Re: Understanding rbd objects, with snapshots

2022-10-24 Thread Maged Mokhtar
On 18/10/2022 01:24, Chris Dunlop wrote: Hi, Is there anywhere that describes exactly how rbd data (including snapshots) are stored within a pool? I can see how a rbd broadly stores its data in rados objects in the pool, although the object map is opaque. But once an rbd snap is created
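
A quick way to see the mapping the question asks about (pool and image names are examples): rbd info prints the image's block_name_prefix, the backing RADOS objects are named with that prefix plus a hex offset, and rados listsnaps shows the per-object snapshot clones.

$ rbd info rbd/myimage                     # note block_name_prefix, e.g. rbd_data.<id>
$ rados -p rbd ls | grep rbd_data.<id>     # the objects backing the image
$ rados -p rbd listsnaps rbd_data.<id>.0000000000000000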

[ceph-users] Re: iscsi deprecation

2022-10-07 Thread Maged Mokhtar
You can try PetaSAN www.petasan.org We are an open source solution on top of Ceph. We provide scalable active/active iSCSI which supports VMWare VAAI and Microsoft Clustered Shared Volumes for Hyper-V clustering. Cheers /maged On 30/09/2022 19:36, Filipe Mendes wrote: Hello! I'm

[ceph-users] Re: how to speed up hundreds of millions small files read base on cephfs?

2022-09-01 Thread Maged Mokhtar
Hi, experts, We are using cephfs (15.2.*) with kernel mount on our production environment. And these days when we do massive reads from the cluster (multi processes), ceph health always reports slow ops for some osds (built with hdd (8TB), using ssd as db cache). our cluster has more read than

[ceph-users] Re: 50% performance drop after disk failure

2022-07-09 Thread Maged Mokhtar
you can further check the disk % util/busy load to confirm it is disk load related On 09/07/2022 15:56, Maged Mokhtar wrote: if you have recovery io, then the system is not done recovering from the failed disk or from some other failure, for example from the other OSDs than flapped
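
For example, iostat on the OSD hosts shows the per-disk busy percentage in the %util column:

$ iostat -x 5     # watch %util on the OSD data disks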

[ceph-users] Re: 50% performance drop after disk failure

2022-07-09 Thread Maged Mokhtar
if you have recovery io, then the system is not done recovering from the failed disk or from some other failure, for example from the other OSDs that flapped as a result of recovery load. If so, you may want to lower the recovery speed via osd_max_backfills and osd_recovery_max_active
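
A hedged example of throttling recovery with those two options (values are deliberately conservative; on releases using the mClock scheduler the recovery limits may be governed by its profile instead):

$ ceph config set osd osd_max_backfills 1
$ ceph config set osd osd_recovery_max_active 1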

[ceph-users] Re: use ceph rbd for windows cluster "scsi-3 persistent reservation"

2022-06-23 Thread Maged Mokhtar
the stock tcmu-runner rbd backend does not support this. You can use PetaSAN: www.petasan.org we use a special LIO rbd backstore which talks directly to rbd and does support SCSI-3 PR and passes the Windows Failover Cluster tests. /maged On 22/06/2022 15:40, farhad kh wrote: I need a

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Maged Mokhtar
Hello Cephers, I too am for LTS releases, or for some kind of middle ground like a longer release cycle and/or having even-numbered releases designated for production like before. We all use LTS releases for the base OS when running Ceph, yet in reality we depend much more on the Ceph code than

[ceph-users] Re: NFS Ganesha Active Active Question

2021-10-31 Thread Maged Mokhtar
On 31/10/2021 05:29, Xiaolong Jiang wrote: Hi Experts. I am a bit confused about ganesha active-active setup. We can set up multiple ganesha servers on top of cephfs and clients can point to different ganesh server to serve the traffic. that can

[ceph-users] Re: NFS Ganesha Active Active Question

2021-10-31 Thread Maged Mokhtar
On 31/10/2021 05:29, Xiaolong Jiang wrote: Hi Experts. I am a bit confused about ganesha active-active setup. We can set up multiple ganesha servers on top of cephfs and clients can point to different ganesh server to serve the traffic. that can scale out the traffic. From client side, is

[ceph-users] Re: Expose rgw using consul or service discovery

2021-10-22 Thread Maged Mokhtar
In PetaSAN we use Consul to provide a service mesh for running services active/active over Ceph. For rgw, we use nginx to load balance rgw gateways, the nginx themselves run in an active/active ha setup so they do not become a bottleneck as you pointed out with the haproxy setup. How

[ceph-users] Re: Expose rgw using consul or service discovery

2021-10-22 Thread Maged Mokhtar
In PetaSAN we use Consul to provide a service mesh for running services active/active over Ceph. For rgw, we use nginx to load balance rgw gateways, the nginx themselves run in an active/active ha setup so they do not become a bottleneck as you pointed out with the haproxy setup. /Maged
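
For illustration, a minimal nginx block of the kind described, load balancing several rgw instances (hostnames and ports are placeholders, not PetaSAN's actual configuration):

upstream rgw_backends {
    server rgw1.example.com:8080;
    server rgw2.example.com:8080;
    server rgw3.example.com:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://rgw_backends;
        proxy_set_header Host $host;
    }
}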

[ceph-users] Re: cephfs vs rbd

2021-10-09 Thread Maged Mokhtar
-roughly how large is the expanded untarred folder, and roughly how many files? -also roughly, what cluster throughput and bandwidth do you see when untarring the file? You could observe this from ceph status -is the cluster running on the same client machine? hdd/ssd? /Maged On

[ceph-users] Re: performance between ceph-osd and crimson-osd

2021-08-19 Thread Maged Mokhtar
Can you run it with 4k block size, 1 thread? (Default is 4M and 16 threads.) $ rados bench -p rbd 10 write -b 4096 -t 1 --no-cleanup On 19/08/2021 04:22, 신희원 / 학생 / 컴퓨터공학부 wrote: Hi, I measured the performance of ceph-osd and crimson-osd with the same single core affinity. I checked IOPS,
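
For completeness, the matching single-threaded read pass and cleanup, run from the same host so the benchmark metadata left by --no-cleanup is found:

$ rados bench -p rbd 10 seq -t 1   # sequential read of the objects just written
$ rados -p rbd cleanup             # remove the benchmark objects afterwards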

[ceph-users] Re: Ceph failover claster

2021-04-12 Thread Maged Mokhtar
Hello Várkonyi, Windows clustering requires the use of SCSI-3 clustered persistent reservations; to support this with Ceph you could use our distribution PetaSAN: www.petasan.org which supports this and passes the Windows clustering tests. /Maged On 12/04/2021 10:28, Várkonyi János

[ceph-users] Re: Question about delayed write IOs, octopus, mixed storage

2021-03-12 Thread Maged Mokhtar
On 12/03/2021 17:28, Philip Brown wrote: "First it is not a good idea to mix SSD/HDD OSDs in the same pool," Sorry for not being explicit. I used the cephadm/ceph orch facilities and told them "go set up all my disks". SO they automatically set up the SSDs to be WAL devices or whatever. I

[ceph-users] Re: Question about delayed write IOs, octopus, mixed storage

2021-03-12 Thread Maged Mokhtar
...but google search is being difficult without a more specific search term.

[ceph-users] Re: Proxmox+Ceph Benchmark 2020

2020-10-13 Thread Maged Mokhtar
Very nice and useful document. One thing is not clear to me: the fio parameters in appendix 5: --numjobs=<1|4> --iodepths=<1|32>. It is not clear if/when the iodepth was set to 32: was it used with all tests with numjobs=4? Or was it: --numjobs=<1|4> --iodepths=1 /maged On 13/10/2020
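
For reference, the two readings of that appendix correspond to runs along these lines (note fio spells the option --iodepth; total outstanding IO is numjobs * iodepth, so 4x32 and 4x1 are very different tests). The target file is a placeholder:

$ fio --name=qd128 --rw=randwrite --bs=4k --numjobs=4 --iodepth=32 \
      --ioengine=libaio --direct=1 --filename=/mnt/test/fio.dat --size=10G \
      --time_based --runtime=60 --group_reporting
$ fio --name=qd4 --rw=randwrite --bs=4k --numjobs=4 --iodepth=1 \
      --ioengine=libaio --direct=1 --filename=/mnt/test/fio.dat --size=10G \
      --time_based --runtime=60 --group_reporting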

[ceph-users] Re: Ceph iSCSI Performance

2020-10-06 Thread Maged Mokhtar
You can try PetaSAN  www.petasan.org we use rbd backend by SUSE. It works out of the box. /Maged On 06/10/2020 19:49, dhils...@performair.com wrote: Mark; Are you suggesting some other means to configure iSCSI targets with Ceph? If so, how do configure for non-tcmu? The iSCSI clients are

[ceph-users] Re: Write access delay after OSD & Mon lost

2020-10-06 Thread Maged Mokhtar
If an OSD is lost, it will be detected after osd_heartbeat_grace (20) + osd_heartbeat_interval (5), i.e. 25 sec by default, which is what you see. During this time client io will block; after this the OSD is flagged as down and a new OSD map is issued, which the client will use to re-direct the
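
Those are the osd_heartbeat_grace and osd_heartbeat_interval options; a hedged example of inspecting them and (cautiously) lowering the grace period to shorten the detection window:

$ ceph config get osd osd_heartbeat_grace
$ ceph config get osd osd_heartbeat_interval
$ ceph config set osd osd_heartbeat_grace 10   # example value: faster detection, more risk of false positives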

[ceph-users] Re: ceph iscsi latency too high for esxi?

2020-10-04 Thread Maged Mokhtar
It is a load issue. Your combined load (client io, recovery, scrub) is higher than what your cluster can handle. Whereas some ceph commands can block when things are very busy, VMWare iSCSI is less tolerant, but it is not the problem. If you have charts, look at the metric for disk %

[ceph-users] Re: NVMe's

2020-09-23 Thread Maged Mokhtar
On 23/09/2020 17:58, vita...@yourcmc.ru wrote: I have no idea how you get 66k write iops with one OSD ) I've just repeated a test by creating a test pool on one NVMe OSD with 8 PGs (all pinned to the same OSD with pg-upmap). Then I ran 4x fio randwrite q128 over 4 RBD images. I got 17k

[ceph-users] Re: Benchmark WAL/DB on SSD and HDD for RGW RBD CephFS

2020-09-20 Thread Maged Mokhtar

[ceph-users] Re: Benchmark WAL/DB on SSD and HDD for RGW RBD CephFS

2020-09-18 Thread Maged Mokhtar
dm-writecache works using high and low watermarks, set at 45% and 50%. All writes land in the cache; once the cache fills to the high watermark, backfilling to the slow device starts, and it stops when reaching the low watermark. Backfilling uses a b-tree with LRU blocks and tries to merge blocks to reduce
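
A hedged sketch of putting dm-writecache in front of an HDD logical volume using LVM (device, VG and LV names are examples; --cachesettings for the watermarks needs a reasonably recent LVM):

$ pvcreate /dev/nvme0n1
$ vgextend vg_osd /dev/nvme0n1
$ lvcreate -n fastcache -L 100G vg_osd /dev/nvme0n1
$ lvconvert --type writecache --cachevol fastcache \
      --cachesettings 'high_watermark=50 low_watermark=45' vg_osd/slow_lv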

[ceph-users] Re: Benchmark WAL/DB on SSD and HDD for RGW RBD CephFS

2020-09-17 Thread Maged Mokhtar
On 17/09/2020 19:21, vita...@yourcmc.ru wrote: RBD in fact doesn't benefit much from the WAL/DB partition alone because Bluestore never does more writes per second than HDD can do on average (it flushes every 32 writes to the HDD). For RBD, the best thing is bcache. rbd will benefit: for

[ceph-users] Re: ceph/rados performace sync vs async

2020-07-18 Thread Maged Mokhtar
On 18/07/2020 00:05, Daniel Mezentsev wrote: Hi All,  I started a small project related to metrics collection and processing, Ceph was chosen as a storage backend. Decided to use rados directly, to avoid any additional layers. I got a very simple client - it works fine, but performance is
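
As a rough illustration of the sync-vs-async gap being discussed, using the Python rados bindings (pool name and object count are arbitrary): the synchronous write_full pays a full round trip per object, while aio_write_full keeps many writes in flight and waits only at the end.

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('testpool')
payload = b'x' * 4096

# synchronous: one round trip per object, latency-bound
for i in range(1000):
    ioctx.write_full('sync_obj_%d' % i, payload)

# asynchronous: queue everything, then wait for the completions once
completions = [ioctx.aio_write_full('async_obj_%d' % i, payload) for i in range(1000)]
for c in completions:
    c.wait_for_complete()

ioctx.close()
cluster.shutdown()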

[ceph-users] Re: Poor Windows performance on ceph RBD.

2020-07-13 Thread Maged Mokhtar
On 13/07/2020 10:43, Frank Schilder wrote: To anyone who is following this thread, we found a possible explanation for (some of) our observations. If someone is following this, they probably want the possible explanation and not the knowledge of you having the possible explanation. So you

[ceph-users] Ganesha rados recovery on NFS 3

2020-06-15 Thread Maged Mokhtar
Hello all, can the NFS Ganesha rados recovery for a multi-headed active/active setup work with NFS 3, or does it require NFS 4/4.1 specifics? Thanks for any help /Maged

[ceph-users] Re: Recommendation for decent write latency performance from HDDs

2020-04-12 Thread Maged Mokhtar

[ceph-users] Re: Recommendation for decent write latency performance from HDDs

2020-04-12 Thread Maged Mokhtar
in a power failure. If you wish, you can install PetaSAN to test cache performance yourself. /Maged

[ceph-users] Re: Recommendation for decent write latency performance from HDDs

2020-04-12 Thread Maged Mokhtar
On 12/04/2020 18:10, huxia...@horebdata.cn wrote: Dear Maged Mokhtar, It is very interesting to know that your experiment shows dm-writecache would be better than other alternatives. I have two questions: yes much better. 1  can one cache device serve multiple HDDs? I know bcache can do

[ceph-users] Re: Recommendation for decent write latency performance from HDDs

2020-04-12 Thread Maged Mokhtar
On 10/04/2020 23:17, Reed Dier wrote: Going to resurrect this thread to provide another option: LVM-cache, ie putting a cache device in-front of the bluestore-LVM LV. I only mention this because I noticed it in the SUSE documentation for SES6 (based on Nautilus) here:

[ceph-users] Re: multi-node NFS Ganesha + libcephfs caching

2020-03-24 Thread Maged Mokhtar

[ceph-users] Re: multi-node NFS Ganesha + libcephfs caching

2020-03-24 Thread Maged Mokhtar

[ceph-users] Re: multi-node NFS Ganesha + libcephfs caching

2020-03-24 Thread Maged Mokhtar
On Mon, 2020-03-23 at 15:49 +0200, Maged Mokhtar wrote: Hello all, For multi-node NFS Ganesha over CephFS, is it OK to leave libcephfs write caching

[ceph-users] Re: multi-node NFS Ganesha + libcephfs caching

2020-03-23 Thread Maged Mokhtar
On 23/03/2020 20:50, Jeff Layton wrote: On Mon, 2020-03-23 at 15:49 +0200, Maged Mokhtar wrote: Hello all, For multi-node NFS Ganesha over CephFS, is it OK to leave libcephfs write caching on, or should it be configured off for failover ? You can do libcephfs write caching, as the caps

[ceph-users] multi-node NFS Ganesha + libcephfs caching

2020-03-23 Thread Maged Mokhtar
Hello all, For multi-node NFS Ganesha over CephFS, is it OK to leave libcephfs write caching on, or should it be configured off for failover ? Cheers /Maged

[ceph-users] Re: iSCSI write performance

2019-10-25 Thread Maged Mokhtar
Just to clarify, it is better to separate the different performance cases: 1- regular io performance (iops/throughput): this should be good. 2- vmotion within datastores managed by Ceph: this will be good, as xcopy will be used. 3- vmotion between a Ceph datastore and an external

[ceph-users] Re: iSCSI write performance

2019-10-25 Thread Maged Mokhtar
On 25/10/2019 10:28, Maged Mokhtar wrote: For vmotion speed, check "emulate_3pc" attribute on the LIO target. If 0 (default), VMWare will issue io in 64KB blocks which gives low speed. If set to 1, this will trigger VMWare to use VAAI extended copy, which activates LIO's xcopy functiona

[ceph-users] Re: iSCSI write performance

2019-10-25 Thread Maged Mokhtar
For vmotion speed, check the "emulate_3pc" attribute on the LIO target. If 0 (default), VMWare will issue io in 64KB blocks, which gives low speed. If set to 1, this will trigger VMWare to use VAAI extended copy, which activates LIO's xcopy functionality, which uses 512KB block sizes by default. We
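
A hedged example of checking and enabling the attribute with targetcli (backstore type and name are placeholders; the same attribute is also reachable through the target configfs tree under /sys/kernel/config/target/core/):

$ targetcli /backstores/block/myrbd get attribute emulate_3pc
$ targetcli /backstores/block/myrbd set attribute emulate_3pc=1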

[ceph-users] Re: rados bench performance in nautilus

2019-09-24 Thread Maged Mokhtar
On 24/09/2019 10:25, Marc Roos wrote: > The intent of this change is to increase iops on bluestore, it was implemented in 14.2.4 but it is a > general bluestore issue not specific to Nautilus. I am confused. Is it not like this that an increase in iops on bluestore = increase in overall

[ceph-users] Re: rados bench performance in nautilus

2019-09-23 Thread Maged Mokhtar
On 23/09/2019 08:27, 徐蕴 wrote: Hi ceph experts, I deployed Nautilus (v14.2.4) and Luminous (v12.2.11) on the same hardware, and made a rough performance comparison. The result seems Luminous is much better, which is unexpected. My setup: 3 servers, each has 3 HDD OSDs, 1 SSD as DB, two

[ceph-users] Re: CephFS+NFS For VMWare

2019-09-05 Thread Maged Mokhtar
...can also be achieved via Veeam backups. /Maged On 02/07/2018 14:36, Maged Mokhtar wrote: Hi Nick, With iSCSI we reach over 150 MB/s vmotion for a single vm, 1 GB/s for 7-8 vm migrations. Since these are 64KB block sizes, latency/iops is a large factor; you need either controllers with write