[ceph-users] Re: Reconstructing an OSD server when the boot OS is corrupted

2024-05-02 Thread Murilo Morais
On Thu, 2 May 2024 at 06:20, Matthew Vernon wrote: > On 24/04/2024 13:43, Bailey Allison wrote: > > > A simple ceph-volume lvm activate should get all of the OSDs back up and > > running once you install the proper packages/restore the ceph config > > file/etc., > > What's the
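For reference, the recovery path quoted above boils down to roughly the following sketch; it assumes a package-based (non-containerized) OSD host whose data disks survived the OS loss, and the paths/package names are the usual Debian/Ubuntu ones rather than details from the thread:

  # reinstall the Ceph packages matching the cluster release
  apt install ceph-osd
  # restore the cluster config and the OSD bootstrap keyring from a surviving node or a backup
  scp other-node:/etc/ceph/ceph.conf /etc/ceph/
  scp other-node:/var/lib/ceph/bootstrap-osd/ceph.keyring /var/lib/ceph/bootstrap-osd/
  # scan the surviving LVM volumes and start every OSD found on them
  ceph-volume lvm activate --all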

[ceph-users] Re: Reset health.

2024-03-22 Thread Murilo Morais
You can use the `ceph crash` interface to view/archive recent crashes. [1] To list recent crashes: ceph crash ls-new. To get information about a particular crash: ceph crash info <crash-id>. To silence a crash: ceph crash archive <crash-id>. To silence all active crashes: ceph crash archive-all. [1]
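Put together, the workflow is just these commands (the crash ID is a placeholder, copied from the ls-new output):

  # list crashes that have not been acknowledged yet
  ceph crash ls-new
  # inspect one of them
  ceph crash info <crash-id>
  # acknowledge it so it stops raising the RECENT_CRASH health warning
  ceph crash archive <crash-id>
  # or acknowledge everything at once
  ceph crash archive-all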

[ceph-users] Re: Increase number of PGs

2024-02-12 Thread Murilo Morais
Janne, thanks for the tip. Does the "target_max_misplaced_ratio" parameter influence the process? I would like to make the increase with as little overhead as possible. On Mon, 12 Feb 2024 at 11:39, Janne Johansson wrote: > On Mon, 12 Feb 2024 at 14:12,
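If it does apply, the knob lives on the mgr; a minimal sketch (0.05 is the documented default, the lower value here is only an illustration):

  # show the current value
  ceph config get mgr target_max_misplaced_ratio
  # allow at most ~1% of objects to be misplaced at a time while pg_num ramps up
  ceph config set mgr target_max_misplaced_ratio 0.01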

[ceph-users] Increase number of PGs

2024-02-12 Thread Murilo Morais
Good morning and happy holidays everyone! Guys, what would be the best strategy to increase the number of PGs in a POOL that is already in production?
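For reference, on Nautilus and later the increase itself is a single step and Ceph raises pgp_num gradually on its own; a sketch with a placeholder pool name and target:

  # check the current value first
  ceph osd pool get mypool pg_num
  # raise pg_num; the cluster splits and rebalances PGs incrementally
  ceph osd pool set mypool pg_num 256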

[ceph-users] [DOC] Openstack with RBD DOC update?

2024-01-24 Thread Murilo Morais
Good afternoon everybody! I have a question regarding the documentation... I was reviewing it and realized that the "vms" pool is not being used anywhere in the configs. The first mention of this pool was in commit 2eab1c1 and, in e9b13fa, the configuration section of nova.conf was removed, but
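For context, the place the "vms" pool would normally appear is the [libvirt] section of nova.conf; the snippet below is the commonly documented shape, with example values rather than anything taken from the commits mentioned above:

  [libvirt]
  images_type = rbd
  images_rbd_pool = vms
  images_rbd_ceph_conf = /etc/ceph/ceph.conf
  rbd_user = cinder
  rbd_secret_uuid = <libvirt secret uuid>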

[ceph-users] cephadm - podman vs docker

2023-12-27 Thread Murilo Morais
Good morning everybody! Guys, are there any differences or limitations when using Docker instead of Podman? Context: I have a cluster on Debian 11 running Podman (3.0.1), but when the iSCSI service is restarted the "tcmu-runner" binary ends up in "Z" state and the "rbd-target-api" script enters "D

[ceph-users] SSD SATA performance

2023-09-30 Thread Murilo Morais
Good morning everybody! Guys, I have 9x Kingston DC600M/1920 SSDs (SATA) in 3x DL380e, using the P420 controller (I still don't have an HBA to swap it for) in RAID 0. The drive's specifications indicate that it achieves 94k/78k random read/write IOPS at 4K. I'm using it exclusively for VMs with RBD (I'm
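To sanity-check the spec-sheet numbers outside of Ceph, a rough fio baseline against one raw device could look like this (the device path is a placeholder and the write test destroys data on it):

  # 4k random read baseline
  fio --name=randread --filename=/dev/sdX --direct=1 --ioengine=libaio --rw=randread --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting
  # 4k random write baseline (destructive!)
  fio --name=randwrite --filename=/dev/sdX --direct=1 --ioengine=libaio --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting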

[ceph-users] Re: librbd 4k read/write?

2023-08-10 Thread Murilo Morais
It makes sense. On Thu, 10 Aug 2023 at 16:04, Zakhar Kirpichenko wrote: > Hi, > > You can use the following formula to roughly calculate the IOPS you can > get from a cluster: (Drive_IOPS * Number_of_Drives * 0.75) / Cluster_Size. > > For example, for 60 10K rpm SAS drives each
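Plugging numbers into the quoted formula, assuming roughly 175 random IOPS per 10K rpm SAS drive and a replicated pool of size 3 (both values are assumptions, not figures from the thread):

  (175 * 60 * 0.75) / 3 ≈ 2625 client IOPS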

[ceph-users] Re: librbd 4k read/write?

2023-08-10 Thread Murilo Morais
On Thu, 10 Aug 2023 at 13:01, Marc wrote: > > I have the following scenario: > > Pool RBD replication x3 > > 5 hosts with 12 SAS spinning disks each > > > > I'm using exactly the following line with FIO to test: > > fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4M

[ceph-users] Re: librbd 4k read/write?

2023-08-10 Thread Murilo Morais
On Thu, 10 Aug 2023 at 12:47, Hans van den Bogert <hansbog...@gmail.com> wrote: > On Thu, Aug 10, 2023, 17:36 Murilo Morais wrote: > > > Good afternoon everybody! > > > > I have the following scenario: > > Pool RBD replication x3 >

[ceph-users] librbd 4k read/write?

2023-08-10 Thread Murilo Morais
Good afternoon everybody! I have the following scenario: Pool RBD replication x3 5 hosts with 12 SAS spinning disks each I'm using exactly the following line with FIO to test: fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4M -size=10G -iodepth=16 -rw=write -filename=./test.img If
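Since the subject is 4k performance but the line above tests 4M sequential writes, a 4k random variant of the same command might look like this (same libaio engine; only block size, queue depth and access pattern changed):

  fio -ioengine=libaio -direct=1 -invalidate=1 -name=test4k -bs=4k -size=10G -iodepth=32 -rw=randwrite -filename=./test.img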

[ceph-users] HBA or RAID-0 + BBU

2023-04-18 Thread Murilo Morais
Good evening everyone! Guys, I have a question about the P420 RAID controller's operation mode: which would be better, HBA mode or RAID-0 with BBU (write cache enabled)? Thanks in advance!

[ceph-users] RBD latency

2023-03-16 Thread Murilo Morais
Good evening everyone! Guys, what latency should I expect for RBD images in a cluster with only HDDs (36 HDDs)? Sometimes I see write latency of around 2-5 ms on some images even with very low IOPS and bandwidth, while read latency is around 0.2-0.7 ms. For an HDD-only cluster, is this

[ceph-users] Difficulty with rbd-mirror on different networks.

2023-03-08 Thread Murilo Morais
Good evening everyone. I'm having trouble with rbd-mirror. In a test environment I have the following scenario: DC1: public_network: 172.20.0.0/24, 192.168.0.0/24 --mon-ip 172.20.0.1 ip: 192.168.0.1 DC2: public_network: 172.21.0.0/24, 192.168.0.0/24 --mon-ip 172.21.0.1 ip 192.168.0.2 If I add
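For what it's worth, the peer bootstrap itself is only a couple of commands, and the imported token carries the remote cluster's mon addresses, so each rbd-mirror daemon must be able to reach the other side's public/mon network; pool and site names below are placeholders:

  # on DC1, for the pool being mirrored
  rbd mirror pool enable mypool image
  rbd mirror pool peer bootstrap create --site-name dc1 mypool > /tmp/dc1-token
  # on DC2, after copying the token over
  rbd mirror pool enable mypool image
  rbd mirror pool peer bootstrap import --site-name dc2 mypool /tmp/dc1-token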

[ceph-users] Re: Problem with IO after renaming File System .data pool

2023-01-19 Thread Murilo Morais
Does anyone know what could have happened? On Mon, 16 Jan 2023 at 13:44, wrote: > Good morning everyone. > > On this Thursday night we went through an accident, where they > accidentally renamed the .data pool of a File System making it instantly > inaccessible, when renaming it

[ceph-users] Problem with IO after renaming File System .data pool

2023-01-16 Thread Murilo Morais
Good morning everyone. That night we had an accident: the .data pool of a File System was accidentally renamed, making it instantly inaccessible. After renaming it back to the correct name it was possible to mount and list the files, but not to read or write. When trying to

[ceph-users] Re: SLOW_OPS

2022-12-16 Thread Murilo Morais
Is there no mechanism to automatically recover from this event? On Fri, 16 Dec 2022 at 11:20, Eugen Block wrote: > Have you tried catching an OSD's dump_blocked_ops with cephadm? > > Quoting Murilo Morais: > > > Eugen, thanks for answering. > > > > I under
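For reference on the suggestion being quoted, a blocked-ops dump can be grabbed roughly like this (osd.12 is a placeholder ID):

  # from any node with an admin keyring
  ceph tell osd.12 dump_blocked_ops
  # or on the OSD's host, inside the daemon's container
  cephadm enter --name osd.12
  ceph daemon osd.12 dump_blocked_ops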

[ceph-users] SLOW_OPS

2022-12-14 Thread Murilo Morais
Good morning everyone. Guys, today my cluster had a "problem": it was showing SLOW_OPS. Restarting the OSDs that were reporting it solved everything (there were VMs stuck because of this), but what I'm racking my brain over is the reason for the SLOW_OPS in the first place. In the logs I saw

[ceph-users] Re: Reduce recovery bandwidth

2022-12-12 Thread Murilo Morais
> On 2022-12-09 21:10, Murilo Morais wrote: > > Hi Martin, thanks for replying. > > I'm using v17.2.3. > > On Fri, 9
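Since the thread is about throttling recovery on v17 (Quincy), the usual knobs look roughly like this; note that with Quincy's default mClock scheduler the classic OSD throttles may be ignored, so treat this as a sketch:

  # classic throttles
  ceph config set osd osd_max_backfills 1
  ceph config set osd osd_recovery_max_active 1
  # with mClock, prefer client traffic over recovery instead
  ceph config set osd osd_mclock_profile high_client_ops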

[ceph-users] Re: Best practice taking cluster down

2022-11-24 Thread Murilo Morais
Hi Dominique! There was recently a thread on this list discussing the same subject. [1] You can follow SUSE's recommendations and it will work out fine! [2] Have a good day! [1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QN4GUPPZ5IZYLQ4PD4KV737L5M6DJ4CI/ [2]
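Those recommendations essentially boil down to freezing the cluster before powering it off; a minimal sketch of the usual flag sequence (undo each with ceph osd unset once everything is back up and healthy):

  # stop client workloads first, then freeze data movement and OSD mark-downs
  ceph osd set noout
  ceph osd set norebalance
  ceph osd set norecover
  ceph osd set nobackfill
  # power off OSD nodes, then MDS/MGR nodes, then the MONs last; start the MONs first on power-up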

[ceph-users] Re: Disable legacy msgr v1

2022-11-18 Thread Murilo Morais
Have you tried setting ms_bind_msgr1 to false? On Fri, 18 Nov 2022 at 14:35, Oleksiy Stashok wrote: > Hey guys, > > Is there a way to disable the legacy msgr v1 protocol for all ceph > services? > > Thank you. > Oleksiy
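As a rough sketch, that option can be set cluster-wide and takes effect as daemons restart and re-bind:

  # stop daemons from binding the legacy v1 port (6789 on the mons)
  ceph config set global ms_bind_msgr1 false
  # verify which addresses the mons currently advertise
  ceph mon dump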

[ceph-users] Re: change of pool size

2022-11-11 Thread Murilo Morais
Florian, good morning. You are correct, just change the size property of each pool. Ceph will handle all the necessary data movement; just keep an eye on the amount of free storage. On Fri, 11 Nov 2022 at 10:50, Florian Jonas wrote: > Dear all, > > we are running a small cluster with about
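A minimal sketch of the change being described (pool name and sizes are examples):

  # raise the replica count; Ceph starts creating the extra copies immediately
  ceph osd pool set mypool size 3
  # usually paired with a matching min_size
  ceph osd pool set mypool min_size 2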

[ceph-users] Re: Question about quorum

2022-11-04 Thread Murilo Morais
o > without losing availability. > > > On Thu, Nov 3, 2022, 2:55 PM Murilo Morais wrote: > >> Good afternoon everyone! >> >> I have a lab with 4 mons, I was testing the behavior in case a certain >> amount of hosts went offline, as soon as the second one we

[ceph-users] Question about quorum

2022-11-03 Thread Murilo Morais
Good afternoon everyone! I have a lab with 4 mons. I was testing the behavior when a certain number of hosts go offline, and as soon as the second one went offline everything stopped. It would be interesting to have a fifth node to ensure that, if two go down, everything keeps working, but why did
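The arithmetic behind this is the standard monitor majority rule (not something quoted from the thread): an even monitor count buys no extra failure tolerance.

  4 mons -> quorum needs floor(4/2) + 1 = 3 -> tolerates only 1 mon down
  5 mons -> quorum needs floor(5/2) + 1 = 3 -> tolerates 2 mons down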

[ceph-users] Re: No active PG; No disk activity

2022-11-01 Thread Murilo Morais
I managed to solve this problem. To document the resolution: the firewall was blocking communication. After disabling everything related to it and restarting the machine, everything went back to normal. On Tue, 1 Nov 2022 at 10:46, Murilo Morais wrote: > Good morning every
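Rather than disabling the firewall entirely, firewalld ships service definitions for Ceph, so the narrower fix would look roughly like this (assuming firewalld; adjust for other firewalls):

  # allow mon traffic (3300/6789) and the OSD/MGR/MDS port range (6800-7300)
  firewall-cmd --permanent --add-service=ceph-mon
  firewall-cmd --permanent --add-service=ceph
  firewall-cmd --reload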

[ceph-users] No active PG; No disk activity

2022-11-01 Thread Murilo Morais
Good morning everyone! Today there was an atypical situation in our cluster where all three machines ended up shutting down. On power-up the cluster came up and formed quorum with no problems, but the PGs are all stuck "working"; I don't see any disk activity on the machines. No PG is active.

[ceph-users] Re: Debug cluster warnings "CEPHADM_HOST_CHECK_FAILED", "CEPHADM_REFRESH_FAILED" etc

2022-10-24 Thread Murilo Morais
Hello Martin. Apparently cephadm is not able to resolve `admin.ceph.`. Check /etc/hosts or your DNS, try to ping the hosts, and check that the IPs shown in `ceph orch host ls` respond without packet loss. Try according to the documentation:
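A quick sanity-check sequence along those lines, using the hostname from the warning as an example:

  # list the hosts and addresses the orchestrator thinks it manages
  ceph orch host ls
  # have cephadm verify connectivity and prerequisites on that host
  ceph cephadm check-host admin.ceph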

[ceph-users] Re: Grafana without presenting data from the first Host

2022-10-20 Thread Murilo Morais
host page metrics are > node-exporter's. As Marc suggested, you can also confirm it by checking > whether there are "node_*" metrics for that node in the Prometheus web UI. > > Kind Regards, > Ernesto > > > On Thu, Oct 20, 2022 at 2:52 AM Murilo Morais > wrote: >

[ceph-users] Re: Grafana without presenting data from the first Host

2022-10-20 Thread Murilo Morais
Hi Marc, thanks for replying. I already checked; nothing appears for the first host even if I run a direct query from the Prometheus GUI. Ernesto mentioned node-exporter, and this service is running on all hosts. On Thu, 20 Oct 2022 at 04:21, Marc wrote: > > > > > > I'm

[ceph-users] Grafana without presenting data from the first Host

2022-10-19 Thread Murilo Morais
Good evening everyone. I'm experiencing something strange on a cluster regarding monitoring. In Grafana I can't see any data referring to the first host. I've already tried to redeploy Grafana and Prometheus, but the first host never appears; if I go to Dashboard -> Hosts -> Performance Detail
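A rough checklist for that kind of gap, assuming a cephadm-managed monitoring stack:

  # is node-exporter actually running on the missing host?
  ceph orch ps | grep node-exporter
  # redeploy the monitoring pieces if the scrape targets look stale
  ceph orch redeploy node-exporter
  ceph orch redeploy prometheus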

[ceph-users] Re: cephadm error: add-repo does not have a release file

2022-10-18 Thread Murilo Morais
AFAIK there are no repositories for Ubuntu 22.04 yet, but if I'm not mistaken there are packages compiled by Canonical for Ubuntu 22.04; try running apt install ceph-common. On Mon, 17 Oct 2022 at 20:30, Na Na wrote: > > I

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-14 Thread Murilo Morais
> > [0,22]p0 2022-10-12T19:19:25.675030+ > 2022-10-12T19:19:25.675030+ > >4 periodic scrub scheduled @ > > 2022-10-14T00:21:49.935082+ > > 3.1c 66 0 00 276762624 0 > > 0 10027 activ

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-13 Thread Murilo Morais
you share more details? Does ceph report inactive PGs when one > node is down? Please share: > ceph osd tree > ceph osd pool ls detail > ceph osd crush rule dump > ceph pg ls-by-pool > ceph -s > > Quoting Murilo Morais: > > > Thanks for answering. > > Marc, but

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-13 Thread Murilo Morais
Thanks for answering. Marc, is there no mechanism to prevent the I/O pause? At the moment I'm not worried about data loss. I understand that setting it to replica x1 could work, but I need it to be x2. On Thu, 13 Oct 2022 at 12:26, Marc wrote: > > > > > I'm having strange behavior on a

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-13 Thread Murilo Morais
I'm using Host as the failure domain. On Thu, 13 Oct 2022 at 11:41, Eugen Block wrote: > What is your failure domain? If it's osd you'd have both PGs on the > same host and then no replica is available. > > Quoting Murilo Morais: > > > Eugen,

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-13 Thread Murilo Morais
have size 3 pools with min_size 2? > > Quoting Murilo Morais: > > > Good morning everyone. > > > > I'm having strange behavior on a new cluster. > > > > I have 3 machines, two of them have the disks. We can name them like > this: > > dcs1 to dcs3. The dcs1 an

[ceph-users] Cluster crashing when stopping some host

2022-10-13 Thread Murilo Morais
Good morning everyone. I'm having strange behavior on a new cluster. I have 3 machines, two of which have the disks. We can name them dcs1 to dcs3; dcs1 and dcs2 contain the disks. I bootstrapped through dcs1, added the other hosts, and left the mgr on dcs3 only.
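With only two hosts carrying OSDs, the first things worth checking are the replica counts and the failure domain the CRUSH rule actually uses; a short sketch of the relevant commands:

  # size/min_size per pool; with 2 data hosts a size-3 pool can never place 3 copies on distinct hosts
  ceph osd pool ls detail
  # confirm the rule's failure domain (look for "type host" in the chooseleaf step)
  ceph osd crush rule dump
  # see which PGs go inactive when a host is stopped
  ceph pg ls | grep -v 'active+clean'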

[ceph-users] Re: Trying to add NVMe CT1000P2SSD8

2022-10-05 Thread Murilo Morais
I've already tested the performance (great performance, by the way), but the anomaly of the OSDs starting in an error state is still occurring. I don't know how to debug this problem.

[ceph-users] Re: Trying to add NVMe CT1000P2SSD8

2022-10-05 Thread Murilo Morais
Nobody?

[ceph-users] Trying to add NVMe CT1000P2SSD8

2022-10-04 Thread Murilo Morais
Good morning people. I'm having trouble adding 4 NVMe drives (CT1000P2SSD8). The first time I tried, I ran into the error "Cannot use /dev/nvme0n1: device is rejected by filter config", caused by LVM. After commenting out the filters this error no longer appeared and Ceph managed to add the OSDs, but
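For reference, that rejection comes from the filter line in /etc/lvm/lvm.conf; rather than commenting it out, an accept rule for the NVMe devices can be added, along these lines (patterns are examples, keep whatever the existing filter already accepts):

  devices {
      # accept NVMe and SATA/SAS block devices, reject everything else
      filter = [ "a|^/dev/nvme.*|", "a|^/dev/sd.*|", "r|.*|" ]
  }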

[ceph-users] Re: Traffic between public and cluster network

2022-09-29 Thread Murilo Morais
correctly and your question :) > > Cheers > Boris > > On Thu, 29 Sept 2022 at 04:11, Murilo Morais < > mur...@evocorp.com.br> wrote: > > Good evening everyone. > > > > I setup a cluster with three machines, each with two network interfaces, > >

[ceph-users] Traffic between public and cluster network

2022-09-28 Thread Murilo Morais
Good evening everyone. I set up a cluster with three machines, each with two network interfaces: one for the public network and one for the cluster network (172.25.50.0/24 for public and 10.10.10.0/24 for cluster). All machines can see and reach each other on their respective networks. So
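For reference, the split described above maps onto two settings; a minimal sketch using the subnets from the message:

  # client, MON and MGR traffic
  ceph config set global public_network 172.25.50.0/24
  # OSD replication and heartbeat traffic only
  ceph config set global cluster_network 10.10.10.0/24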

[ceph-users] Re: Low read/write rate

2022-09-28 Thread Murilo Morais
wrote: > On Sat, 24 Sep 2022 at 23:38, Murilo Morais wrote: > > I'm relatively new to Ceph. I set up a small cluster with two hosts with > 12 > > disks each host, all 3 TB SAS 7500 RPM and two 10 Gigabit interfaces. I > > created a pool in replicated mode and configure

[ceph-users] Re: HA cluster

2022-09-28 Thread Murilo Morais
Thank you very much for the clarifications.

[ceph-users] HA cluster

2022-09-25 Thread Murilo Morais
Hello guys. I have a question regarding HA. I set up two hosts with cephadm, created the pools and set up an NFS, everything working so far. I turned off the second host and the first one continued to work without problems, but if I turn off the first, the second is totally unresponsive. What

[ceph-users] Low read/write rate

2022-09-24 Thread Murilo Morais
Good evening everyone. I'm relatively new to Ceph. I set up a small cluster with two hosts, 12 disks each, all 3 TB SAS 7500 RPM, and two 10 Gigabit interfaces. I created a pool in replicated mode and configured it to use two replicas. What I'm finding strange is that, with these