[ceph-users] Re: rbd mirror between clusters with private "public" network

2022-04-26 Thread Arthur Outhenin-Chalandre
Hi Tony, On 4/26/22 05:13, Tony Liu wrote: > I understand that, for rbd mirror to work, the rbd mirror service requires > connectivity to all nodes from both clusters. > > In my case, for security purposes, the "public" network is actually a private network, > which is not routable externally.
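For reference, a minimal sketch of the pool-level peering this thread is about, assuming a pool named rbd and hypothetical site names site-a/site-b; the rbd-mirror daemon still needs to reach the peer cluster's monitors and OSDs over that "public" network:

    # on the primary cluster: enable mirroring on the pool and create a bootstrap token
    rbd mirror pool enable rbd image
    rbd mirror pool peer bootstrap create --site-name site-a rbd > /tmp/bootstrap-token

    # on the secondary cluster (where rbd-mirror runs): import the token
    rbd mirror pool enable rbd image
    rbd mirror pool peer bootstrap import --site-name site-b rbd /tmp/bootstrap-token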

[ceph-users] Re: RGW: max number of shards per bucket index

2022-04-26 Thread Cory Snyder
Thanks for your input, Casey! Your response seems to align with my mental model. It makes sense that choosing the number of bucket index shards involves a tradeoff between write parallelism and bucket listing performance. Your point about the relevancy of the number of PGs is also reasonable. If t
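For reference, a rough sketch of inspecting and changing a bucket's shard count (the bucket name is hypothetical and output field names may vary by release):

    # show per-bucket shard counts and fill status
    radosgw-admin bucket limit check

    # reshard to a higher shard count; more shards help write parallelism,
    # but listings have to merge results from every shard
    radosgw-admin bucket reshard --bucket=mybucket --num-shards=31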

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-04-26 Thread Boris Behrens
So, I just checked the logs on one of our smaller clusters and it looks like this error happened twice last week. The cluster contains 12x8TB OSDs without any SSDs as cache. And it started with octopus (so no upgrade from nautilus was performed) root@3cecef08a104:~# zgrep -i marked /var/log/ceph/c
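A quick way to see how often this happens, assuming the default cluster log location (the path in the quoted command above is truncated):

    # count "marked down" events per rotated cluster log file
    zgrep -ci "marked down" /var/log/ceph/ceph.log*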

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-04-26 Thread Konstantin Shalygin
Hi, After some load, HDDs will not perform well. You should move the block.db's to NVMe to avoid database vacuuming problems. k Sent from my iPhone > On 26 Apr 2022, at 13:58, Boris Behrens wrote: > > The cluster contains 12x8TB OSDs without any SSDs as cache
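A sketch of attaching a separate block.db device to an existing BlueStore OSD, with a hypothetical OSD id and NVMe partition; ceph-bluestore-tool also offers bluefs-bdev-migrate to move the existing RocksDB data off the slow device afterwards:

    systemctl stop ceph-osd@12
    ceph-bluestore-tool bluefs-bdev-new-db \
        --path /var/lib/ceph/osd/ceph-12 --dev-target /dev/nvme0n1p1
    systemctl start ceph-osd@12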

[ceph-users] Re: zap an osd and it appears again

2022-04-26 Thread Adam King
Hi Luis, Was the osd spec responsible for creating this osd set to unmanaged? Having it re-pickup available disks is the expected behavior right now (see https://docs.ceph.com/en/latest/cephadm/services/osd/#declarative-state) although we've been considering changing this as it seems like in the m
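A sketch of checking and disabling the behaviour described above, assuming the OSDs came from an all-available-devices spec:

    # export the current OSD service specs to see whether "unmanaged" is set
    ceph orch ls osd --export

    # stop cephadm from automatically consuming newly available devices
    ceph orch apply osd --all-available-devices --unmanaged=true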

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-04-26 Thread Boris
I also have this problem with OSDs that have SSDs as block.db. > > On 26.04.2022 at 17:10, Konstantin Shalygin wrote: > > Hi, > > After some load, HDDs will not perform well. You should move the block.db's to > NVMe to avoid database vacuuming problems > > k > Sent from my iPhone > >

[ceph-users] Re: OSD crash with end_of_buffer + bad crc

2022-04-26 Thread Gilles Mocellin
On Monday, 11 April 2022 at 10:26:31 CEST, Gilles Mocellin wrote: > Just a follow-up. > > I've found that a specific network interface is causing this. > We have bonds: > - 1 management on bond0 > - 1 storage access on bond1 > - 1 storage replication on bond2 > > As the crc errors are all between cl
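A few checks for the interface-level errors suspected here, with hypothetical bond and NIC names:

    # per-slave link state of the replication bond
    cat /proc/net/bonding/bond2

    # error/drop counters on the bond and its slaves
    ip -s link show bond2
    ethtool -S enp94s0f1 | grep -iE 'err|crc|drop'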

[ceph-users] Re: Bad CRC in data messages logging out to syslog

2022-04-26 Thread Gilles Mocellin
On Monday, 25 April 2022 at 17:46:04 CEST, Chris Page wrote: > Hi, > > Every now and then I am getting the following logs - > > pve01 2022-04-25T16:41:03.109+0100 7ff35b6da700 0 bad crc in data > 3860390385 != exp 919468086 from v1:10.0.0.111:0/873787122 > pve01 2022-04-25T16:41:04.361+0100 7fb0e2
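One way to map the peer address in those log lines back to a daemon (the address is taken from the quoted log; a port of 0 usually indicates a client rather than an OSD), and then to check the NIC on that host:

    # find which OSD, if any, registered that address
    ceph osd dump | grep 10.0.0.111

    # on the matching host, look for hardware-level errors (interface name is hypothetical)
    ethtool -S eth0 | grep -iE 'err|crc'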

[ceph-users] Re: zap an osd and it appears again

2022-04-26 Thread Anthony D'Atri
> Was the osd spec responsible for creating this osd set to unmanaged? Having > it re-pickup available disks is the expected behavior right now (see > https://docs.ceph.com/en/latest/cephadm/services/osd/#declarative-state) > although we've been considering changing this as it seems like in the

[ceph-users] Re: zap an osd and it appears again

2022-04-26 Thread David Rivera
Hi, We currently remove drives without --zap if we do not want them to be automatically re-added. After full removal from the cluster, or on addition of new drives, we run `ceph orch pause` to be able to work on the drives without ceph interfering. To add the drives we resume the background orche
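A sketch of the workflow described above, with a hypothetical OSD id, host, and device path:

    # remove the OSD without zapping so the orchestrator does not immediately re-create it
    ceph orch osd rm 12

    # pause background orchestration while working on the drives by hand
    ceph orch pause
    ceph orch device zap host01 /dev/sdk --force

    # resuming lets the orchestrator pick the device back up
    ceph orch resume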

[ceph-users] Re: Ceph OSD purge doesn't work while rebalancing

2022-04-26 Thread Richard Bade
I agree that it would be better if it were less sensitive to unrelated backfill. I've noticed this recently too, especially if you're purging multiple osds (like a whole host). The first one succeeds but the next one fails even though I have norebalance set and the osd was already out. I guess if m
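A sketch of checking and purging with a hypothetical OSD id; --force bypasses the safety check that unrelated backfill appears to trip here, so use it only when you are sure the remaining recovery does not involve this OSD:

    # ask the cluster whether the OSD's data is safely replicated elsewhere
    ceph osd safe-to-destroy osd.12

    # purge the OSD (crush entry, auth key, osd id)
    ceph osd purge 12 --yes-i-really-mean-it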

[ceph-users] Recommendations on books

2022-04-26 Thread Angelo Höngens
Hey guys and girls, Can you recommend some books to get started with ceph? I know the docs are probably a good source, but books, in my experience, do a better job of glueing it all together and painting the big picture. And I can take a book to places where reading docs on a laptop is inconvenien

[ceph-users] Re: cephfs hangs on writes

2022-04-26 Thread Xiubo Li
On 4/27/22 3:40 AM, Vladimir Brik wrote: I am attaching an MDS log with debug set to 25 for a time period (a few seconds' worth) during which a dd command got stuck (it never got unstuck) and resulted in an empty file. I am guessing it was able to create the file but was blocked from writing to it.
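For reference, a sketch of how such a capture can be produced, with a hypothetical MDS daemon name; the debugfs paths apply to the kernel client and only exist when debugfs is mounted:

    # raise MDS logging at runtime
    ceph tell mds.cephfs01 config set debug_mds 25
    ceph tell mds.cephfs01 config set debug_ms 1

    # on the hung client, list in-flight MDS and OSD requests
    cat /sys/kernel/debug/ceph/*/mdsc
    cat /sys/kernel/debug/ceph/*/osdc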

[ceph-users] Re: OSD crash with end_of_buffer + bad crc

2022-04-26 Thread Konstantin Shalygin
Just for the record, what is your network card and driver? ethtool -i eth0 Thanks, k Sent from my iPhone > On 26 Apr 2022, at 20:02, Gilles Mocellin wrote: > > Just to end that thread: > I have changed the network card, and there have been no more errors in the Ceph logs for days. > > It's really bad if