[ceph-users] osds won't start

2022-02-10 Thread Mazzystr
I applied the latest OS updates and rebooted my hosts. Now all my OSDs fail to start. # cat /etc/os-release NAME="openSUSE Tumbleweed" # VERSION="20220207" ID="opensuse-tumbleweed" ID_LIKE="opensuse suse" VERSION_ID="20220207" # uname -a Linux cube 5.16.5-1-default #1 SMP PREEMPT Thu Feb 3 05:26:48
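A first diagnostic step in a situation like this is to pull the failure reason out of the service and OSD logs. A minimal sketch, assuming a traditional systemd ceph-osd deployment (not cephadm) and OSD id 0 as a placeholder:

  # Show the most recent start attempt and its exit status
  systemctl status ceph-osd@0
  # Journal entries for the failed unit since the current boot
  journalctl -b -u ceph-osd@0 --no-pager
  # The OSD's own log usually contains the actual assert/abort message
  tail -n 100 /var/log/ceph/ceph-osd.0.log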

[ceph-users] Ceph User + Dev Monthly February Meetup

2022-02-10 Thread Neha Ojha
Hi everyone, This month's Ceph User + Dev Monthly meetup is on February 17, 15:00-16:00 UTC. Please add topics you'd like to discuss in the agenda here: https://pad.ceph.com/p/ceph-user-dev-monthly-minutes. We are hoping to get more feedback from users on the four major themes of Ceph and ask

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Zach Heise (SSCC)
Yes, these 8 PGs have been in this 'remapped' state for quite a while. I don't know why CRUSH has not seen fit to designate new OSDs for them so that acting and up match. For the error in question - ceph upgrade is saying that only 1 PG would become offline if
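A quick way to see exactly which PGs have up != acting is sketched below (illustrative; output columns can vary slightly between releases):

  # List only PGs currently in the remapped state, with their up and acting sets
  ceph pg ls remapped
  # Alternative: brief per-PG table, compare the UP and ACTING columns
  ceph pg dump pgs_brief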

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Gregory Farnum
“Up” is the set of OSDs which are alive from the calculated crush mapping. “Acting” includes those extras which have been added in to bring the PG up to proper size. So the PG does have 3 live OSDs serving it. But perhaps the safety check *is* looking at up instead of acting? That seems like a
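For a single PG, both sets can be inspected directly, for example (the PG id 2.1a is only a placeholder):

  # Prints the up set and the acting set for one PG
  ceph pg map 2.1a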

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread 胡 玮文
I believe this is the reason. I mean the number of OSDs in the “up” set should be at least 1 greater than the min_size for the upgrade to proceed. Otherwise, once any OSD is stopped, it can drop below min_size and prevent the PG from becoming active. So just clean up the misplaced PGs and the upgrade should

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Zach Heise (SSCC)
Hi Weiwen, thanks for replying. All of my replicated pools, including the newest ssdpool I made most recently, have a min_size of 2. My other two EC pools have a min_size of 3. Looking at pg dump output again, it does look like the two EC pools have exactly 4
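The size/min_size values for all pools can be listed in one go, for example:

  # Shows size, min_size and the crush rule for every pool
  ceph osd pool ls detail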

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread 胡 玮文
Hi Zach, How about your min_size setting? Have you checked that the number of OSDs in the acting set of every PG is at least 1 greater than the min_size of the corresponding pool? Weiwen Hu > On Feb 10, 2022, at 05:02, Zach Heise (SSCC) wrote: > > Hello, > > ceph health detail says my 5-node cluster is
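Such a check could look roughly like this (a sketch; "mypool" is a placeholder pool name):

  # min_size of the pool
  ceph osd pool get mypool min_size
  # Per-PG listing for that pool, including the UP and ACTING columns;
  # the acting set of every PG should have more than min_size OSDs
  ceph pg ls-by-pool mypool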

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Zach Heise (SSCC)
That's an excellent point! Between my last ceph upgrade and now, I did make a new crush ruleset and a new pool that uses that crush rule. It was just for SSDs, of which I have 5, one per host. All of my other pools are using the default crush rulesets
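For reference, a device-class based rule like that is typically created along these lines (a sketch with placeholder names, not necessarily Zach's actual rule):

  # Replicated rule that only selects OSDs with device class "ssd",
  # choosing one OSD per host under the "default" root
  ceph osd crush rule create-replicated ssd_rule default host ssd
  # Pool created on top of it (name and pg_num are placeholders)
  ceph osd pool create ssdpool 32 32 replicated ssd_rule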

[ceph-users] Re: osd true blocksize vs bluestore_min_alloc_size

2022-02-10 Thread Scheurer François
Hi Igor, Many thanks, it worked! ewceph1-osd001-prod:~ # egrep -a --color=always "min_alloc_size" /var/log/ceph/ceph-osd.0.log | tail -111 2022-02-10 18:12:53.918 7f3a1dd4bd00 10 bluestore(/var/lib/ceph/osd/ceph-0) _open_super_meta min_alloc_size 0x1 2022-02-10 18:12:53.926 7f3a1dd4bd00

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Gregory Farnum
I don’t know how to get better errors out of cephadm, but the only way I can think of for this to happen is if your crush rule is somehow placing multiple replicas of a pg on a single host that cephadm wants to upgrade. So check your rules, your pool sizes, and osd tree? -Greg On Thu, Feb 10,
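The checks Greg suggests map to roughly these commands (illustrative):

  # Failure domains and device classes as CRUSH sees them
  ceph osd tree
  # size, min_size and crush_rule per pool
  ceph osd pool ls detail
  # Full definition of every CRUSH rule, including its chooseleaf failure domain
  ceph osd crush rule dump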

[ceph-users] Re: osd true blocksize vs bluestore_min_alloc_size

2022-02-10 Thread Igor Fedotov
Hi François, you should set debug_bluestore = 10 instead. And then grep for bluestore or min_alloc_size, not bluefs. Here is how this is printed: dout(10) << __func__ << " min_alloc_size 0x" << std::hex << min_alloc_size << std::dec << " order " << (int)min_alloc_size_order
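A minimal sketch of that, assuming a classic ceph.conf based setup and OSD id 0 as a placeholder:

  # In ceph.conf, under [osd]:
  #   debug_bluestore = 10
  # then restart the OSD so _open_super_meta is logged again
  systemctl restart ceph-osd@0
  # and grep the startup log for the value
  grep -a "min_alloc_size 0x" /var/log/ceph/ceph-osd.0.log | tail -n 5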

[ceph-users] Re: osd true blocksize vs bluestore_min_alloc_size

2022-02-10 Thread Scheurer François
Dear Dan, Thank you for your help. After putting debug_osd = 10/5 in ceph.conf under [osd], I still do not get min_alloc_size logged. Probably it is not logged on 14.2.5. But this comes up: ewceph1-osd001-prod:~ # egrep -a --color=always bluefs /var/log/ceph/ceph-osd.0.log | tail -111

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Zach Heise (SSCC)
It could be an issue with the device health pool as you are correct, it is a single PG - but when the cluster is reporting that everything is healthy, it's difficult to know where to go from there. What I don't understand is why it's refusing to upgrade ANY of the osd daemons;
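The same safety check that the upgrade performs can also be run by hand, which may give a more detailed reason than the orchestrator does (OSD id 0 is a placeholder):

  # Ask the monitors whether stopping this OSD would leave any PG unable to serve I/O
  ceph osd ok-to-stop 0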

[ceph-users] Re: ceph_assert(start >= coll_range_start && start < coll_range_end)

2022-02-10 Thread Igor Fedotov
Manuel, sometimes we tag critical backports with EOL releases - just in case. Unfortunately this doesn't mean the relevant release is planned to happen. I'm afraid that's the case here as well. AFAIK there are no plans to have another Nautilus release. I can probably backport the patch and

[ceph-users] Re: osd true blocksize vs bluestore_min_alloc_size

2022-02-10 Thread Dan van der Ster
Hi, When an osd starts it should log the min_alloc_size at level 1, see https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L12260 grep "min_alloc_size 0x" ceph-osd.*.log Cheers, Dan On Thu, Feb 10, 2022 at 3:50 PM Scheurer François wrote: > > Hi everyone > > > How can we

[ceph-users] Re: ceph_assert(start >= coll_range_start && start < coll_range_end)

2022-02-10 Thread Manuel Lausch
Hi Igor, yes, it seems to be this issue. I updated the test cluster to 16.2.7. Now deep-scrubs do not fail anymore. There is a backport ticket for Nautilus on your issue. Do you think this will lead to one more Nautilus release with this fix? It would be great. Thanks Manuel On Thu, 10 Feb

[ceph-users] Re: ceph_assert(start >= coll_range_start && start < coll_range_end)

2022-02-10 Thread Igor Fedotov
Manuel, it looks like you're facing https://tracker.ceph.com/issues/52705 which was fixed in 16.2.7. Mind upgrading your test cluster and trying? Thanks, Igor On 2/10/2022 4:29 PM, Manuel Lausch wrote: I think it was installed with Nautilus and was updated directly to Pacific afterwards. On

[ceph-users] Re: ceph_assert(start >= coll_range_start && start < coll_range_end)

2022-02-10 Thread Manuel Lausch
Oh. OSD 87 (one of the replica partners) crashed. Here are some lines from the log -10> 2022-02-10T14:28:46.840+0100 7fd1b306d700 5 osd.87 pg_epoch: 2016357 pg[7.3ff( v 2016317'1 (0'0,2016317'1] local-lis/les=2016308/2016309 n=1 ec=2016308/2016308 lis/c=2016308/2016308

[ceph-users] Re: ceph_assert(start >= coll_range_start && start < coll_range_end)

2022-02-10 Thread Manuel Lausch
I think it was installed with Nautilus and was updated directly to Pacific afterwards. On Thu, 10 Feb 2022 16:12:46 +0300 Igor Fedotov wrote: > And one more question, please. Was the test cluster originally deployed > with Pacific or it was some previous Ceph version upgraded to Pacific? >

[ceph-users] Re: ceph_assert(start >= coll_range_start && start < coll_range_end)

2022-02-10 Thread Manuel Lausch
Yes, the pool on the test cluster contains a lot of objects. I created a new pool, put the object (this time only 100K, just to test it) and ran a deep-scrub -> error # dd if=/dev/urandom of=test_obj bs=1K count=100 # rados -p nameplosion put
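Spelled out, the reproduction recipe looks roughly like this (a sketch; the pool name is the one used above, the object name is the problematic one quoted elsewhere in this thread, and <pgid> is a placeholder):

  # 100K of random data as the object payload
  dd if=/dev/urandom of=test_obj bs=1K count=100
  # Fresh pool containing only the problematic object name
  ceph osd pool create nameplosion 32
  rados -p nameplosion put c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_0040 test_obj
  # Find the PG the object landed in, then deep-scrub it and watch the OSD log
  ceph osd map nameplosion c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_0040
  ceph pg deep-scrub <pgid>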

[ceph-users] Re: ceph_assert(start >= coll_range_start && start < coll_range_end)

2022-02-10 Thread Igor Fedotov
And one more question, please. Was the test cluster originally deployed with Pacific, or was it some previous Ceph version upgraded to Pacific? Thanks, Igor On 2/10/2022 3:27 PM, Manuel Lausch wrote: Hi Igor, yes I just put an object with "rados put" with the problematic name and 4MB random

[ceph-users] Re: ceph_assert(start >= coll_range_start && start < coll_range_end)

2022-02-10 Thread Igor Fedotov
Speaking of the test cluster - there are multiple objects in the test pool, right? If so, could you please create a new pool and put just a single object with the problematic name there. Then do the deep scrub. Is the issue reproducible this way? Thanks, Igor On 2/10/2022 3:27 PM, Manuel

[ceph-users] Re: ceph_assert(start >= coll_range_start && start < coll_range_end)

2022-02-10 Thread Manuel Lausch
Hi Igor, yes I just put an object with "rados put" with the problematic name and 4MB of random data, the same size as the object in the production cluster. A deep-scrub afterwards produces the following error in the osd log 2022-02-09T11:16:42.739+0100 7f0ce58f5700 -1 log_channel(cluster) log [ERR] :

[ceph-users] Re: ceph_assert(start >= coll_range_start && start < coll_range_end)

2022-02-10 Thread Igor Fedotov
Hi Manuel, could you please elaborate a bit on the reproduction steps in 16.2.6: 1) Do you just put the object named this way with the rados tool into a replicated pool, and subsequent deep scrubs report the error? Or are some other steps involved? 2) Do you have an all-bluestore setup for that

[ceph-users] Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

2022-02-10 Thread Christian Rohmann
Hey Stefan, thanks for getting back to me! On 10/02/2022 10:05, Stefan Schueffler wrote: since my last mail in December, we changed our ceph setup like this: we added one SSD OSD on each ceph host (which were pure HDD before). Then, we moved the problematic pool "de-dus5.rgw.buckets.index"
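For context, moving a pool onto the new SSD OSDs is typically done with a device-class rule, along these lines (a sketch; the rule name is a placeholder, the pool name is the one mentioned in the message):

  # Rule that only selects SSD-class OSDs, one per host under the default root
  ceph osd crush rule create-replicated rgw-index-ssd default host ssd
  # Point the index pool at it; the data migrates automatically
  ceph osd pool set de-dus5.rgw.buckets.index crush_rule rgw-index-ssd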

[ceph-users] Re: ceph_assert(start >= coll_range_start && start < coll_range_end)

2022-02-10 Thread Manuel Lausch
Okay. The issue is triggered with a specific object name -> c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_0040 And with this name I could trigger at least the scrub issues on ceph Pacific 16.2.6 as well. I opened a bug ticket for this issue: