[ceph-users] Re: restoring ceph cluster from osds

2023-03-09 Thread Eugen Block
Hi, I still think the best approach would be to rebuild the MON store from the OSDs as described here [2]. Just creating new MONs with the same IDs might not be sufficient because they would be missing all the OSD keyrings etc., so you'd still have some work to do to get it up. It might be
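For reference, the rebuild referenced above boils down to collecting the cluster maps from every OSD and then rebuilding the mon store from them. The sketch below follows the upstream "recovery using OSDs" procedure; the OSD data path, keyring path and mon IDs are placeholders, and the keyring is assumed to already contain the mon. and client.admin keys:

    # Run on each OSD host against the stopped OSDs, merging the store
    # directory between hosts (e.g. with rsync) as you go
    ms=/root/mon-store
    mkdir -p $ms
    for osd in /var/lib/ceph/osd/ceph-*; do
        ceph-objectstore-tool --data-path $osd --no-mon-config \
            --op update-mon-db --mon-store-path $ms
    done

    # Make sure the keyring carries the required caps, then rebuild the store
    ceph-authtool /root/admin.keyring -n mon. --cap mon 'allow *'
    ceph-authtool /root/admin.keyring -n client.admin \
        --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
    ceph-monstore-tool $ms rebuild -- --keyring /root/admin.keyring --mon-ids a b c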

[ceph-users] Re: restoring ceph cluster from osds

2023-03-09 Thread Ben
Hi, yes, the old mon daemons have been removed. In the first post, the mon daemons were started with mon data created from scratch. After some searching through the code, I suspect I could restore the cluster from all OSDs without the original mon data. But I may be wrong on this. For now, I think it would involve less configuration if

[ceph-users] Re: Trying to throttle global backfill

2023-03-09 Thread Rice, Christian
I received a few suggestions and resolved my issue. Anthony D'Atri suggested mclock (newer than my Nautilus version), adding "--osd_recovery_max_single_start 1" (didn't seem to take), "osd_op_queue_cut_off=high" (which I didn't get to checking), and pgremapper (from GitHub). Pgremapper did
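For anyone finding this thread later, the usual Nautilus-era throttle knobs look roughly like this; a sketch only, and the values are examples rather than recommendations:

    # Slow down backfill/recovery cluster-wide at runtime
    ceph tell 'osd.*' injectargs \
        '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_sleep_hdd 0.2'

    # Or persist the same settings in the config database
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1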

[ceph-users] Re: restoring ceph cluster from osds

2023-03-09 Thread Eugen Block
Hi, I'm not familiar with Rook, so the steps required may vary. If you try to reuse the old mon stores you'll have the mentioned mismatch between the new daemons and the old monmap (which still contains the old mon daemons). It's not entirely clear what went wrong in the first place and
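If the stale monmap itself needs fixing, the usual out-of-band edit is roughly the following; a sketch only, with mon ID and addresses as placeholders (under Rook the store lives inside the mon container/PV, so the paths will differ):

    # Extract the monmap from a stopped mon, inspect it and drop stale entries
    ceph-mon -i a --extract-monmap /tmp/monmap
    monmaptool --print /tmp/monmap
    monmaptool --rm old-a --rm old-b /tmp/monmap
    monmaptool --add a 192.168.0.10:6789 /tmp/monmap

    # Inject the edited map back before starting the mon again
    ceph-mon -i a --inject-monmap /tmp/monmap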

[ceph-users] Re: restoring ceph cluster from osds

2023-03-09 Thread Ben
Hi Eugen, Thank you for your help on this. Forget the log. A little progress: the monitor store was restored. I created a new ceph cluster to use the restored monitor store, but the monitor log complains: debug 2023-03-09T11:00:31.233+ 7fe95234f880 0 starting mon.a rank -1 at public addrs

[ceph-users] Re: rbd on EC pool with fast and extremely slow writes/reads

2023-03-09 Thread Andrej Filipcic
Thanks for the hint, I ran some short tests, all fine. I am not sure it's a drive issue. After some more digging, the file with bad performance has these segments: [root@afsvos01 vicepa]# hdparm --fibmap $PWD/0 /vicepa/0: filesystem blocksize 4096, begins at LBA 2048; assuming 512 byte sectors.
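One way to take that further is to map a slow extent back to the RADOS object and the OSDs serving it; a sketch only, assuming the filesystem sits directly on a mapped RBD image (pool, image, object prefix and LBA below are placeholders):

    # hdparm --fibmap prints extent LBAs; byte offset on the device ~= LBA * 512
    # Get the image's object prefix and object size (order 22 = 4 MiB objects)
    rbd info rbd_pool/afs_image | grep -E 'block_name_prefix|order'

    # Object number = byte offset / object size, formatted as 16-digit hex
    # e.g. LBA 241172480 -> 241172480*512/4194304 = 29440 = 0x7300
    printf '%016x\n' $(( 241172480 * 512 / 4194304 ))

    # Show which PG and which OSDs hold that object
    ceph osd map rbd_pool rbd_data.1234abcd5678.0000000000007300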

[ceph-users] libceph: mds1 IP+PORT wrong peer at address

2023-03-09 Thread Frank Schilder
Hi all, we seem to have hit a bug in the ceph fs kernel client and I just want to confirm what action to take. We get the error "wrong peer at address" in dmesg and some jobs on that server seem to get stuck in fs access; log extract below. I found these 2 tracker items

[ceph-users] radosgw - octopus - 500 Bad file descriptor on upload

2023-03-09 Thread Boris Behrens
Hi, we've observed 500 errors on uploads to a single bucket, but the problem went away after around 2 hours. We checked the logs and saw the following error message: 2023-03-08T17:55:58.778+ 7f8062f15700 0 WARNING: set_req_state_err err_no=125 resorting to 500 2023-03-08T17:55:58.778+

[ceph-users] Re: LRC k6m3l3, rack outage and availability

2023-03-09 Thread Eugen Block
Hi, I haven't had the chance to play with LRC yet, so I can't really comment on that. But can you share your osd tree as well? I assume you already did, but can you verify that the crush rule works as expected and the chunks are distributed correctly? Regards, Eugen Quoting
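For checking the placement, something along these lines is usually enough; a sketch, with pool name, profile/rule names and PG id as placeholders:

    # Dump the LRC profile and the crush rule the pool uses
    ceph osd pool ls detail
    ceph osd erasure-code-profile get lrc_profile
    ceph osd crush rule dump lrc_rule

    # List the pool's PGs and check that each acting set spans the expected racks
    ceph pg ls-by-pool lrc_pool
    ceph pg map 12.0
    ceph osd tree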

[ceph-users] Re: restoring ceph cluster from osds

2023-03-09 Thread Eugen Block
Hi, there's no attachment to your email; please use something like pastebin to provide OSD logs. Thanks Eugen Quoting Ben: Hi, I ended up with the whole set of OSDs to get back the original ceph cluster. I figured out how to get the cluster running. However, its status is something as