[ceph-users] OSDs crash after deleting unfound object in Nautilus 14.2.22

2021-09-09 Thread Ansgar Jazdzewski
Hi Folks, we had to delete some unfound objects in our cache tier to get our cluster working again! But after about an hour we saw OSDs crash, and we found that this is caused by the fact that we deleted the "hit_set_8.3fc_archive_2021-09-09 08:25:58.520768Z_2021-09-09 08:26:18.907234Z" object. The crash log can be
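For context, a minimal sketch of the usual sequence for dealing with unfound objects; the PG id 8.3fc is taken from the hit_set name above, and the commands are illustrative rather than a recommendation:

    # see which PGs report unfound objects and list them
    ceph health detail
    ceph pg 8.3fc list_unfound
    # last resort: mark the unfound objects lost and delete them;
    # this is the step that also removed the hit_set archive object here
    ceph pg 8.3fc mark_unfound_lost delete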

[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying

2021-09-09 Thread mabi
Thank you Eugen. Indeed the answer went to Spam :( So thanks to David for his workaround, it worked like a charm. Hopefully these patches can make it into the next pacific release. ‐‐‐ Original Message ‐‐‐ On Thursday, September 9th, 2021 at 2:33 PM, Eugen Block wrote: > You must

[ceph-users] Re: Smarter DB disk replacement

2021-09-09 Thread Mark Nelson
I don't think the bigger tier 1 enterprise vendors have really jumped on this, but I've been curious to see whether anyone would create a dense hot-swap M.2 setup (possibly combined with traditional 3.5" HDD bays). The only vendor I've really seen even attempt something like this is icydock:

[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying

2021-09-09 Thread David Orman
No problem, and it looks like they will. Glad it worked out for you! David On Thu, Sep 9, 2021 at 9:31 AM mabi wrote: > > Thank you Eugen. Indeed the answer went to Spam :( > > So thanks to David for his workaround, it worked like a charm. Hopefully > these patches can make it into the next

[ceph-users] Re: Smarter DB disk replacement

2021-09-09 Thread David Orman
Exactly, we minimize the blast radius/data destruction by allocating more DB/WAL devices of smaller size rather than fewer of larger size. We encountered this same issue on an earlier iteration of our hardware design. With rotational drives and NVMes, we are now aiming for a 6:1 ratio based on our
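For reference, a sketch of how such a ratio can be expressed as a cephadm OSD service spec; the service_id, host_pattern, and device counts are examples, not the poster's actual spec:

    service_type: osd
    service_id: hdd_with_nvme_db
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        rotational: 1      # HDDs carry the data
      db_devices:
        rotational: 0      # NVMes carry DB/WAL
    # with e.g. 12 HDDs and 2 NVMe DB devices per host this yields a 6:1 ratio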

[ceph-users] Re: Smarter DB disk replacement

2021-09-09 Thread Konstantin Shalygin
Ceph guarantees data consistency only when the data is written by Ceph. When an NVMe dies we replace it and backfill; backfilling an OSD host over about two weeks is normal for us. k Sent from my iPhone > On 9 Sep 2021, at 17:10, Michal Strnad wrote: > > 2. When DB disk is not completely dead and has only relocated

[ceph-users] Re: usable size for replicated pool with custom rule in pacific dashboard

2021-09-09 Thread Francois Legrand
You are probably right! But this "verification" seems "stupid"! I created an additional room (with no OSDs) and then the dashboard doesn't complain anymore! Indeed, the rule does what we want, because "step choose firstn 0 type room" will select the different rooms (2 in our case) and for
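For anyone wanting to reproduce the workaround described above, a sketch of adding an empty room bucket; the bucket name is an example:

    # create a third, empty room so the dashboard's room-vs-replica check passes
    ceph osd crush add-bucket room3 room
    ceph osd crush move room3 root=default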

[ceph-users] Re: Smarter DB disk replacement

2021-09-09 Thread Janne Johansson
On Thu, 9 Sep 2021 at 16:09, Michal Strnad wrote: > When the disk with the DB dies > it will cause inaccessibility of all dependent OSDs (six or eight in our > environment), > How do you do it in your environment? We have two SSDs for 8 OSDs, so only half go away when one SSD dies. -- May the most

[ceph-users] Smarter DB disk replacement

2021-09-09 Thread Michal Strnad
Hi all, We are discussing different approaches to replacing the disk holding the DB (typically an SSD or NVMe disk) for BlueStore. When the disk with the DB dies, it causes inaccessibility of all dependent OSDs (six or eight in our environment), so we are looking for a way to minimize data loss or time
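As a starting point, a sketch of how to see which OSDs depend on a given DB device; the OSD id is an example:

    # on the OSD host: shows data/DB/WAL device mapping per OSD
    ceph-volume lvm list
    # or from any node, for a single OSD
    ceph osd metadata 12 | grep -E 'devices|bluefs'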

[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying

2021-09-09 Thread Eugen Block
You must have missed the response to your thread, I suppose: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YA5KLI5MFJRKVQBKUBG7PJG4RFYLBZFA/ Quoting mabi: Hello, A few days later the ceph status progress bar is still stuck and the third mon is for some unknown reason

[ceph-users] Re: usable size for replicated pool with custom rule in pacific dashboard

2021-09-09 Thread Ernesto Puerta
Hi Francois, I'm not an expert on CRUSH rule internals, but I checked the code and it assumes that the failure domain (the first choose/chooseleaf step) is "room": since there are just 2 rooms vs. 3 replicas, it doesn't allow you to create a pool with a rule that might not work optimally (keep

[ceph-users] Re: Exporting CephFS using Samba preferred method

2021-09-09 Thread Jeff Layton
Actually, no -- vfs_ceph doesn't really perform better. Because Samba forks on new connections, each incoming connection gets its own Ceph client. If multiple SMB clients are accessing the same files, they tend to compete with one another for caps, and that causes performance to
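For readers comparing the two approaches, roughly what the smb.conf share definitions look like; share names and paths are examples:

    # vfs_ceph: every forked smbd opens its own libcephfs client
    [cephfs-vfs]
        path = /
        vfs objects = ceph
        ceph:config_file = /etc/ceph/ceph.conf
        ceph:user_id = samba

    # re-export of a kernel CephFS mount: all smbd processes share one
    # kernel client and therefore one set of caps
    [cephfs-kernel]
        path = /mnt/cephfs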

[ceph-users] rbd freezes/timeout

2021-09-09 Thread Leon Ruumpol
Hello, We have a Ceph cluster with CephFS and RBD images enabled, and from Xen-NG we connect directly to the RBD images. Several times a day the VMs suffer from high load/iowait, which makes them temporarily inaccessible (around 10~30 seconds). In the logs on Xen-NG I find this: [Thu Sep 9 02:16:06

[ceph-users] Re: ceph fs re-export with or without NFS async option

2021-09-09 Thread Jeff Layton
On Wed, 2021-09-08 at 16:39 +, Frank Schilder wrote: > Hi all, > > I have a question about a CephFS re-export via nfsd. For NFS v4 mounts the > export option sync is now the default instead of async. I have just found > that using async gives more than a factor of 10 performance >
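For reference, the export options being discussed, as they would appear in /etc/exports; the path and client network are examples:

    # sync (the current default): the server replies only after data is committed
    /mnt/cephfs  192.168.0.0/24(rw,sync,no_subtree_check)
    # async: faster, but the server may acknowledge writes it has not yet committed
    #/mnt/cephfs 192.168.0.0/24(rw,async,no_subtree_check)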

[ceph-users] usable size for replicated pool with custom rule in pacific dashboard

2021-09-09 Thread Francois Legrand
Hi all, I have a test Ceph cluster with 4 OSD servers, each containing 3 OSDs. The CRUSH map uses 2 rooms with 2 servers in each room. We use replica 3 for pools. I have the following custom CRUSH rule to ensure that I have at least one copy of the data in each room. rule
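The rule itself is cut off above; as an assumption, a typical two-room rule of this kind looks roughly like the following (rule name and id are placeholders; the room step is the one quoted later in the thread):

    rule replicated_two_rooms {
        id 1
        type replicated
        step take default
        step choose firstn 0 type room
        step chooseleaf firstn 2 type host
        step emit
    }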

[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying

2021-09-09 Thread mabi
Hello, A few days later the ceph status progress bar is still stuck and the third mon is for some unknown reason still not deploying itself as can be seen from the "ceph orch ls" output below: ceph orch ls NAME PORTS RUNNING REFRESHED AGE PLACEMENT alertmanager

[ceph-users] Re: [Ceph Upgrade] - Rollback Support during Upgrade failure

2021-09-09 Thread Lokendra Rathour
Hi Matthew, *Thanks for the update.* *For the Part:* [my Query] > *Other Query:* > What if the complete cluster goes down, i.e. the mon crashes and other daemons > crash? Can we try to restore the data in the OSDs, maybe by reusing the > OSDs in another or a new Ceph cluster, or something else to save the data.
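On the data-recovery part of the question: if all monitors are lost but the OSDs survive, the documented approach is to rebuild the mon store from the OSDs rather than to move the OSDs into a different cluster. A rough sketch, with paths and OSD id as examples:

    # run against each surviving OSD (with the OSD stopped), accumulating
    # into one mon store, then rebuild the monitor from it
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
        --op update-mon-db --mon-store-path /tmp/mon-store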