[ceph-users] Re: Separate metadata pool in 3x MDS node

2024-02-24 Thread David C.
Hello, does each rack work on different trees, or is everything parallelized? Would the meta pools be distributed over racks 1, 2, 4, 5? If they are distributed, then even if the addressed MDS is on the same switch as the client, that MDS will still consult/write the (NVMe) OSDs on the other ra

[ceph-users] Re: Separate metadata pool in 3x MDS node

2024-02-24 Thread Anthony D'Atri
> I'm designing a new Ceph storage cluster from scratch and I want to increase CephFS speed and decrease latency. > Usually I always build (WAL+DB on NVMe with SAS/SATA SSDs) Just go with pure-NVMe servers. NVMe SSDs shouldn't cost much, if anything, more than the few remaining SATA or especially

[ceph-users] Separate metadata pool in 3x MDS node

2024-02-24 Thread Özkan Göksu
Hello folks! I'm designing a new Ceph storage cluster from scratch and I want to increase CephFS speed and decrease latency. Usually I always build (WAL+DB on NVMe with SAS/SATA SSDs) and I deploy the MDS and MONs on the same servers. This time a weird idea came to my mind and I think it has great potential
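For reference, placing a CephFS metadata pool on NVMe-class OSDs is usually done with a CRUSH rule restricted to the nvme device class. A minimal sketch, assuming the OSDs report the device class "nvme" and using hypothetical pool and filesystem names:

    # CRUSH rule that places replicas across hosts, restricted to the "nvme" device class
    ceph osd crush rule create-replicated nvme-only default host nvme
    # metadata pool pinned to that rule; data pool on the default rule
    ceph osd pool create cephfs_metadata 64 64 replicated nvme-only
    ceph osd pool create cephfs_data 256 256
    ceph fs new newfs cephfs_metadata cephfs_data

The PG counts and names above are placeholders; adjust them to the cluster size and autoscaler settings.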

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Once recovery is underway, simply restarting the RGWs should be enough to reset them and get your object store back up. Bloomberg doesn't use CephFS, so hopefully David's suggestions work, or someone else in the community can chip in for that part. Sent from Bloomberg Professional for iPh

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread David C.
If rebalancing tasks have been launched, it's not a big deal, but I don't think that's the priority. The priority is to get the MDS back on its feet. I haven't seen an answer to this question: can you stop/unmount the CephFS clients or not? There are other solutions, but as you are not comfortable I a

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Thank you so much, Matthew. Pls keep an eye on my thread. You and Mr Anthony made my day.

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Thank you so much, Sir. You make my day T.T

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
> Low space hindering backfill (add storage if this doesn't resolve itself): 21 pgs backfill_toofull ^^^ Ceph even told you what you need to do ;) If you have recovery taking place and the numbers of misplaced objects and *full PGs/pools keep decreasing, then yes, wait. As for ge

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Run sudo watch ceph -s. You should see stats on the recovery and see PGs transition from the backfill* states to active+clean. Once you get everything active+clean, we can focus on your RGWs and MDSs. Sent from Bloomberg Professional for iPhone - Original Message - From: nguyenvand

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Thank you, Matthew. I'm following the guidance from Mr Anthony and now my recovery speed is much faster. I will update my case day by day. Thank you so much.

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Hi Mr Anthony, forget it, the OSD is UP and the recovery speed is 10x faster. Amazing. And now we just wait, right?

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Yes, Sir. We added a 10TiB disk to the cephosd02 node. Now the disk is IN, but in the DOWN state. What should we do now :( Additionally, the recovery speed is 10x faster :)

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Anthony is correct, this is what I was getting at as well when seeing your ceph -s output. More details in the Ceph docs here if you want to understand the details of why you need to balance your nodes. https://docs.ceph.com/en/quincy/rados/operations/monitoring-osd-pg/ But you need to get you

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
Your recovery is stuck because there are no OSDs that have enough space to accept data. Your second OSD host appears to only have 9 OSDs currently, so you should be able to add a 10TB OSD there without removing anything. That will enable data to move to all three of your 10TB OSDs. > On Feb 24
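If the cluster is managed by cephadm (the ceph -s output elsewhere in this thread mentions cephadm daemons), adding the spare drive as an OSD is roughly the following sketch; the host and device names are hypothetical:

    # check which devices cephadm sees as available on the host
    ceph orch device ls | grep cephosd02
    # create an OSD on a specific free device
    ceph orch daemon add osd cephosd02:/dev/sdX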

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
HolySh*** First, we changed mon_max_pg_per_osd to 1000. About adding a disk to cephosd02, for more detail, what is "TO", sir? I'll talk it over with my boss. To be honest, I'm worried that the volume recovery progress will run into problems...

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
You aren’t going to be able to finish recovery without having somewhere to recover TO. > On Feb 24, 2024, at 10:33 AM, nguyenvand...@baoviet.com.vn wrote: > > Thank you, Sir. But i think i ll wait for PG BACKFILLFULL finish, my boss is > very angry now and will not allow me to add one more disk

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
You also might want to increase mon_max_pg_per_osd since you have a wide spread of OSD sizes. Default is 250. Set it to 1000. > On Feb 24, 2024, at 10:30 AM, Anthony D'Atri wrote: > > Add a 10tb HDD to the third node as I suggested, that will help your cluster. > > >> On Feb 24, 2024, at 10
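A sketch of raising that limit through the centralized config database (assuming a Nautilus-or-later cluster, which the cephadm deployment implies):

    ceph config set global mon_max_pg_per_osd 1000
    # confirm the value the monitors will use
    ceph config get mon mon_max_pg_per_osd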

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Thank you, Sir. But I think I'll wait for the PG BACKFILLFULL state to clear; my boss is very angry now and will not allow me to add one more disk (this action makes him think that Ceph would take more time for recovering and rebalancing). We want to wait for the volume recovery progress to finish

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
Add a 10tb HDD to the third node as I suggested, that will help your cluster. > On Feb 24, 2024, at 10:29 AM, nguyenvand...@baoviet.com.vn wrote: > > I will correct some small things: > > we have 6 nodes, 3 osd node and 3 gaeway node ( which run RGW, mds and nfs > service) > you r corrct, 2/3

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
And sure, we have one more 10TiB disk, which cephosd02 will get.

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
I will correct some small things: we have 6 nodes, 3 OSD nodes and 3 gateway nodes (which run the RGW, MDS and NFS services). You are correct, 2 of the 3 OSD nodes have one new 10TiB disk each. About your suggestion to add another OSD host: we will. But we need to end this nightmare; my NFS folder, which has 10TiB of data, i

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
# ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
Read the four sections here: https://docs.ceph.com/en/quincy/rados/operations/health-checks/#osd-out-of-order-full > On Feb 24, 2024, at 10:12 AM, nguyenvand...@baoviet.com.vn wrote: > > Hi Mr Anthony, Cou
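For completeness, the ratios shown above are raised with the set-*-ratio commands rather than with injectargs on the OSDs. A cautious sketch, with example values only, to be lowered back once recovery completes:

    ceph osd set-nearfull-ratio 0.90
    ceph osd set-backfillfull-ratio 0.92
    # the hard full ratio is already 0.95 here; raise it only as a last resort
    ceph osd set-full-ratio 0.96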

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
There ya go. You have 4 hosts, one of which appears to be down and to have a single OSD that is so small as to not be useful. Whatever cephgw03 is, it looks like a mistake. OSDs much smaller than, say, 1TB often aren't very useful. Your pools appear to be replicated, size=3. So each of your cep
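The layout being described here can be verified with standard commands; a quick sketch:

    ceph osd df tree          # per-OSD size, utilization and the CRUSH hierarchy
    ceph osd pool ls detail   # replication size and crush_rule of each pool
    ceph osd crush rule dump  # failure domain each rule uses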

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Hi Mr Anthony, could you tell me more details about raising the full and backfillfull thresholds? Is it: ceph tell 'osd.*' injectargs --osd-max-backfills=2 --osd-recovery-max-active=6 ??

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Hi Mr Anthony, pls check the output https://anotepad.com/notes/s7nykdmc

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Hi Matthew, 1) We had 2 MDS services running before this nightmare. Now we are trying to apply MDS on 3 nodes, but all of them stop within 2 minutes. 2) You are correct. We just added two 10TiB disks to the cluster (which currently has 27 x 4TiB disks), all of them with weight 1.0. About volume recov
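On point 2, the "weight 1.0" is worth double-checking: the REWEIGHT column of ceph osd tree is normally 1.0, while the CRUSH weight of a 10TiB OSD should be roughly 9-10. A sketch for checking and, only if needed, correcting it; the OSD id and value are hypothetical:

    ceph osd tree    # compare the CRUSH WEIGHT and REWEIGHT columns
    ceph osd df      # utilization per OSD
    # only if the CRUSH weight really is wrong for a ~10TiB device:
    ceph osd crush reweight osd.27 9.09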

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Hi David, I'll follow your suggestion. Do you have Telegram? If yes, could you please add me on Telegram: +84989177619. Thank you so much.

[ceph-users] Re: Scrubs Randomly Starting/Stopping

2024-02-24 Thread Ashley Merrick
So I have done some further digging. It seems similar to this: Bug #54172: ceph version 16.2.7 PG scrubs not progressing - RADOS - Ceph. Apart from: 1) I have restarted all OSDs / forced a re-peer and the issue is still there; 2) setting noscrub stops the scrubs "appearing". Checking a PG, it seems it's jus
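For anyone following along, the scrub state can be inspected without restarting anything; a sketch, with a hypothetical PG id:

    ceph osd set noscrub && ceph osd set nodeep-scrub    # pause scheduling while investigating
    ceph pg dump pgs | less -S                           # per-PG state and last-scrub stamps
    ceph pg 2.1f query | grep -i scrub                   # scrubber details for one PG
    ceph osd unset noscrub && ceph osd unset nodeep-scrub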

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
> 2) It looks like you might have an interesting CRUSH map. Allegedly you have 41TiB of space but you can't finish recovering because you have lots of PGs stuck as their destination is too full. Are you running homogeneous hardware or do you have different drive sizes? Are all the weights set c

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
It looks like you have quite a few problems; I'll try to address them one by one. 1) It looks like you had a bunch of crashes; from the ceph -s it looks like you don't have enough MDS daemons running for a quorum, so you'll need to restart the crashed containers. 2) It looks like you might have

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread David C.
Is it possible for you to stop/unmount the CephFS clients? If so, do that and restart the MDS. It should come back up. Have the clients restart one by one and check that the MDS does not crash (by monitoring the logs). Regards, *David C
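A sketch of restarting the MDS daemons and watching them come back, assuming a cephadm deployment and hypothetical daemon names:

    ceph orch ps | grep mds                             # find the MDS daemon names and hosts
    ceph orch daemon restart mds.cephfs.cephgw01.xxxxxx
    ceph fs status                                      # watch the rank move through replay/reconnect/rejoin to active
    cephadm logs --name mds.cephfs.cephgw01.xxxxxx      # on the MDS host, to check for crashes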

[ceph-users] Re: Size return by df

2024-02-24 Thread Albert Shih
On 22/02/2024 at 18:07:51+0300, Konstantin Shalygin wrote: Hi, thanks. > Yes, you can; this is controlled by the option > client quota df = false But I'm unable to make it work. Is this the correct syntax? [global] fsid = *** mon_host = [v2:10
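The option referenced above is client_quota_df (spaces and underscores are interchangeable in ceph.conf). A sketch of where it needs to live, assuming a ceph-fuse/libcephfs client, since it is a client-side option:

    # in the ceph.conf read by the client host, not the cluster nodes
    [client]
        client quota df = false

    # or, centrally, for clients that read the monitors' config database:
    ceph config set client client_quota_df false

The client has to be remounted or restarted to pick it up, and the kernel CephFS client handles df/quota reporting differently and may not honor this option.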

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Hi Matthew, please check my ceph -s:

  cluster:
    id:     258af72a-cff3-11eb-a261-d4f5ef25154c
    health: HEALTH_WARN
            3 failed cephadm daemon(s)
            1 filesystem is degraded
            insufficient standby MDS daemons available
            1 nearfull osd(s)