[ceph-users] Re: pg repair doesn't start

2022-10-13 Thread Frank Schilder
Hi Eugen, thanks for your answer. I gave a search another try and did indeed find something: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/TN6WJVCHTVJ4YIA4JH2D2WYYZFZRMSXI/ Quote: " ... And I've also observed that the repair req isn't queued up -- if the OSDs are busy with
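
A minimal sketch of re-checking and re-issuing such a repair, assuming a hypothetical PG id 12.7f; whether temporarily raising osd_max_scrubs is appropriate depends on the cluster:

  # check how many scrub slots each OSD currently allows
  ceph config get osd osd_max_scrubs
  # temporarily allow one more scrub per OSD so the repair can be scheduled
  ceph config set osd osd_max_scrubs 2
  # a repair is a special deep scrub, so it competes for the same slots
  ceph pg repair 12.7f
  # revert once the repair has run
  ceph config set osd osd_max_scrubs 1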

[ceph-users] Re: pg repair doesn't start

2022-10-13 Thread Eugen Block
Hi, I’m not sure if I remember correctly, but I believe the backfill is preventing the repair from happening. I think it has been discussed a couple of times on this list, but I don't know right now if you can tweak anything to prioritize the repair; I believe there is something, but I'm not sure. It looks
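
One knob that may be relevant here, sketched under the assumption that active recovery/backfill is what blocks the scrub-based repair (osd_scrub_during_recovery defaults to false; the PG id is a placeholder):

  # allow scrubs, and therefore repairs, to run while recovery is in progress
  ceph config set osd osd_scrub_during_recovery true
  ceph pg repair 12.7f
  # set it back afterwards to keep recovery traffic from competing with scrubs
  ceph config set osd osd_scrub_during_recovery false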

[ceph-users] pg repair doesn't start

2022-10-13 Thread Frank Schilder
Hi all, we have an inconsistent PG for a couple of days now (octopus latest):

# ceph status
  cluster:
    id:
    health: HEALTH_ERR
            1 scrub errors
            Possible data damage: 1 pg inconsistent
  services:
    mon: 5 daemons, quorum
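
A short sketch of the usual first steps for an inconsistent PG; the PG id 12.7f is a placeholder:

  ceph health detail                               # shows which PG is inconsistent
  rados list-inconsistent-obj 12.7f --format=json-pretty
  ceph pg repair 12.7f                             # schedules a repairing deep scrub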

[ceph-users] monitoring drives

2022-10-13 Thread Marc
I was wondering what the best practice is for monitoring drives. I am transitioning from SATA to SAS drives, which expose less smartctl information, not even power-on hours. E.g. does Ceph record somewhere when an OSD has been created?
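
Ceph does keep some per-device and per-OSD information that can be used for drive monitoring; a sketch, with the device and OSD ids as placeholders:

  ceph device ls                            # devices known to the cluster and the daemons using them
  ceph device get-health-metrics <devid>    # health/SMART data collected by the OSD, where the drive exposes it
  ceph osd metadata 0                       # per-OSD metadata: host, device model, rotational flag, etc.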

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-13 Thread Murilo Morais
Unfortunately I can't verify whether Ceph reports any inactive PGs. As soon as the second host disconnects, practically everything is locked; nothing appears even when using "ceph -w". The OSDs only show up as offline once dcs2 returns. Note: apparently there was a new update recently. When I was in

[ceph-users] Re: crush hierarchy backwards and upmaps ...

2022-10-13 Thread Christopher Durham
Dan, again I am using 16.2.10 on Rocky 8. I decided to take a step back and check a variety of options before I do anything. Here are my results. If I use this rule:

rule mypoolname {
    id -5
    type erasure
    step take myroot
    step choose indep 4 type rack
    step choose indep 2 type
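
When comparing rule variants like this, crushtool can show the resulting mappings offline before anything is applied; a sketch, with the rule id and replica count as placeholders:

  ceph osd getcrushmap -o crushmap.bin
  crushtool -i crushmap.bin --test --rule 1 --num-rep 8 --show-mappings | head
  # --show-bad-mappings prints inputs that could not be mapped to enough OSDs
  crushtool -i crushmap.bin --test --rule 1 --num-rep 8 --show-bad-mappings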

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-13 Thread Marc
If you do not mind data loss, why do you care about needing to have 2x? An alternative would be to change the replication so it is not across hosts but just across OSDs, which can reside on one host. > Marc, but there is no mechanism to prevent IO pause? At the moment I > don't worry about data loss. > I
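
A sketch of what "replicate across OSDs instead of hosts" could look like (rule and pool names are placeholders; this removes host-level redundancy, so it only makes sense if losing a host may lose data):

  # replicated rule whose failure domain is 'osd' rather than 'host'
  ceph osd crush rule create-replicated rep_over_osd default osd
  ceph osd pool set mypool crush_rule rep_over_osd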

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-13 Thread Eugen Block
Could you share more details? Does ceph report inactive PGs when one node is down? Please share:

  ceph osd tree
  ceph osd pool ls detail
  ceph osd crush rule dump
  ceph pg ls-by-pool
  ceph -s

Quoting Murilo Morais: Thanks for answering. Marc, but there is no mechanism to prevent IO pause? At

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-13 Thread Murilo Morais
Thanks for answering. Marc, but there is no mechanism to prevent IO pause? At the moment I don't worry about data loss. I understand that putting it as replica x1 can work, but I need it to be x2. On Thu, Oct 13, 2022 at 12:26, Marc wrote: > > > > > I'm having strange behavior on a
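
For reference, the I/O pause on losing one of two hosts usually comes from min_size; a sketch of checking it, with the pool name as a placeholder (running with min_size 1 risks data loss and is generally discouraged):

  ceph osd pool get mypool size
  ceph osd pool get mypool min_size
  # while the host is down, list PGs that are not active+clean
  ceph pg ls | grep -v 'active+clean'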

[ceph-users] Re: why rgw generates large quantities orphan objects?

2022-10-13 Thread Haas, Josh
Hi Liang, My guess would be this bug: https://tracker.ceph.com/issues/44660 https://www.spinics.net/lists/ceph-users/msg30151.html It has actually existed for at least 6 years: https://tracker.ceph.com/issues/16767 It occurs any time you reupload the same *part* in a single Multipart Upload
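
For locating the orphaned RADOS objects themselves, recent releases ship an experimental rgw-orphan-list helper; a hedged sketch, assuming the default data pool name:

  # writes a file listing objects in the data pool that RGW no longer references
  rgw-orphan-list default.rgw.buckets.data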

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-13 Thread Marc
> > I'm having strange behavior on a new cluster. Not strange, by design > I have 3 machines, two of them have the disks. We can name them like > this: > dcs1 to dcs3. The dcs1 and dcs2 machines contain the disks. > > I started bootstrapping through dcs1, added the other hosts and left mgr >

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-13 Thread Murilo Morais
I'm using Host as Failure Domain. On Thu, Oct 13, 2022 at 11:41, Eugen Block wrote: > What is your failure domain? If it's osd you'd have both PGs on the > same host and then no replica is available. > > Quoting Murilo Morais: > > > Eugen, thanks for responding. > > > > In the

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-13 Thread Eugen Block
What is your failure domain? If it's osd you'd have both PGs on the same host and then no replica is available. Quoting Murilo Morais: Eugen, thanks for responding. In the current scenario there is no way to insert disks into dcs3. My pools are size 2, at the moment we can't add more

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-13 Thread Murilo Morais
Eugen, thanks for responding. In the current scenario there is no way to insert disks into dcs3. My pools are size 2; at the moment we can't add more machines with disks, so it was sized in this proportion. Even with min_size=1, if dcs2 stops, the IO also stops. On Thu, Oct 13, 2022 at

[ceph-users] Re: Cluster crashing when stopping some host

2022-10-13 Thread Eugen Block
Hi, if your pools have size 2 (don't do that except in test environments) and host is your failure domain, then all IO is paused if one OSD host goes down, depending on your min_size. Can you move some disks to dcs3 so you can have size 3 pools with min_size 2? Quoting Murilo Morais:
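
If disks can eventually be spread over three hosts, the suggested change is a sketch like this, with the pool name as a placeholder:

  ceph osd pool set mypool size 3
  ceph osd pool set mypool min_size 2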

[ceph-users] Cluster crashing when stopping some host

2022-10-13 Thread Murilo Morais
Good morning everyone. I'm having strange behavior on a new cluster. I have 3 machines, two of them have the disks. We can name them like this: dcs1 to dcs3. The dcs1 and dcs2 machines contain the disks. I started bootstrapping through dcs1, added the other hosts and left mgr on dcs3 only.

[ceph-users] Re: MDS Performance and PG/PGP value

2022-10-13 Thread Frank Schilder
Hi Yoann, I'm not using Pacific yet, but this looks very strange to me:

  cephfs_data   data   243T   19.7T
  usage: 245 TiB used, 89 TiB / 334 TiB avail

I'm not sure if there is a mix of raw vs. stored here. Assuming the cephfs_data allocation is right, I'm wondering what your
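
To untangle raw vs. stored numbers, ceph df reports them separately; a sketch:

  ceph df detail
  # RAW STORAGE shows the cluster-wide used/avail figures,
  # while the per-pool STORED vs USED columns show data before and after replication/EC overhead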

[ceph-users] Re: MDS Performance and PG/PGP value

2022-10-13 Thread Stefan Kooman
On 10/13/22 13:47, Yoann Moulin wrote: Also, you mentioned you're using 7 active MDS. How's that working out for you? Do you use pinning? I don't really know how to do that. I have 55 worker nodes in my K8s cluster; each one can run pods that have access to a CephFS PVC. We have 28 cephfs
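
Pinning is done with an extended attribute on directories of a mounted CephFS; a minimal sketch, assuming a mount at /mnt/cephfs and MDS ranks 0-6:

  # pin one subtree to MDS rank 2 so its metadata load stays on that daemon
  setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/some/dir
  # -v -1 removes the pin again
  setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/some/dir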

[ceph-users] Re: MDS Performance and PG/PGP value

2022-10-13 Thread Yoann Moulin
Hello Patrick, Unfortunately, increasing the number of PGs did not help a lot in the end; my cluster is still in trouble... Here is the current state of my cluster: https://pastebin.com/Avw5ybgd Is 256 a good value in our case? We have 80 TB of data with more than 300M files. You want at least
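
For reference, a sketch of checking and raising the PG count; pool name and target value are placeholders, and the autoscaler can also be asked for a recommendation:

  ceph osd pool get cephfs_data pg_num
  ceph osd pool autoscale-status            # shows the autoscaler's suggested PG_NUM per pool
  ceph osd pool set cephfs_data pg_num 512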

[ceph-users] Re: rgw multisite octopus - bucket can not be resharded after cancelling prior reshard process

2022-10-13 Thread Boris
Hi Christian, resharding is not an issue, because we only sync the metadata, like AWS S3. But this looks very broken to me; does anyone have an idea how to fix it? > On 13.10.2022 at 11:58, Christian Rohmann wrote: > > Hey Boris, >> On 07/10/2022 11:30, Boris Behrens wrote: >> I just

[ceph-users] Re: rgw multisite octopus - bucket can not be resharded after cancelling prior reshard process

2022-10-13 Thread Christian Rohmann
Hey Boris, On 07/10/2022 11:30, Boris Behrens wrote: I just wanted to reshard a bucket but mistyped the number of shards. Reflexively I hit Ctrl-C and waited. It looked like the resharding did not finish, so I canceled it, and now the bucket is in this state. How can I fix it? It does not show
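
The reshard bookkeeping can be inspected and cleared with radosgw-admin; a sketch, with the bucket name as a placeholder:

  radosgw-admin reshard list
  radosgw-admin reshard status --bucket=mybucket
  # clears a stale in-progress reshard entry so a new reshard can be started
  radosgw-admin reshard cancel --bucket=mybucket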

[ceph-users] Re: Understanding the total space in CephFS

2022-10-13 Thread Nicola Mori
Hi Stefan, the cluster is built of several old machines, with different numbers of disks (from 8 to 16) and disk sizes (from 500 GB to 4 TB). After the PG increase it is still recovering: pgp_num is at 213 and has to grow up to 256. The balancer status gives: { "active": true,

[ceph-users] CephFS constant high write I/O to the metadata pool

2022-10-13 Thread Olli Rajala
Hi, I'm seeing constant 25-50 MB/s writes to the metadata pool even when all clients and the cluster are idle and in a clean state. This surely can't be normal? There are no apparent issues with the performance of the cluster, but this write rate seems excessive and I don't know where to look for the
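
A sketch of where one could start looking for the source of the metadata writes; the MDS name is a placeholder and the commands need access to the MDS host's admin socket:

  ceph fs status
  # per-counter view of MDS activity, including journal and objecter traffic
  ceph daemon mds.<name> perf dump
  ceph daemon mds.<name> dump_ops_in_flight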

[ceph-users] Re: Understanding the total space in CephFS

2022-10-13 Thread Stefan Kooman
On 10/13/22 09:32, Nicola Mori wrote: Dear Ceph users, I'd need some help in understanding the total space in a CephFS. My cluster is currently built of 8 machines, the one with the smallest capacity has 8 TB of total disk space, and the total available raw space is 153 TB. I set up a 3x

[ceph-users] Re: How to remove remaining bucket index shard objects

2022-10-13 Thread 伊藤 祐司
Hi, Unfortunately, the "large omap objects" message recurred last weekend. So I ran the script you showed to check the situation. `used_.*` is small, but `omap_.*` is large, which is strange. Do you have any idea what it is? id    used_mbytes  used_objects  omap_used_mbytes    omap_used_keys
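
To see which objects actually carry the large omaps, they can be inspected directly with rados; a sketch with placeholder pool and object names:

  # list objects in the suspect pool, then count omap keys per object
  rados -p <pool> ls | head
  rados -p <pool> listomapkeys <object> | wc -l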

[ceph-users] Understanding the total space in CephFS

2022-10-13 Thread Nicola Mori
Dear Ceph users, I'd need some help in understanding the total space in a CephFS. My cluster is currently built of 8 machines, the one with the smallest capacity has 8 TB of total disk space, and the total available raw space is 153 TB. I set up a 3x replicated metadata pool and a 6+2 erasure
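
As a rough sketch of the arithmetic (not exact, since the reported MAX AVAIL also accounts for the fullest OSD and the full ratios):

  raw space available        ~153 TB
  EC 6+2 efficiency          6 / (6+2) = 0.75
  theoretical usable space   ~153 TB * 0.75 ≈ 115 TB, minus headroom and imbalance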

[ceph-users] Re: Iinfinite backfill loop + number of pgp groups stuck at wrong value

2022-10-13 Thread Nicola Mori
Thank you Frank for the insight. I'd need to study the details of all of this a bit more, but now I certainly understand it a bit better. Nicola

[ceph-users] rbd: Snapshot Only Permissions

2022-10-13 Thread Dan Poltawski
Hi All, Is there any way to configure capabilities for a user to allow the client to *only* create/delete snapshots? I can't find anything which suggests this is possible on https://docs.ceph.com/en/latest/rados/operations/user-management/. Context: I'm writing a script to automatically create
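
For context, this is the general shape of adjusting a client's caps; the profiles shown are the standard rbd ones and do not restrict the client to snapshot operations only (client and pool names are placeholders):

  ceph auth get client.snapper
  ceph auth caps client.snapper mon 'profile rbd' osd 'profile rbd pool=rbd'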