[ceph-users] Corruption on cluster

2021-09-21 Thread David Schulz
Hi Everyone, For a couple of weeks I've been battling corruption in CephFS. It happens when a writer on one node writes a line and calls sync, as is typical with logging, and that same file is read from another client while it is being written, at which point it appears corrupted. The cluster is a Nautilus
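
A minimal reproduction sketch of the pattern described above, assuming a CephFS mount at /mnt/cephfs on both clients; the path, the line format and the use of coreutils "sync FILE" (available in newer coreutils) are illustrative assumptions, not taken from the thread:

    # writer, client A: append a line and flush it, as a logger calling fsync() would
    for i in $(seq 1 10000); do
        echo "line $i" >> /mnt/cephfs/test.log
        sync /mnt/cephfs/test.log
    done

    # reader, client B: flag any line that is garbled or out of sequence
    tail -f /mnt/cephfs/test.log | awk '$2 != prev + 1 { print "unexpected:", $0 } { prev = $2 }'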

[ceph-users] Re: *****SPAM***** Re: Corruption on cluster

2021-09-21 Thread David Schulz
's content here. >> >>> -----Original Message----- >>> From: Patrick Donnelly >>> Sent: Tuesday, 21 September 2021 19:30 >>> To: David Schulz >>> Cc: ceph-users@ceph.io >>> Subject: *SPAM* [ceph-users] Re: Corruption on cluster

[ceph-users] Re: *****SPAM***** Re: Corruption on cluster

2021-09-24 Thread David Schulz
to this page. Maybe others also not, so it is better >> to paste it's content here. >> >>> -----Original Message----- >>> From: Patrick Donnelly >>> Sent: Tuesday, 21 September 2021 19:30 >>> To: David Schulz >>> Cc: ceph-users

[ceph-users] Unbalanced Cluster

2022-05-04 Thread David Schulz
Hi Everyone, I'm looking for a bit of guidance on a 9-server system with 16 OSDs per server = 144 OSDs. This cluster has 143 OSDs in it, but ceph osd df shows that they are very unbalanced in their utilization. Some are around 50% full and yet others are pushing 85% full. The balancer was on and
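
A few read-only commands that expose this kind of imbalance (general Ceph CLI usage, not quoted from the thread):

    ceph osd df tree        # per-OSD %USE, variance and PG count, grouped by host
    ceph balancer status    # whether the balancer is active, and in which mode
    ceph osd pool ls detail # pg_num, replication/EC profile and flags for each pool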

[ceph-users] Re: Unbalanced Cluster

2022-05-04 Thread David Schulz
Could you also provide the output of "ceph osd df tree"? > Josh > On Wed, May 4, 2022 at 1:21 PM David Schulz wrote: >> Hi Everyone, >> I'm looking for a bit of guidance on a 9-server system with 16 OSDs per server = 144 OSDs.

[ceph-users] Re: Unbalanced Cluster

2022-05-04 Thread David Schulz
Hi Josh, We do have an old pool that is empty, so there are 4611 empty PGs, but the rest seem fairly close:
# ceph pg ls|awk '{print $7/1024/1024/10}'|cut -d "." -f 1|sed -e 's/$/0/'|sort -n|uniq -c
   4611 00
      1 1170
      8 1180
     10 1190
     28 1200
     51 1210
     54 1220
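
The pipeline above buckets each PG's size into 10 MB bins (rounded down) and counts how many PGs fall in each bin. A condensed equivalent with the same behaviour, assuming $7 is the BYTES column of ceph pg ls on this cluster; the column position varies between Ceph releases, so check the header first:

    # histogram of PG sizes, rounded down to the nearest 10 MB
    ceph pg ls \
      | awk 'NR > 1 { printf "%d0\n", $7 / 1024 / 1024 / 10 }' \
      | sort -n | uniq -c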

[ceph-users] Re: Unbalanced Cluster

2022-05-05 Thread David Schulz
Hi Richard, Thanks for that. It never occurred to me that we'd need at least 10 servers for that shape of EC. We will certainly push to get that new server in now. -Dave On 2022-05-04 5:07 p.m., Richard Bade wrote: > Hi David, > I think that part of the problem with unbal
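
To see which erasure-code shape drives that host count, check the profile behind the data pool: with k+m chunks and crush-failure-domain=host, each PG needs k+m distinct hosts, so k+m+1 hosts are wanted to leave somewhere to recover to after a host failure. The pool and profile names below are placeholders, not taken from the thread:

    ceph osd pool get <data-pool> erasure_code_profile
    ceph osd erasure-code-profile get <profile>   # shows k=, m=, crush-failure-domain=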

[ceph-users] Re: Unbalanced Cluster

2022-05-05 Thread David Schulz
after everything settles, if it's still too unbalanced I'd go for the upmap balancer. Needless to say, all these would cause major data migration so it should be planned carefully. Best, On Thu, May 5, 2022 at 12:02 AM David Schulz <dsch...@ucalgary.ca> wrote: Hi Josh, We d
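
Switching to the upmap balancer usually looks like the sketch below (standard Ceph commands, not quoted from the thread; the min-compat-client step is needed because upmap entries require Luminous-or-newer clients):

    ceph osd set-require-min-compat-client luminous
    ceph balancer mode upmap
    ceph balancer on
    ceph balancer status   # confirm the mode and whether a plan is executing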

[ceph-users] Re: Unbalanced Cluster

2022-05-06 Thread David Schulz
WHA? Mind blown. I hadn't noticed that you can reduce PG counts now! Thanks, Richard, for pointing that out. I've already reduced the pgs in that unused pool to half of what it was, but I think the other backfill operations have blocked that; for the moment I think the system is ok at lea
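
Since Nautilus, pg_num can be lowered as well as raised; the cluster merges PGs gradually toward the new target, and pending backfill can delay the merges. A sketch with a placeholder pool name and target:

    ceph osd pool get <pool> pg_num
    ceph osd pool set <pool> pg_num 1024   # hypothetical target; merges proceed in the background
    ceph status                            # watch merge/backfill progress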

[ceph-users] Re: Unbalanced Cluster

2022-05-06 Thread David Schulz
veral modifications in weights/reweights but I'm not sure if they're manual or balancer adjusted. >> I would first delete that empty pool to have a clearer picture of PGs on OSDs. Then I would increase the pg_num for pool 6 to 2048. And after ev
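
Deleting the empty pool and raising pg_num on the data pool would look roughly like this; the pool names are placeholders, and pool deletion is guarded by the mon_allow_pool_delete option, which should be switched back off afterwards:

    ceph config set mon mon_allow_pool_delete true
    ceph osd pool delete <old-empty-pool> <old-empty-pool> --yes-i-really-really-mean-it
    ceph config set mon mon_allow_pool_delete false
    ceph osd pool set <data-pool> pg_num 2048   # pool 6 in the thread; PGs split gradually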