Thanks,
Stefan
On 1/1/21, 11:23 PM, "Anthony D'Atri" wrote:
I have to ask if this might be the balancer or the PG autoscaler at work.
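A quick way to check both, assuming the standard balancer and pg_autoscaler mgr modules (Nautilus or later):

    # Is the balancer module active, and in which mode?
    ceph balancer status
    # What does the PG autoscaler currently recommend per pool?
    ceph osd pool autoscale-status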
> On Jan 1, 2021, at 7:15 PM, Stefan Wild wrote:
>
> Our setup is not using SSDs as the BlueStore DB devices. We only have 2 SSDs
> …
> and stability wise.
> Or you may be able to work around the excessive swap usage (when
> bluefs_buffered_io is set to true) by lowering vm.swappiness or
> disabling swap.
>
> Regards,
>
> Frédéric.
>
> On 14/12/2020 at 2
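For reference, Frédéric's swap suggestion translates to something like this (a sketch; the swappiness value is an illustrative assumption):

    # Lower the kernel's inclination to swap (default is typically 60)
    sysctl -w vm.swappiness=10
    # Persist the setting across reboots
    echo 'vm.swappiness = 10' > /etc/sysctl.d/99-ceph.conf
    # Or disable swap on the OSD host entirely
    swapoff -a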
Hi Frédéric,
Thanks for the additional input. We are currently only running RGW on the
cluster, so no snapshot removal, but there have been plenty of remappings with
the OSDs failing (all of them at first during and after the OOM incident, then
one by one). I haven't had a chance to look into it in detail yet, but I will look up the referenced thread(s) and try the offline DB compaction. It would be amazing if that does the trick.
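For anyone else trying this, the offline compaction is roughly (a sketch, assuming osd.1 and a package-based deployment; paths differ under cephadm, and the OSD must be stopped first):

    systemctl stop ceph-osd@1
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-1 compact
    systemctl start ceph-osd@1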
Will keep you posted here.
Thanks,
Stefan
From: Igor Fedotov
Sent: Monday, December 14, 2020 6:39:28 AM
To: Stefan Wild ; ceph-users@ceph.io
…history from 2 weeks ago, but ballooning and running out of memory are no longer the issue.
Thanks,
Stefan
From: Kalle Happonen
Sent: Monday, December 14, 2020 5:00:17 AM
To: huxia...@horebdata.cn
Cc: Stefan Wild ; ceph-users
Subject: Re: [ceph-users] Re: OSD reboot loop after running out of memory
Igor
On 12/13/2020 5:44 AM, Stefan Wild wrote:
> Just had another look at the logs, and this is what I noticed after the affected OSD starts up.
>
> Loads of entries of this sort:
>
> Dec 12 21:38:40 ceph-tpa-server1 bash[780507]: debug 2020-12-13T02:38:40
Got a trace of the osd process, shortly after ceph status -w announced boot for
the osd:
strace: Process 784735 attached
futex(0x5587c3e22fc8, FUTEX_WAIT_PRIVATE, 0, NULL) = ?
+++ exited with 1 +++
It was stuck at that one call for several minutes before exiting.
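(The trace above was presumably captured along these lines; the exact invocation isn't in the thread:)

    # Attach to the ceph-osd process by PID, as seen in the output above
    strace -p 784735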
From: Stefan Wild
Date: …
…ceph-tpa-server1 systemd[1]: Started Ceph osd.1 for 08fa929a-8e23-11ea-a1a2-ac1f6bf83142.
Hope that helps…
Thanks,
Stefan
From: Stefan Wild
Date: Saturday, December 12, 2020 at 9:35 PM
To: "ceph-users@ceph.io"
Subject: OSD reboot loop after running out of memory
Hi,
We recently upgraded a cluster from 15.2.1 to 15.2.5. About two days later, one of the servers ran out of memory for unknown reasons (normally the machine uses about 60 out of 128 GB). Since then, some OSDs on that machine get caught in an endless restart loop. Logs will just mention system
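Two checks that may help in this situation (not from the original message; assuming systemd hosts and Octopus defaults):

    # Did the kernel OOM killer take out any processes?
    journalctl -k | grep -i 'out of memory'
    # How much memory is each OSD allowed to target? (default 4 GiB)
    ceph config get osd osd_memory_target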
On 6/12/20, 5:40 AM, "James, GleSYS" wrote:
> When I set the debug_rgw logs to "20/1", the issue disappears immediately,
> and the throughput for the index pool goes back down to normal levels.
I can – somewhat happily – confirm that setting debug_rgw to "20/1" makes the issue disappear.
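For reference, the level can be changed at runtime via the admin socket (a sketch; the daemon name is deployment-specific, and "20/1" means log level 20 with in-memory level 1):

    # Run on the RGW host; replace <name> with the actual instance name
    ceph daemon client.rgw.<name> config set debug_rgw 20/1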
Hi everyone,
We are currently transitioning from a temporary machine to our production
hardware. Since we're starting with under 200 TB raw storage, we are currently
on only 1–2 physical machines per cluster, eventually in 3 zones. The temporary
machine is undersized for even that with an