Hi Frank,
CC Patrick.
On Tue, Nov 29, 2022 at 8:58 PM Frank Schilder wrote:
>
> Hi Venky,
>
> thanks for taking the time. I'm afraid I still don't get the difference.
> Maybe the ceph dev terminology means something else than what I use. Let's
> look at this statement, I think it summarises
Great, thanks, that seems to be what I needed. The osds are running
again and the cluster is beginning its long road to recovery. It looks
like I'm left with a few unfound objects and 3 osds that won't start due
to crashes while reading the osdmap, but I'll see if I can work through
that.
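If the unfound objects turn out to be unrecoverable, one last-resort option (destructive, so only after all recovery attempts are exhausted) is to mark them lost per PG. A sketch, where the PG id 2.5 is just an example:

```shell
# Find which PGs still have unfound objects
ceph health detail | grep unfound
# Inspect one of them (pg id is an example)
ceph pg 2.5 query
# Give up on the unfound objects: revert to a prior version,
# or "delete" if no prior version exists
ceph pg 2.5 mark_unfound_lost revert
```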
On
Hi All,
I have a test Ceph cluster with some pools that have neither replication
nor EC redundancy. When I run "ceph orch upgrade start --ceph_version
17.2.5", the log reports "Upgrade: unsafe to stop osd(s) at this time (74
PGs are or would become offline)" and the upgrade just waits. Is there any
way to skip this and upgrade
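For a situation like this, you can ask the cluster directly why stopping an OSD is unsafe, and on a test cluster raising the pool size unblocks the upgrade. A sketch (osd id and pool name are examples):

```shell
# Shows which PGs would go offline if this OSD stopped
ceph osd ok-to-stop osd.0
# On a throwaway test cluster, give the size=1 pools a replica
# so OSDs can be stopped one at a time (pool name is an example)
ceph osd pool set mypool size 2
```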
It's also possible you're running into large pglog entries - any
chance you're running RGW and there's an s3:CopyObject workload
hitting an object that was uploaded with MPU?
https://tracker.ceph.com/issues/56707
If that's the case, you can inject a much smaller value for
osd_min_pg_log_entries
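For reference, one way to inject a smaller value at runtime and persist it (the value 500 is only an example; pick something appropriate for your cluster):

```shell
# Apply to all running OSDs immediately
ceph tell 'osd.*' config set osd_min_pg_log_entries 500
# Persist it so the value survives OSD restarts
ceph config set osd osd_min_pg_log_entries 500
```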
Hi, it sounds like you might be affected by the pg_log dup bug:
# Check whether any OSDs are affected by the pg_log dup problem
sudo -i ceph tell 'osd.*' perf dump | grep -e pglog -e 'osd\.'
If any OSD reports osd_pglog_items >> 1M, check
https://www.clyso.com/blog/osds-with-unlimited-ram-growth/
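A small post-filter for the perf-dump output above can save eyeballing: print only the pglog item counters whose value exceeds one million. This assumes the grep output was saved to a file named perf_dump.txt (the filename is an assumption):

```shell
# Input lines look like the JSON fragments grep leaves behind, e.g.:
#     "osd_pglog_items": 2500000,
# Split on ':' and ',', strip quotes/spaces from the value field,
# and print only counters above the one-million threshold.
awk -F'[:,]' '/osd_pglog_items/ { gsub(/[ "]/, "", $2); if ($2 + 0 > 1000000) print }' perf_dump.txt
```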
Best regards,
On Tue, Nov 29, 2022 at 1:18 PM Joshua Timmer
wrote:
> I've got a cluster in a precarious state because several nodes have run
> out of memory due to extremely large pg logs on the osds. I came across
> the pglog_hardlimit flag which sounds like the solution to the issue,
> but I'm concerned
I've got a cluster in a precarious state because several nodes have run
out of memory due to extremely large pg logs on the osds. I came across
the pglog_hardlimit flag which sounds like the solution to the issue,
but I'm concerned that enabling it will immediately truncate the pg logs
and
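For context, the flag is set cluster-wide with a single command; worth double-checking the docs for your release before flipping it on a cluster already under memory pressure:

```shell
# Requires every OSD to be running Luminous 12.2.11+ / Mimic 13.2.5+ or newer
ceph osd set pglog_hardlimit
```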
Hi,
I've been testing the cephadm upgrade process in my staging environment
and I'm running into an issue where the docker container just doesn't
boot up anymore. This is an Octopus to Pacific 16.2.10 upgrade and I
expect to upgrade to quincy afterwards. This is also running on Ubuntu
Thanks! Appreciate everyone who responded :)
After reading up on stretch mode, it appears some of the exact things it
was created to prevent happened, so this would be the solution!
Cheers,
D.
-----Original Message-----
From: Frank Schilder [mailto:fr...@dtu.dk]
Sent: Tuesday, November 29,
Hi Patrick.
> "Both random and distributed ephemeral pin policies are off by default
> in Octopus. The features may be enabled via the
> mds_export_ephemeral_random and mds_export_ephemeral_distributed
> configuration options."
Thanks for that hint! This is a baddie. I never read that far,
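For reference, my understanding is that the features are enabled like this (the CephFS mount path is an example):

```shell
# Turn the ephemeral pin policies on (off by default in Octopus)
ceph config set mds mds_export_ephemeral_distributed true
ceph config set mds mds_export_ephemeral_random true
# Then mark a directory, e.g. distributed pinning of its children
setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/home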
Hi Frank,
Sorry for the delay and thanks for sharing the data privately.
On Wed, Nov 23, 2022 at 4:00 AM Frank Schilder wrote:
>
> Hi Patrick and everybody,
>
> I wrote a small script that pins the immediate children of 3 sub-dirs on our
> file system in a round-robin way to our 8 active
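A minimal sketch of such a round-robin pinning loop (the mount path, subdirectory name, and the count of 8 ranks are assumptions taken from the description):

```shell
# Pin each immediate child of a sub-dir to MDS ranks 0..7 in turn
ranks=8
i=0
for d in /mnt/cephfs/subdir1/*/ ; do
  setfattr -n ceph.dir.pin -v $((i % ranks)) "$d"
  i=$((i + 1))
done
```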
Thanks for the suggestions. It took me a little bit to get to try it out, but I
was able to get the cluster upgraded from Octopus to the latest Pacific.
Setting the migration_current value didn't seem to un-wedge anything, but
manually setting the registry_credentials key did.
It appears my
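In case it helps others hitting the same wedge: setting the key looks roughly like the following, though the exact JSON shape is an assumption on my side and the registry values are placeholders:

```shell
# cephadm stores the registry login as a single config-key JSON blob
ceph config-key set mgr/cephadm/registry_credentials \
  '{"url": "registry.example.com", "username": "user", "password": "pass"}'
```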
Hi Venky,
thanks for taking the time. I'm afraid I still don't get the difference. Maybe
the ceph dev terminology means something else than what I use. Let's look at
this statement, I think it summarises my misery quite well:
> It's an implementation difference. In octopus, each child dir
Hi Frank,
On Tue, Nov 29, 2022 at 5:38 PM Frank Schilder wrote:
>
> Hi Venky,
>
> maybe you can help me clarify the situation a bit. I don't understand the
> difference between the two pinning implementations you describe in your reply
> and I also don't see any difference in meaning in the
Hello,
thank you very much for the advice.
Now I have two public networks.
I've tried to set the cluster to use both public addresses, but I was
not successful.
# ceph config global public_network 192.168.1.0/24,192.168.2.0/24
# ceph config mon public_network 192.168.1.0/24,192.168.2.0/24
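One likely problem with the commands above: "ceph config" needs the "set" subcommand, so as written they would not apply anything. The intended invocations would be:

```shell
ceph config set global public_network 192.168.1.0/24,192.168.2.0/24
ceph config set mon public_network 192.168.1.0/24,192.168.2.0/24
```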
-
Hi Venky,
maybe you can help me clarify the situation a bit. I don't understand the
difference between the two pinning implementations you describe in your reply,
and I also don't see any difference in meaning in the documentation between
octopus and quincy; the difference is just in wording.
On Tue, Nov 29, 2022 at 1:42 PM Frank Schilder wrote:
>
> Hi Venky.
>
> > You most likely ran into performance issues with distributed ephemeral
> > pins with octopus. It'd be nice to try out one of the latest releases
> > for this.
>
> I ran into the problem that distributed ephemeral pinning
Hi Dale,
> we thought we had set it up to prevent.. and with size = 4 and min_size set =
> 1
I'm afraid this is exactly what you didn't do. First, min_size=1 is always a bad
idea. Second, if you have 2 data centres, the only way to get this to work is
to use stretch mode. Even if you had
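For reference, enabling stretch mode looks roughly like this; the "datacenter" CRUSH buckets and a matching stretch_rule must already exist, and the tiebreaker monitor name here is an assumption:

```shell
# Stretch mode requires the connectivity election strategy
ceph mon set election_strategy connectivity
# Arbiter monitor in a third location breaks ties between the two DCs
ceph mon enable_stretch_mode tiebreaker_mon stretch_rule datacenter
```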
Hi Venky.
> You most likely ran into performance issues with distributed ephemeral
> pins with octopus. It'd be nice to try out one of the latest releases
> for this.
I ran into the problem that distributed ephemeral pinning seems to not actually
be implemented in octopus. This mode didn't pin