cephadm approved. rados/cephadm failures are mostly caused by
https://github.com/ceph/ceph/pull/49285 not being merged (which just
touches tests and docs so wouldn't block a release).
Thanks
- Adam King
On Thu, Dec 15, 2022 at 12:15 PM Yuri Weinstein wrote:
> Details of this rele
reaker?
> --
> *From:* Adam King
> *Sent:* Friday, December 2, 2022 2:48:19 PM
> *To:* Sake Paulusma
> *Cc:* ceph-users@ceph.io
> *Subject:* Re: [ceph-users] How to replace or add a monitor in stretch
> cluster?
>
> This can't be done in a
This can't be done in a very nice way currently. There's actually an open
PR against main to allow setting the crush location for mons in the service
spec specifically because others found that this was annoying as well. What
I think should work as a workaround is, go to the host where the mon
I typically don't see this when I do OSD replacement. If you do a mgr
failover ("ceph mgr fail") and wait a few minutes does this still show up?
The stray daemon/host warning is roughly equivalent to comparing the
daemons in `ceph node ls` and `ceph orch ps` and seeing if there's anything
in the
o
do before.
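As a rough sketch of that comparison (output omitted; these are the commands
the warning is effectively based on):

ceph mgr fail       # restart the active mgr so cephadm refreshes its cached daemon info
ceph node ls        # daemons the cluster itself knows about, grouped by host
ceph orch ps        # daemons cephadm is aware of/managing

Anything in the first listing that doesn't show up in "ceph orch ps" is what
gets reported as CEPHADM_STRAY_DAEMON.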
On Sat, Nov 19, 2022 at 8:05 AM Adam King wrote:
> I don't know for sure if it will fix the issue, but the migrations happen
> based on a config option "mgr/cephadm/migration_current". You could try
> setting that back to 0 and it would at least trigger the migrat
I don't know for sure if it will fix the issue, but the migrations happen
based on a config option "mgr/cephadm/migration_current". You could try
setting that back to 0 and it would at least trigger the migrations to
happen again after restarting/failing over the mgr. They're meant to be
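As a sketch of that workaround (the option name is the one mentioned above;
I'm not certain it fixes the underlying issue):

ceph config set mgr mgr/cephadm/migration_current 0
ceph mgr fail    # fail over the mgr so the migrations are re-run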
We had actually considered adding an `extra_daemon_args` to be the
equivalent of `extra_container_args` but for the daemon itself rather than
a flag for the podman/docker run command. IIRC we thought it was a good
idea but nobody actually pushed to add it in then since (at the time) we
weren't
If you're using a fairly recent cephadm version, there is the ability to
provide miscellaneous container arguments in the service spec
https://docs.ceph.com/en/quincy/cephadm/services/#extra-container-arguments.
This means you can have cephadm deploy each container in that service with,
for
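For example, a service spec along the lines of the one on that doc page (the
service type and the extra argument here are just placeholders to show the
format):

service_type: mon
service_name: mon
placement:
  host_pattern: '*'
extra_container_args:
  - "--cpus=2"

Applying it with "ceph orch apply -i <spec-file>" should have cephadm add that
flag to the podman/docker run command for each daemon in the service.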
Do the journal logs for the OSDs say anything about why they couldn't start
up? ("cephadm ls --no-detail" run on the host will give the systemd units
for each daemon on the host so you can get them easier).
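Something along these lines (the fsid and daemon name are placeholders; the
unit naming follows the ceph-<fsid>@<daemon>.service pattern):

cephadm ls --no-detail    # run on the host; lists each daemon and its systemd unit
journalctl -xeu ceph-<fsid>@osd.<id>.service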
On Mon, Oct 17, 2022 at 1:37 PM Brent Kennedy wrote:
> Below is what the ceph mgr log is
For the weird image, perhaps just "ceph orch daemon redeploy
rgw.testrgw.svtcephrgwv1.invwmo --image quay.io/ceph/ceph:v16.2.10" will
resolve it. Not sure about the other things wrong with it yet but I think
the image should be fixed before looking into that.
On Fri, Oct 14, 2022 at 5:47 AM
Budget Discussion
- Going to investigate current resources being used, see if any costs
can be cut
- What can be moved from virtual environments to internal ones?
- Need to take inventory of what resources we currently have and what
their costs are
17.2.4
- Gibba and LRC
orch suite failures fall under
https://tracker.ceph.com/issues/49287
https://tracker.ceph.com/issues/57290
https://tracker.ceph.com/issues/57268
https://tracker.ceph.com/issues/52321
For rados/cephadm the failures are both
https://tracker.ceph.com/issues/57290
Overall, nothing new/unexpected.
eph2.huidoh (mgr.344392) 211206 :
>> cephadm [DBG] 0 OSDs are scheduled for removal: []
>> 2022-09-02T18:38:21.762480+ mgr.ceph2.huidoh (mgr.344392) 211207 :
>> cephadm [DBG] Saving [] to store
>>
>> On Fri, Sep 2, 2022 at 12:17 PM Adam King wrote:
>>
>&
etheus/prometheus v2.18.1 de242295e225 2 years ago
> 140MB
> quay.io/prometheus/alertmanager v0.20.0 0881eb8f169f 2 years ago
> 52.1MB
> quay.io/prometheus/node-exporter v0.18.1 e5a616e4b9cf 3 years ago
> 22.9MB
>
>
> On Fri, Sep 2, 2022 at 11:06 AM Adam
2b
>>>>> grafana.ceph1
>>>>> ceph1 running (9h) 64s ago 2w 6.7.4
>>>>> quay.io/ceph/ceph-grafana:6.7.4
>>>>> 557c83e11646 7583a8dc4c61
>>>>> mgr.ceph1.smfvfd
>>>>>cep
8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>
> But still getting the same error, do i need to do anything else?
>
> On Fri, Sep 2, 2022 at 9:51 AM Adam King wrote:
>
>> Okay, I'm wondering if this is an issue with version misma
rk
>
> root@ceph1:~# ceph orch rm cephadm
> Failed to remove service. was not found.
>
> root@ceph1:~# ceph orch rm
> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
> Failed to remove service.
>
> was not found.
>
> On Fri, Sep 2, 202
this looks like an old traceback you would get if you ended up with a
service type that shouldn't be there somehow. The things I'd probably check
are that "cephadm ls" on either host definitely doesn't report and strange
things that aren't actually daemons in your cluster such as
"cephadm.".
0.73.0.191 ceph1.example.com ceph1
> 10.73.0.192 ceph2.example.com ceph1
>
> On Thu, Sep 1, 2022 at 8:06 PM Adam King wrote:
>
>> the naming for daemons is a bit different for each daemon type, but for
>> mgr daemons it's always "mgr.<hostname>.<random-string>". The daemons
>> cephad
//achchusnulchikam.medium.com/deploy-ceph-cluster-with-cephadm-on-centos-8-257b300e7b42
>
> On Thu, Sep 1, 2022 at 6:20 PM Satish Patel wrote:
>
>> Hi Adam,
>>
>> Getting the following error, not sure why it's not able to find it.
>>
>> root@ceph1:~# ceph orc
> ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@mgr.ceph1.xmbvsb.service" and
> "journalctl -xe" for details.
> Traceback (most recent call last):
> File "/usr/sbin/cephadm", line 6250, in
> r = args.func()
> File "/usr/sbin/cephadm", line 1357, i
mage_name": "quay.io/ceph/ceph:v15",
> "container_image_id": null,
> "version": null,
> "started": null,
> "created": "2022-08-19T03:36:22.815608Z",
> "deployed": &qu
Does "ceph orch upgrade status" give any insights (e.g. an error message of
some kind)? If not, maybe you could try looking at
https://tracker.ceph.com/issues/56485#note-2 because it seems like a
similar issue and I see you're using --ceph-version (which we need to fix,
sorry about that).
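If it does turn out to be the same issue, the workaround roughly amounts to
restarting the upgrade with an explicit image instead of --ceph-version
(sketch only; the tag is whatever version you're targeting):

ceph orch upgrade status
ceph orch upgrade stop
ceph orch upgrade start --image quay.io/ceph/ceph:<tag>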
On Wed,
Are there any extra directories in /var/lib/ceph or /var/lib/ceph/<fsid>
that appear to be for those OSDs on that host? When cephadm builds the info
it uses for "ceph orch ps" it's actually scraping those directories. The
output of "cephadm ls" on the host with the duplicates could also
potentially have
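For reference, a quick way to eyeball that on the host (the fsid is a
placeholder for your cluster's fsid):

ls /var/lib/ceph/<fsid>/    # the per-daemon dirs cephadm scrapes for "ceph orch ps"
cephadm ls                  # what cephadm itself reports for this host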
You were correct about the difference between the distros. Was able to
reproduce fine on ubuntu 20.04 (was using centos 8.stream before). I
opened a tracker as well https://tracker.ceph.com/issues/57293
On Thu, Aug 25, 2022 at 7:44 AM Robert Sander
wrote:
> Am 25.08.22 um 13:41 schrieb A
FWIW, cephadm only writes that file out if it doesn't exist entirely. You
might be able to just remove anything actually functional from it and leave
a sort of dummy file with only a comment there as a workaround. Also,
was this an upgraded cluster? I tried quickly bootstrapping a
cephadm
running:
> 3)
>
> I'm running ceph -W cephadm with log_to_cluster_level set to debug, but
> except for the walls of text with the inventories, nothing (except
> _kick_service_loop) shows up in the logs after the INF level messages that
> host has been added or service specification has been
If you try shuffling some daemon around on some of the working hosts (e.g.
changing the placement of the node-exporter spec so that one of the working
hosts is excluded so the node-exporter there should be removed) is
cephadm able to actually complete that? Also, does device info for any or
all of
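As an example of the kind of shuffle I mean (hostnames here are made up), you
could apply a node-exporter spec that lists every working host except one:

service_type: node-exporter
service_name: node-exporter
placement:
  hosts:
    - host1
    - host2   # intentionally leave out one working host

ceph orch apply -i node-exporter.yaml

If cephadm is healthy it should remove the node-exporter from the excluded
host within a few minutes.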
eyring and ceph.conf files.
>
> Should I open a bug somewhere?
>
> On Tue, 2022-08-02 at 08:39 -0400, Adam King wrote:
> > It's possible there's a bug in cephadm around placements where hosts
> have the _no_schedule label.
> > There was https://tracker.ceph.com/issues/56972
It's possible there's a bug in cephadm around placements where hosts have
the _no_schedule label. There was
https://tracker.ceph.com/issues/56972 recently
for an issue with how _no_schedule interacts with placements using explicit
hostnames. It might be something similar here where it thinks
Cephadm has a config option to say whether to use the repo digest or the
tag name. If you want it to use tags "ceph config set mgr
mgr/cephadm/use_repo_digest false" should make that happen (it defaults to
true/using the digest). Beyond that, it's possible you may need to change
the config option
the dashboard the daemon
> shows as errored but it's running (confirmed via podman and systemctl).
> My take is that something is not communicating some information with
> "cephadm" but I don't know
> what. ceph itself knows the mgr is running since it clearly says it's on
>
t;
> because of this I can't run a "ceph orch upgrade" because it always
> complains about having only one.
> Is there something else that needs to be changed to get the cluster to a
> normal state?
>
> Thanks!
>
> On Wed, 2022-07-27 at 12:23 -0400, Adam King wrote:
mon with the new image? at least
> this is something I did in our testing here[1].
>
> ceph orch daemon redeploy mgr.
> quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531
>
> [1] https://github.com/ceph/ceph/pull/47270#issuecomment-1196062363
>
> On Wed, Jul 27
the unit.image file is just there for cephadm to look at as part of
gathering metadata I think. What you'd want to edit is the unit.run file
(in the same directory as the unit.image). It should have a really long
line specifying a podman/docker run command and somewhere in there will be
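For reference, the layout cephadm uses on the host looks roughly like this
(fsid and daemon name are placeholders):

/var/lib/ceph/<fsid>/<daemon-name>/unit.run    # the long podman/docker run line to edit
/var/lib/ceph/<fsid>/<daemon-name>/unit.image  # only read back for metadata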
Usually it's pretty explicit in "ceph health detail". What does it say
there?
On Mon, Jul 25, 2022 at 9:05 PM Jeremy Hansen
wrote:
> How do I track down what is the stray daemon?
>
> Thanks
> -jeremy
> ___
> ceph-users mailing list --
Do the journal logs for any of the OSDs that are marked down give any
useful info on why they're failing to start back up? If the host level ip
issues have gone away I think that would be the next place to check.
On Mon, Jul 25, 2022 at 5:03 PM Jeremy Hansen
wrote:
> I noticed this on the
orch approved. The test_cephadm_repos test failure is just a problem with
the test I believe, not any actual ceph code. The other selinux denial I
don't think is new.
Thanks,
- Adam King
On Sun, Jul 24, 2022 at 11:33 AM Yuri Weinstein wrote:
> Still seeking approvals for:
>
> rados
re running the maintenance enter command is necessary.
Regards,
- Adam King
On Wed, Jul 13, 2022 at 11:02 AM Steven Goodliff <
steven.goodl...@globalrelay.net> wrote:
>
> Hi,
>
>
> I'm trying to reboot a ceph cluster one instance at a time by running in a
> Ansible playbook which
This sounds similar to something I saw once with an upgrade from 17.2.0 to
17.2.1 (that I failed to reproduce). In that case, what fixed it was
stopping the upgrade, manually redeploying both mgr daemons with the new
version ("ceph orch daemon redeploy --image
", wait a few minutes for the
I tried upgrading a small VM cluster from 16.2.9 with some cephadm-exporter
daemons deployed to 17.2.1 and mine were just automatically removed after a
while. Is there still a service listed for them in "ceph orch ls"?
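If the service is still there, removing it by hand should be something like:

ceph orch ls cephadm-exporter    # check whether the service spec still exists
ceph orch rm cephadm-exporter    # remove it (this removes its daemons too)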
Thanks,
- Adam King
On Tue, Jun 28, 2022 at 2:01 PM Vla
,
- Adam King
On Thu, Jun 23, 2022 at 11:20 AM Robert Reihs
wrote:
> Hi all,
> I am currently trying to setup a test cluster with cephadm on a system with
> ipv6 setup.
> In the ceph.conf I have:
> ms_bind_ipv4 = false
> ms_bind_ipv6 = true
> I also have a non
and maybe let us see why it's failing, or, if there is no longer an issue
connecting to the host, should mark the host online again.
Thanks,
- Adam King
On Thu, Jun 23, 2022 at 12:30 PM Thomas Roth wrote:
> Hi all,
>
> found this bug https://tracker.ceph.com/issues/51629 (O
fig issue itself was tracked
in https://tracker.ceph.com/issues/54571 and should be resolved as of
16.2.9 and 17.2.1, so hopefully removing these legacy daemon dirs won't be
necessary in the future.
Thanks,
- Adam King
On Thu, Jun 23, 2022 at 6:42 AM Kuhring, Mathias <
mathias.kuhr...@bih-chari
orch approved. Talked with Blaine from the rook team about those rook test
failures. It looks like those tests are outdated and need to be fixed up
eventually but 17.2.1 itself seems to generally work with rook from some
testing with https://github.com/rook/rook/pull/10449.
Thanks,
- Adam
On
ey
operate. A bit more on what I'm saying here if you're interested
https://docs.ceph.com/en/quincy/cephadm/services/#algorithm-description.
Thanks,
- Adam King
On Tue, Jun 7, 2022 at 9:34 AM Patrick Vranckx
wrote:
> Hi,
>
> When you change the configuration of your cluster whi
do this.
>
> Would it be advisable to put some maintenance flags like noout,
> nobackfill, norebalance?
> And maybe stop the ceph target on the host I'm re-adding to pause all
> daemons?
>
> Best, Mathias
> On 5/19/2022 8:14 PM, Adam King wrote:
>
> cephadm just takes the h
cephadm's control while there are still cephadm deployed daemons on it like
that but this is a special case. Anyway, removing and re-adding the host is
the only (reasonable) way to change what it has stored for the hostname
that I can remember.
Let me know if that doesn't work,
- Adam King
in the
majority of cases users want to only pick up the disks available at apply
time and not every matching disk forever. But if you have set the service
to unmanaged and it's still picking up the disks that's a whole different
issue entirely.
Thanks,
- Adam King
On Tue, Apr 26, 2022 at 8:16 AM Luis
Did the 16.2.7 cluster have a non-root ssh user set and a host with an
_admin label? If so, could you try removing the _admin label from the host
and retrying the upgrade? It sounds like
https://tracker.ceph.com/issues/54620.
Thanks,
- Adam King
On Fri, Apr 22, 2022 at 7:25 AM Luis Domingues
Wanted to add that, from the ceph versions output, it looks like there is
only 1 mgr daemon. The cephadm upgrade requires there to be at least 2 so
you will need to add another mgr daemon first.
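Roughly, that would be:

ceph orch apply mgr 2              # ask cephadm to place a second mgr
ceph orch ps --daemon-type mgr     # confirm both mgr daemons are running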
On Thu, Apr 21, 2022 at 10:43 AM Ml Ml wrote:
> Hello,
>
> i am running a 7 Node Cluster with 56
If the cluster is managed by cephadm you should be able to just do a "ceph
orch upgrade start --image quay.io/ceph/ceph:v16.2.7". We test upgrades
from 15.2.0 to pacific and quincy so I think going from 15.2.5 to 16.2.7
directly should work.
branch seems to be
working okay and we should be ready to make a new final build based on that.
Thanks,
- Adam King
On Mon, Apr 18, 2022 at 9:36 AM Ilya Dryomov wrote:
> On Fri, Apr 15, 2022 at 3:10 AM David Galloway
> wrote:
> >
> > For transparency and posterity's sake.
you check the contents of etc/ceph? I'm
wondering if there is a bug with how cephadm is setting up the
cephfs-mirror daemon where it's not providing the correct name for the
keyring within the container.
Thanks,
- Adam King
On Tue, Mar 29, 2022 at 8:41 AM Robert Sander
wrote:
> Hi,
>
> we star
s after having all 6 of the
relevant config options set properly. I'll also note that I have been using
podman. Not sure if there is some major logging difference between podman
and docker.
Thanks,
- Adam King
On Thu, Mar 24, 2022 at 1:00 PM Tony Liu wrote:
> Any comments on t
Hi Tony,
Afaik those container flags just set the defaults and the config options
override them. Setting the necessary flags (
https://docs.ceph.com/en/latest/cephadm/operations/#logging-to-files)
seemed to work for me.
[ceph: root@vm-00 /]# ceph config get osd.0 log_to_file
false
[ceph:
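For completeness, the options that doc page covers can be set globally like
this (a sketch; they can also be set per daemon):

ceph config set global log_to_file true
ceph config set global mon_cluster_log_to_file true
ceph config set global log_to_stderr false
ceph config set global mon_cluster_log_to_stderr false
ceph config set global log_to_journald false
ceph config set global mon_cluster_log_to_journald false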
I don't know for sure, but it's possibly a result of the centos 8 EOL stuff
from a few weeks ago (they removed some repos and a lot of our build stuff
broke). I think we had to update some of our container images to deal with
that.
- Adam King
On Fri, Feb 25, 2022 at 10:55 AM Robert Sander
aemons yet, maybe setting
the global container image to an image that exists "ceph config set global
container_image " then adding the host I think should allow you
to place daemons on the host as normal. Again, once things are healthy, you
can use upgrade to make sure every daemon is on the
on each of these hosts to satisfy both rgw services specified, but if
they both try to use the same port, whichever one gets placed second could
go into an error state for that reason.
- Adam King
On Fri, Feb 18, 2022 at 1:38 PM Ron Gage wrote:
> All:
>
> I think I found the proble
Is there anything useful in the rgw daemon's logs? (e.g. journalctl -xeu
ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk)
- Adam King
On Wed, Feb 16, 2022 at 3:58 PM Ron Gage wrote:
> Hi everyone!
>
>
>
> Looks like I am having some problems with some of my c
There was actually a change made to allow upgrading osds in a more parallel
fashion nearly a year ago (https://github.com/ceph/ceph/pull/39726) that
made its way into pacific but not octopus which could explain the
discrepancy here. I guess we need a flag to have the upgrade not do this
for users
Hi Arun,
Not too sure about the port thing (I'll look into that when I have a
chance) but it does look like a bug with bootstrapping with the
'--no-minimize-config' flag. I opened a tracker issue for it
https://tracker.ceph.com/issues/54141.
Thanks for helping find this bug,
- Adam King
On Fri
). The recommended way
for handling isolated environments is to push containers to a local
registry on one of the hosts then set the local image as the image to use
during bootstrap (or afterwards with upgrade). See
https://docs.ceph.com/en/latest/cephadm/install/#deployment-in-an-isolated-environment
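Roughly (the local registry address here is made up; adjust for your
environment and for whatever auth/TLS settings your registry needs):

podman pull quay.io/ceph/ceph:v17
podman tag quay.io/ceph/ceph:v17 registry.local:5000/ceph/ceph:v17
podman push registry.local:5000/ceph/ceph:v17
cephadm bootstrap --mon-ip <ip> --image registry.local:5000/ceph/ceph:v17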
- Adam King
o it might be difficult for me to recreate but I'll see what I
can do. In the meantime, hopefully using the upgrade for a workaround is at
least okay for you.
- Adam King
On Thu, Feb 3, 2022 at 2:32 PM Arun Vinod wrote:
> Hi Adam,
>
> Thanks for the update. In that case this looks like a bug l
mgr module) so I'd need to know what's causing it in order to come up
with a good fix.
- Adam King
On Thu, Feb 3, 2022 at 11:29 AM Arun Vinod wrote:
> Hi Adam,
>
> Thanks for reviewing the long output.
>
> Like you said, it makes total sense now since the first mon and mgr are
>
the
health warning may have got it in line with the others. Not really sure on
that front.
- Adam King
On Thu, Feb 3, 2022 at 2:24 AM Arun Vinod wrote:
> Hi Adam,
>
> Big Thanks for the responses and clarifying the global usage of the
> --image parameter. Eventhough, I gave --image dur
Glad it's working. Honestly, no idea how that happened, never seen it
before. Let me know if you ever find out what command caused it.
- Adam King
On Tue, Feb 1, 2022 at 11:29 AM Fyodor Ustinov wrote:
> Hi!
>
> Adam! Big thanx!
>
> "ceph config rm osd.91 container_im
t things in the
logs).
Sorry for not being too helpful,
- Adam King
On Tue, Feb 1, 2022 at 3:27 AM Fyodor Ustinov wrote:
> Hi!
>
> No mode ideas? :(
>
>
> - Original Message -
> > From: "Fyodor Ustinov"
> > To: "Adam King"
> > Cc:
progress. That should get all the ceph daemons on
whatever image it is you specify in the upgrade start command and cause
future ceph daemons to be deployed with that image as well.
- Adam King
On Mon, Jan 31, 2022 at 10:08 AM Arun Vinod
wrote:
> Hi All,
>
> How can change the de
uite
wrong. Could you try running "ceph mgr fail" and if nothing seems to be
resolved could you post "ceph log last 200 debug cephadm". Maybe we can see
if something gets stuck again after the mgr restarts.
Thanks,
- Adam King
On Thu, Jan 27, 2022 at 7:06 PM Fyodor Usti
Hello Vlad,
Just some insight into how CEPHADM_STRAY_DAEMON works: This health warning
is specifically designed to point out daemons in the cluster that cephadm
is not aware of/in control of. It does this by comparing the daemons it has
cached info on (this cached info is what you see in "ceph
container images (format is "ceph
config get mgr mgr/cephadm/container_image_<daemon-type>" where daemon type
is one of "prometheus", "node_exporter", "alertmanager", "grafana",
"haproxy", "keepalived").
Thanks,
- Adam King
On Thu, Jan 2
ter a few minutes….notice one daemon on cephmon03 & the other osd30.
> This seems random
>
> [ceph: root@osd16 /]# ceph orch ps | grep error
>
> mds.cephmon03.local osd16.local osd17.local osd18.local.cephmon03.pwtvcw
> cephmon03.local error 53s ago12m
> docker.i
Hello Michael,
If you're trying to remove all the mds daemons in this mds "cephmon03.local
osd16.local osd17.local osd18.local" I think the command would be "ceph
orch rm "mds.cephmon03.local osd16.local osd17.local osd18.local"" (note
the quotes around that mds.cephmon . . . since cephadm thinks
Hi Roman, what ceph version are you on? Also, when you ran the
restart command originally, did you get a message about scheduling the
restarts or no output?
On Tue, Nov 23, 2021 at 6:04 AM Roman Steinhart wrote:
> Hi all,
>
> while digging down another issue I had with the managers I
Hello Carsten, as an FYI, there is actually a bootstrap flag specifically
for clusters intended to be one node called "--single-host-defaults" (which
would make the bootstrap command "cephadm bootstrap --mon-ip <ip>
--single-host-defaults") if you want some better settings for single node
clusters. As for
Hi Denis,
Which ceph version is your cluster running on? I know there was an issue
with mons getting dropped from the monmap (and therefore being stuck out of
quorum) when their host was rebooted in Pacific version prior to 16.2.6
https://tracker.ceph.com/issues/51027. If you're on a Pacific
return future.result()
>> File "/usr/sbin/cephadm", line 1433, in run_with_timeout
>> stdout, stderr = await asyncio.gather(tee(process.stdout),
>> File "/usr/sbin/cephadm", line 1415, in tee
>> async for line in reader:
>> File "/us
It looks like the output from a ceph-volume command was too long to handle.
If you run "cephadm ceph-volume -- inventory --format=json" (add
"--with-lsm" if you've turned on device_enhanced_scan) manually on each
host do any of them fail in a similar fashion?
On Fri, Sep 24, 2021 at 1:37 PM Marco
Does running "ceph mgr fail" then waiting a bit make the "ceph orch"
commands responsive? That's worked for me sometimes before when they
wouldn't respond.
On Thu, Sep 16, 2021 at 8:08 AM Javier Cacheiro
wrote:
> Hi,
>
> I have configured a ceph cluster with the new Pacific version (16.2.4)
>
Wanted to respond to the original thread I saw archived on this topic but I
wasn't subscribed to the mailing list yet so don't have the thread in my
inbox to reply to. Hopefully, those involved in that thread still see this.
This issue looks the same as https://tracker.ceph.com/issues/51027 which