[ceph-users] Re: CEPHADM_STRAY_DAEMON with iSCSI service

2021-12-08 Thread Paul Giralt (pgiralt)
https://tracker.ceph.com/issues/5 -Paul Sent from my iPhone On Dec 8, 2021, at 8:00 AM, Robert Sander wrote: Hi, I just upgraded to 16.2.7 and deployed an iSCSI service. Now I get three stray daemons for each configured target
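
For anyone hitting the same warning, a quick way to see which daemons cephadm considers stray, and to compare against what it actually manages (a minimal sketch; output will vary per cluster):

   # show the health warning with the offending daemon names
   ceph health detail
   # list the daemons cephadm knows about
   ceph orch ps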

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-09-08 Thread Paul Giralt (pgiralt)
, at 7:29 AM, Xiubo Li <xiu...@redhat.com> wrote: On 9/3/21 11:32 PM, Paul Giralt (pgiralt) wrote: On Sep 3, 2021, at 4:28 AM, Xiubo Li <xiu...@redhat.com> wrote: And TCMU runner shows 3 hosts up: services: mon: 5 daemons, quorum cxcto-c240-j27-01.ci

[ceph-users] Re: Ceph dashboard pointing to the wrong grafana server address in iframe

2021-09-08 Thread Paul Giralt (pgiralt)
hlight=dashboard#alternative-url-for-browsers [3] https://docs.ceph.com/en/latest/cephadm/monitoring/#networks-and-ports Kind Regards, Ernesto On Wed, Sep 8, 2021 at 6:59 PM Paul Giralt (pgiralt) <pgir...@cisco.com> wrote: For some reason, the grafana dashboards in the dashb
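
For reference, the dashboard's Grafana endpoint can be inspected and overridden from the CLI; a minimal sketch (the IP and port below are examples taken from this thread, not a confirmed fix):

   # check where the dashboard currently points
   ceph dashboard get-grafana-api-url
   # point it at the node actually running grafana
   ceph dashboard set-grafana-api-url https://10.122.242.196:3000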

[ceph-users] Ceph dashboard pointing to the wrong grafana server address in iframe

2021-09-08 Thread Paul Giralt (pgiralt)
For some reason, the grafana dashboards in the dashboard are all pointing to a node that does not and has never run the grafana / Prometheus services. I’m not sure where this value is kept and how to change it back. My two manager nodes are 10.122.242.196 and 10.122.242.198. For some reason,

[ceph-users] Re: Cephadm not properly adding / removing iscsi services anymore

2021-09-08 Thread Paul Giralt (pgiralt)
tely helps you to debug, for example with > 'cephadm enter --name <daemon>' you get a shell for that container or > 'cephadm logs --name <daemon>' you can inspect specific logs. > > > Quoting "Paul Giralt (pgiralt)": > >> Thanks Eugen. >> >> At first I tried cephadm rm-daemon
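
A sketch of that debugging flow (the daemon name below is purely illustrative; take the real one from 'cephadm ls' or 'ceph orch ps'):

   # list daemons on this host to get the exact daemon name
   cephadm ls
   # open a shell inside that container
   cephadm enter --name iscsi.iscsi.host02.xyzabc
   # or inspect its logs from the host
   cephadm logs --name iscsi.iscsi.host02.xyzabc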

[ceph-users] Re: Cephadm not properly adding / removing iscsi services anymore

2021-09-08 Thread Paul Giralt (pgiralt)
on about the iscsi deployment. Or run 'cephadm logs --name > <daemon>'. > > > Quoting "Paul Giralt (pgiralt)": > >> This was working until recently and now seems to have stopped working. >> Running Pacific 16.2.5. When I modify the deployment YAML file for my iscsi

[ceph-users] Cephadm not properly adding / removing iscsi services anymore

2021-09-07 Thread Paul Giralt (pgiralt)
This was working until recently and now seems to have stopped working. Running Pacific 16.2.5. When I modify the deployment YAML file for my iscsi gateways, the services are not being added or removed as requested. It’s as if the state is “stuck”. At one point I had 4 iSCSI gateways: 02, 03,
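
For context, the kind of spec being modified looks roughly like this (hostnames, pool, and credentials are placeholders, not the poster's actual values):

   cat > iscsi.yaml <<'EOF'
   service_type: iscsi
   service_id: iscsi
   placement:
     hosts:
       - host02
       - host03
   spec:
     pool: rbd
     api_user: admin
     api_password: secret
     trusted_ip_list: "10.122.242.196,10.122.242.198"
   EOF
   ceph orch apply -i iscsi.yaml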

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-09-03 Thread Paul Giralt (pgiralt)
On Sep 3, 2021, at 4:28 AM, Xiubo Li <xiu...@redhat.com> wrote: And TCMU runner shows 3 hosts up: services: mon: 5 daemons, quorum cxcto-c240-j27-01.cisco.com,cxcto-c240-j27-06,cxcto-c240-j27-10,cxcto-c240-j27-08,cxcto-c240-j27-12

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-09-01 Thread Paul Giralt (pgiralt)
e it. -Paul On Sep 1, 2021, at 9:17 PM, Xiubo Li <xiu...@redhat.com> wrote: On 9/1/21 12:32 PM, Paul Giralt (pgiralt) wrote: However, the gwcli command is still showing the other two gateways which are no longer enabled. Where does this list of gateways get stored? All t

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-31 Thread Paul Giralt (pgiralt)
However, the gwcli command is still showing the other two gateways which are no longer enabled. Where does this list of gateways get stored? All this configuration is stored in the "gateway.conf" object in the "rbd" pool. How do I access this object? Is it a file or some kind of object
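
A minimal sketch of pulling that object out for inspection (it is a plain rados object containing JSON, so ordinary pool/object tooling works):

   # save the gateway config object to a file and pretty-print it
   rados -p rbd get gateway.conf /tmp/gateway.conf
   python3 -m json.tool /tmp/gateway.conf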

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-31 Thread Paul Giralt (pgiralt)
Thank you. This is exactly what I was looking for. If I’m understanding correctly, what gets listed as the “owner” is what gets advertised via ALUA as the primary path, but the lock owner indicates which gateway currently owns the lock for that image and is allowed to pass traffic for that

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-31 Thread Paul Giralt (pgiralt)
Xiubo, Thank you for all the help so far. I was finally able to figure out what the trigger for the issue was and how to make sure it doesn’t happen - at least not in a steady state. There is still the possibility of running into the bug in a failover scenario of some kind, but at least for

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-30 Thread Paul Giralt (pgiralt)
On Aug 30, 2021, at 7:14 PM, Xiubo Li <xiu...@redhat.com> wrote: We are using “Most Recently Used” - however there are 50 ESXi hosts all trying to access the same data stores, so it’s very possible that one host is choosing iSCSI gateway 1 and another host is choosing iSCSI gateway
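
For the ESXi side, checking and pinning the path selection policy looks roughly like this (the device identifier is a placeholder; this is a sketch, run per datastore device):

   # list devices and their current path selection policy
   esxcli storage nmp device list
   # pin a device to Most Recently Used
   esxcli storage nmp device set --device naa.6001405XXXXXXXX --psp VMW_PSP_MRU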

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-30 Thread Paul Giralt (pgiralt)
chance this fix will make it into the 16.2.6 release? Not sure, I am still waiting for someone to help me review them. Ilya, would you be able to help? - Xiubo -Paul On Aug 29, 2021, at 8:48 PM, Xiubo Li <xiu...@redhat.com> wrote: On 8/27/21 11:10 PM, Paul Giralt (pgiralt) wro

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-29 Thread Paul Giralt (pgiralt)
On Aug 29, 2021, at 8:48 PM, Xiubo Li <xiu...@redhat.com> wrote: On 8/27/21 11:10 PM, Paul Giralt (pgiralt) wrote: Ok - thanks Xiubo. Not sure I feel comfortable doing that without breaking something else, so will wait for a new release that incorporates the fix. In the meantime I’m

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-27 Thread Paul Giralt (pgiralt)
other blacklist entry errors and what might be causing them, that would be greatly appreciated as well. -Paul On Aug 26, 2021, at 8:37 PM, Xiubo Li <xiu...@redhat.com> wrote: On 8/27/21 12:06 AM, Paul Giralt (pgiralt) wrote: This is great. Is there a way to test the fix in my environment

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-26 Thread Paul Giralt (pgiralt)
:34 PM, Paul Giralt (pgiralt) wrote: Thank you for the analysis. Can you think of a workaround for the issue? -Paul Sent from my iPhone On Aug 26, 2021, at 5:17 AM, Xiubo Li <xiu...@redhat.com> wrote: Hi Paul, There is one racy case when updating the state to the ceph cluster and whil

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-26 Thread Paul Giralt (pgiralt)
, the crash should happen just after the image was closed and the resources were released, and then if the work queue was trying to update the state to the ceph cluster it would trigger a use-after-free bug. I will try to fix it. Thanks On 8/26/21 10:40 AM, Paul Giralt (pgiralt) wrote: I will send a unicast

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-25 Thread Paul Giralt (pgiralt)
I will send a unicast email with the link and details. -Paul On Aug 25, 2021, at 10:37 PM, Xiubo Li <xiu...@redhat.com> wrote: Hi Paul, Please send me the detailed versions of the tcmu-runner and ceph-iscsi packages you are using. Thanks On 8/26/21 10:21 AM, Paul Giralt (p

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-25 Thread Paul Giralt (pgiralt)
wrote: On 8/26/21 10:08 AM, Paul Giralt (pgiralt) wrote: Thanks Xiubo. I will try this. How do I set the log level to 4? It's in /etc/tcmu/tcmu.cfg in the tcmu container. No need to restart the tcmu-runner service; the changes will be loaded by the tcmu-runner daemon after tcmu.cfg is closed.
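
Putting those two replies together, a sketch of the whole procedure (the daemon name is illustrative):

   # enter the tcmu container on the gateway host
   cephadm enter --name iscsi.iscsi.host02.xyzabc
   # then in /etc/tcmu/tcmu.cfg set the following; tcmu-runner picks the
   # change up once the file is saved, no restart needed:
   #   log_level = 4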

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-25 Thread Paul Giralt (pgiralt)
Thanks Xiubo. I will try this. How do I set the log level to 4? -Paul On Aug 25, 2021, at 9:30 PM, Xiubo Li <xiu...@redhat.com> wrote: It's buggy; we need a way to export the tcmu-runner log to the host. Could you see any crash coredump from the host? Without that could you keep

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-25 Thread Paul Giralt (pgiralt)
into why it’s happening? -Paul On Aug 25, 2021, at 2:44 PM, Paul Giralt (pgiralt) <pgir...@cisco.com> wrote: Ilya / Xiubo, The problem just re-occurred on one server and I ran the systemctl status command. You can see there are no tcmu-runner processes listed: [root

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-25 Thread Paul Giralt (pgiralt)
Ilya / Xiubo, The problem just re-occurred on one server and I ran the systemctl status command. You can see there are no tcmu-runner processes listed: [root@cxcto-c240-j27-04 ~]# systemctl status ● cxcto-c240-j27-04.cisco.com State: running Jobs: 0 queued Failed: 0 units
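
A quick hedged check for whether the tcmu-runner container is actually alive on a gateway host (unit and container names vary by deployment):

   # systemd units cephadm created on this host
   systemctl list-units 'ceph-*'
   # containers actually running (podman or docker, depending on the host)
   podman ps --format '{{.Names}}' | grep -i tcmu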

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-25 Thread Paul Giralt (pgiralt)
> > Does the node hang while shutting down or does it lock up so that you > can't even issue the reboot command? > It hangs when shutting down. I can SSH in and issue commands just fine and it takes the shutdown command and kicks me out, but it appears to never shut down as I can still ping

[ceph-users] tcmu-runner crashing on 16.2.5

2021-08-24 Thread Paul Giralt (pgiralt)
I upgraded to Pacific 16.2.5 about a month ago and everything was working fine. Suddenly for the past few days I’ve started having the tcmu-runner container on my iSCSI gateways just disappear. I’m assuming this is because they have crashed. I deployed the services using cephadm / ceph orch in

[ceph-users] Re: Ceph status shows 'updating'

2021-08-20 Thread Paul Giralt (pgiralt)
ng about servers that need to be completed - not sure. -Paul > On Aug 20, 2021, at 12:16 PM, Eugen Block wrote: > > What is the output of 'ceph orch upgrade status'? Did you (maybe > accidentally) start an update? You can stop it with 'ceph orch upgrade stop'. > >
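
The commands Eugen is referring to, for completeness:

   # see whether an upgrade is actually in progress
   ceph orch upgrade status
   # if one was started accidentally, stop it
   ceph orch upgrade stop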

[ceph-users] Ceph status shows 'updating'

2021-08-20 Thread Paul Giralt (pgiralt)
The output of my ’ceph status’ shows the following: progress: Updating node-exporter deployment (-1 -> 14) (0s) [] Updating crash deployment (-1 -> 14) (0s) [] Updating crash deployment (-1 -> 14) (0s)

[ceph-users] Re: MTU mismatch error in Ceph dashboard

2021-08-06 Thread Paul Giralt (pgiralt)
metric ("node_network_mtu_bytes") in the text box and you'll get the latest values: As suggested, if you want to mute those alerts you can do that from the Cluster > Monitoring menu: Kind Regards, Ernesto On Wed, Aug 4, 2021 at 10:07 PM Paul Giralt (pgiralt) <pgir...@cisco.com>
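
If you prefer the CLI to the Prometheus UI, the same metric can be pulled from the HTTP API; a sketch (the host is an example; 9095 is cephadm's default Prometheus port):

   curl -s 'http://10.122.242.196:9095/api/v1/query?query=node_network_mtu_bytes'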

[ceph-users] Re: MTU mismatch error in Ceph dashboard

2021-08-04 Thread Paul Giralt (pgiralt)
change those and it will probably make the error go away. I’m guessing > something changed between 16.2.4 and 16.2.5 because I didn’t start seeing > this error until after the upgrade. > > -Paul > > >> On Aug 4, 2021, at 5:09 PM, Kai Stian Olstad wrote: >> >> On 04.0

[ceph-users] Re: MTU mismatch error in Ceph dashboard

2021-08-04 Thread Paul Giralt (pgiralt)
021, at 5:09 PM, Kai Stian Olstad wrote: > > On 04.08.2021 22:06, Paul Giralt (pgiralt) wrote: >> I did notice that docker0 has an MTU of 1500 as do the eno1 and eno2 >> interfaces which I’m not using. I’m not sure if that’s related to the >> error. I’ve been meaning to try
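
As a hedged example of that workaround, the bridge MTU can be aligned with the other interfaces on the fly (not persistent across reboots, and 9000 assumes the rest of the NICs are jumbo-framed):

   ip link set dev docker0 mtu 9000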

[ceph-users] Re: MTU mismatch error in Ceph dashboard

2021-08-04 Thread Paul Giralt (pgiralt)
I’m seeing the same issue. I’m not familiar with where to access the “Prometheus UI”. Can you point me to some instructions on how to do this and I’ll gladly collect the output of that command. FWIW, here are the interfaces on my machine: 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group

[ceph-users] Re: Redeploy iSCSI Gateway fail - 167 returned from docker run

2021-06-02 Thread Paul Giralt (pgiralt)
iscsi.iscsi) the configuration appears to still be maintained. Where is all this configuration stored? Is there a way to completely remove it to start the iscsi gateways on a clean slate? -Paul > On Jun 1, 2021, at 8:05 PM, Paul Giralt (pgiralt) wrote: > > CEPH 16.2.4. I was having
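
If a truly clean slate is the goal, the state lives in that one rados object, so something like the following should clear it (a destructive sketch: back the object up first and make sure no iscsi daemons are running):

   # back up, then remove, the iSCSI gateway configuration
   rados -p rbd get gateway.conf /tmp/gateway.conf.bak
   rados -p rbd rm gateway.conf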

[ceph-users] Redeploy iSCSI Gateway fail - 167 returned from docker run

2021-06-01 Thread Paul Giralt (pgiralt)
CEPH 16.2.4. I was having an issue where I put a server into maintenance mode and after doing so, the containers for the iSCSI gateway were not running, so I decided to redeploy the service. This caused all the servers running iSCSI to get into a state where it looks like ceph orch was

[ceph-users] Unable to delete disk from iSCSI target

2021-06-01 Thread Paul Giralt (pgiralt)
I’m trying to delete a disk from an iSCSI target so that I can remove the image, but running into an issue. If I try to delete it from the CEPH dashboard, I just get an error saying that the DELETE timed out after 45 seconds. If I try to do it from gwcli, the command never returns:
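
For reference, the gwcli flow being attempted is roughly the following (the IQN and image names are placeholders, and the exact subcommand syntax is from memory and may differ between ceph-iscsi releases):

   gwcli
   # detach the LUN from the target first
   cd /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:ceph-igw/disks
   disk delete rbd/disk_1
   # then remove the disk definition itself
   cd /disks
   delete rbd/disk_1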