Ok - so it looks like that 167 was a red herring; it was actually a valid 
result. The real issue is that the container was starting up and then dying. 
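(For anyone who hits the same thing: my understanding is that %u and %g are just stat(1) format specifiers for the numeric owner UID and GID, and 167 is the uid/gid of the ceph user inside the container. My guess is cephadm passes '%u %g' as a single quoted argument, so copy-pasting the logged command into a shell word-splits it, which is why my manual run produced the "cannot stat '%g'" error while cephadm's own run printed "167 167".)

```shell
# stat(1) format specifiers: %u = numeric owner UID, %g = numeric GID.
# Quoted, they form one format string; unquoted, the shell splits them
# and stat treats '%g' as a file name to stat -- the error I hit.
stat -c '%u %g' /tmp      # prints the directory's "<uid> <gid>"
# stat -c %u %g /tmp      # -> stat: cannot stat '%g': No such file or directory
```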

Looks like this all went downhill as a result of me renaming an image (I had 
originally named it iscsi-img-005 instead of iscsi-img-0005, so I renamed it 
to match all my other image names), and the iSCSI gateway apparently does not 
like that. Things worked fine right after the rename, but when I redeployed 
the gateways, they never came back up, and I saw an error in the logs 
indicating it was having trouble with that name. 

I managed to get things back up and running on one gateway after a lot of 
messing around, eventually getting to the point where I could delete the 
configuration it didn’t like. Right now I’m running fine on that one gateway. 
I then tried to scale back up to 4 servers, and the other three each have a 
different issue. On one, the gateway comes up, but when I try to provision a 
target to use it, I get an index-out-of-range error. On the 3rd, the docker 
containers come up, but the gateway never shows up. On the 4th, the containers 
never come up at all; they fail with this message: 

subprocess.CalledProcessError: Command 'ceph -n 
client.iscsi.iscsi.cxcto-c240-j27-05.noraaw --conf /etc/ceph/ceph.conf osd 
blacklist rm 10.122.242.200:6977/1317769556' returned non-zero exit status 13.
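Also worth noting: exit status 13 there lines up with errno 13, i.e. EACCES / permission denied, assuming the ceph CLI is surfacing the underlying error code directly. That would point at the client.iscsi.* keyring lacking the caps needed for 'osd blacklist rm'. Just illustrating the errno mapping, not anything ceph-specific:

```python
# Exit status 13, read as an errno, is EACCES (permission denied) --
# assuming the ceph CLI is passing the underlying errno through.
import errno
import os

print(errno.errorcode[13], "-", os.strerror(13))
# -> EACCES - Permission denied
```

If that's what's happening, checking the mon/osd caps on that client key with 'ceph auth get' would be my next step.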

At this point I feel it would probably be best to wipe all the iSCSI gateway 
configuration and start that piece from scratch from a clean slate - however, 
even when I remove the service (ceph orch rm iscsi.iscsi), the configuration 
appears to still be maintained. 

Where is all this configuration stored? Is there a way to completely remove it 
so the iSCSI gateways start from a clean slate? 
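Partially answering my own question: my understanding is that ceph-iscsi keeps its state in a RADOS object named gateway.conf in the rbd pool (or whatever pool iscsi-gateway.cfg points at) - treat both names as assumptions to verify. The object is JSON, so once pulled down it's easy to see which disks and gateways it still references. The layout below is a minimal stand-in for a real dump, not an actual one:

```python
# Sketch: after pulling the config object with something like
#     rados -p rbd get gateway.conf gateway.conf
# (pool "rbd" and object name "gateway.conf" are my assumptions about
# where ceph-iscsi stores its state -- verify before touching it!),
# inspect which disks/gateways the JSON still references.
import json

# Minimal stand-in for a real gateway.conf dump.
dump = """
{
  "disks":    {"rbd/iscsi-img-0005": {}},
  "gateways": {"cxcto-c240-j27-02": {}},
  "targets":  {}
}
"""

conf = json.loads(dump)
for section in ("disks", "gateways", "targets"):
    print(f"{section}: {sorted(conf.get(section, {}))}")
```

And if wiping really is the right call, 'rados -p rbd rm gateway.conf' (with all gateway daemons stopped first) should clear it - again, assuming that's where it lives.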

-Paul


> On Jun 1, 2021, at 8:05 PM, Paul Giralt (pgiralt) <pgir...@cisco.com> wrote:
> 
> CEPH 16.2.4. I was having an issue where I put a server into maintenance 
> mode and, after doing so, the containers for the iSCSI gateway were not 
> running, so I decided to redeploy the service. This caused all the servers 
> running iSCSI to get into a state where it looked like ceph orch was trying 
> to delete the container but was stuck. My only recourse was to reboot the 
> servers. I ended up doing a ‘ceph orch rm iscsi.iscsi’ to just remove the 
> services and then tried to redeploy. When I do this, I’m seeing the following 
> in the cephadm logs on the servers where the iscsi gateway is being deployed: 
> 
> 2021-06-01 19:48:15,110 INFO Deploy daemon 
> iscsi.iscsi.cxcto-c240-j27-02.zeypah ...
> 2021-06-01 19:48:15,111 DEBUG Running command: /bin/docker run --rm 
> --ipc=host --net=host --entrypoint stat --init -e 
> CONTAINER_IMAGE=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
>  -e NODE_NAME=cxcto-c240-j27-02.cisco.com -e CEPH_USE_RANDOM_NONCE=1 
> docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
>  -c %u %g /var/lib/ceph
> 2021-06-01 19:48:15,529 DEBUG stat: 167 167
> 
> Later in the logs I see: 
> 
> 2021-06-01 19:48:25,933 DEBUG Running command: /bin/docker inspect --format 
> {{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index .Config.Labels 
> "io.ceph.version"}} 
> ceph-a67d529e-ba7f-11eb-940b-5c838f8013a5-iscsi.iscsi.cxcto-c240-j27-02.zeypah
> 2021-06-01 19:48:25,984 DEBUG /bin/docker:
> 2021-06-01 19:48:25,984 DEBUG /bin/docker: Error: No such object: 
> ceph-a67d529e-ba7f-11eb-940b-5c838f8013a5-iscsi.iscsi.cxcto-c240-j27-02.zeypah
> 
> Obviously no such object because the container creation failed. 
> 
> If I try to run that command that is in the logs manually, I get: 
> 
> [root@cxcto-c240-j27-02 ceph]# /bin/docker run --rm --ipc=host --net=host 
> --entrypoint stat --init -e 
> CONTAINER_IMAGE=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
>  -e NODE_NAME=cxcto-c240-j27-02.cisco.com -e CEPH_USE_RANDOM_NONCE=1 
> docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
>  -c %u %g /var/lib/ceph
> stat: cannot stat '%g': No such file or directory
> 167
> 
> So the 167 seems to line up with what’s showing up in the script. I’m not 
> clear on what the deal is with the %g. What is supposed to be in that 
> placeholder? Any thoughts on why this is failing? 
> 
> Right now all my iSCSI gateways are down and basically my whole environment 
> is down as a result 🙁 
> 
> -Paul
> 
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
