[Yahoo-eng-team] [Bug 2036877] [NEW] radvd seems to crash when ipv4 addresses are supplied as nameservers to ipv6 subnets

2023-09-21 Thread Sven Kieske
Public bug reported:

I'll copy from this report; please note that I'm NOT the original
reporter:

https://bugs.launchpad.net/kolla-ansible/+bug/2033980/comments/8

Before cleaning up the PID file, I took a look at radvd's config:

```
$ cat /var/lib/neutron/ra/aee91f41-1945-40b4-b72f-8be2eb369b44.radvd.conf
interface qr-caa16d7e-26
{
   AdvSendAdvert on;
   MinRtrAdvInterval 30;
   MaxRtrAdvInterval 100;
   AdvLinkMTU 1450;

   RDNSS 2a02:74a0:x:0::53 10.40.3.53 2a02:74a0:x:0::54 {};

   prefix 2a02:74a0:x:y::/64
   {
      AdvOnLink on;
      AdvAutonomous on;
   };

   route fe80::a9fe:a9fe/128 {
   };
};
```

We've been configuring the router with Terraform, assigning the IPv4
resolvers to the IPv4 subnet and the IPv6 resolvers to the IPv6 subnet.

After deleting the router, adjusting the subnets (no resolvers on the
IPv4 subnet, only IPv6 resolvers on the IPv6 one), and recreating the
router, radvd is now active and everything's fine.

It seems that, due to this misconfiguration (and incomplete template
parsing), IPv4 nameservers ended up in radvd's config, which made radvd
fail to start. Neutron was then unable to clean up the PID file and so
kept failing to start radvd again.

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2036877

Title:
  radvd seems to crash when ipv4 addresses are supplied as nameservers
  to ipv6 subnets

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2036877/+subscriptions




[Yahoo-eng-team] [Bug 2033980] Re: Neutron fails to respawn radvd due to corrupt pid file

2023-09-20 Thread Sven Kieske
This is not a bug in kolla-ansible.

** Changed in: kolla-ansible
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2033980

Title:
  Neutron fails to respawn radvd due to corrupt pid file

Status in kolla-ansible:
  Invalid
Status in neutron:
  In Progress

Bug description:
  **Bug Report**

  What happened:

  I have periodically had issues where radvd seems to die and Neutron is
  not able to respawn it. I'm not sure why it dies.

  In my neutron-l3-agent.log, the following error occurs once per
  minute:

  ```
  2023-09-03 14:37:07.514 16 ERROR neutron.agent.linux.utils [-] Unable to convert value in /var/lib/neutron/external/pids/ea759c71-0f4d-4be9-a761-83843ce04d9a.pid.radvd
  2023-09-03 14:37:07.514 16 ERROR neutron.agent.linux.external_process [-] radvd for router with uuid ea759c71-0f4d-4be9-a761-83843ce04d9a not found. The process should not have died
  2023-09-03 14:37:07.514 16 WARNING neutron.agent.linux.external_process [-] Respawning radvd for uuid ea759c71-0f4d-4be9-a761-83843ce04d9a
  2023-09-03 14:37:07.514 16 ERROR neutron.agent.linux.utils [-] Unable to convert value in /var/lib/neutron/external/pids/ea759c71-0f4d-4be9-a761-83843ce04d9a.pid.radvd
  2023-09-03 14:37:07.762 16 ERROR neutron.agent.linux.utils [-] Exit code: 255; Cmd: ['ip', 'netns', 'exec', 'qrouter-ea759c71-0f4d-4be9-a761-83843ce04d9a', 'env', 'PROCESS_TAG=radvd-ea759c71-0f4d-4be9-a761-83843ce04d9a', 'radvd', '-C', '/var/lib/neutron/ra/ea759c71-0f4d-4be9-a761-83843ce04d9a.radvd.conf', '-p', '/var/lib/neutron/external/pids/ea759c71-0f4d-4be9-a761-83843ce04d9a.pid.radvd', '-m', 'syslog', '-u', 'neutron']; Stdin: ; Stdout: ; Stderr:
  ```

  Inspecting the PID file, it appears to contain two PIDs, one per line:

  ```
  $ docker exec -it neutron_l3_agent cat /var/lib/neutron/external/pids/ea759c71-0f4d-4be9-a761-83843ce04d9a.pid.radvd
  853
  1161
  ```
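
  The "Unable to convert value" error above suggests the agent reads the
  whole file and converts it to an integer in one step, which a two-line
  file breaks. A rough, hypothetical reconstruction (not the literal
  Neutron source):

  ```
  import tempfile

  def read_pid(path):
      """Read a PID file in one int() conversion, as the log above implies."""
      try:
          with open(path) as f:
              return int(f.read())
      except ValueError:
          print("Unable to convert value in %s" % path)
          return None

  # "853\n1161\n" cannot be parsed as a single integer, so the agent
  # concludes the process is gone but never learns either real PID.
  with tempfile.NamedTemporaryFile("w", suffix=".pid.radvd", delete=False) as f:
      f.write("853\n1161\n")
  print(read_pid(f.name))  # -> None
  ```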

  Deleting the file then properly respawns radvd:

  ```
  2023-09-03 14:38:07.515 16 ERROR neutron.agent.linux.external_process [-] radvd for router with uuid ea759c71-0f4d-4be9-a761-83843ce04d9a not found. The process should not have died
  2023-09-03 14:38:07.516 16 WARNING neutron.agent.linux.external_process [-] Respawning radvd for uuid ea759c71-0f4d-4be9-a761-83843ce04d9a
  ```

  What you expected to happen:

  Radvd is respawned without needing manual intervention. What is likely
  meant to happen is that Neutron overwrites the PID file, whereas
  instead it appends to it. I'm not sure if this is a kolla issue or a
  neutron issue.
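
  For comparison, a truncating and atomic PID write would look like the
  sketch below. This illustrates the behaviour the reporter expects
  rather than what Neutron or radvd actually does; the function is
  hypothetical:

  ```
  import os
  import tempfile

  def write_pid(path, pid):
      """Replace the PID file wholesale instead of appending to it."""
      # Writing a temporary file and renaming it over the old one is atomic
      # on POSIX, so a respawn can never stack two PIDs in a single file.
      fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
      with os.fdopen(fd, "w") as f:
          f.write("%d\n" % pid)
      os.replace(tmp, path)
  ```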

  How to reproduce it (minimal and precise): Unsure; I don't know how
  radvd ends up dying in the first place. You could likely reproduce
  this by deploying kolla-ansible and then manually killing radvd.
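
  A hypothetical helper for that reproduction attempt: kill radvd inside
  the router's network namespace (the UUID is the one from the logs
  above; requires root), then watch the l3-agent log for the respawn:

  ```
  import subprocess

  ROUTER = "ea759c71-0f4d-4be9-a761-83843ce04d9a"

  # pkill exits non-zero if no radvd was running, which check=True surfaces.
  subprocess.run(
      ["ip", "netns", "exec", "qrouter-%s" % ROUTER, "pkill", "-x", "radvd"],
      check=True,
  )
  ```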

  **Environment**:
  * OS (e.g. from /etc/os-release):
  NAME="Rocky Linux"
  VERSION="9.2 (Blue Onyx)"
  ID="rocky"
  ID_LIKE="rhel centos fedora"
  VERSION_ID="9.2"
  PLATFORM_ID="platform:el9"
  PRETTY_NAME="Rocky Linux 9.2 (Blue Onyx)"
  ANSI_COLOR="0;32"
  LOGO="fedora-logo-icon"
  CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
  HOME_URL="https://rockylinux.org/"
  BUG_REPORT_URL="https://bugs.rockylinux.org/"
  SUPPORT_END="2032-05-31"
  ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
  ROCKY_SUPPORT_PRODUCT_VERSION="9.2"
  REDHAT_SUPPORT_PRODUCT="Rocky Linux"
  REDHAT_SUPPORT_PRODUCT_VERSION="9.2"

  * Kernel (e.g. `uname -a`):
  Linux lon1 5.14.0-284.25.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Aug 2 14:53:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  * Docker version if applicable (e.g. `docker version`):
  Client: Docker Engine - Community
   Version:           24.0.5
   API version:       1.43
   Go version:        go1.20.6
   Git commit:        ced0996
   Built:             Fri Jul 21 20:36:54 2023
   OS/Arch:           linux/amd64
   Context:           default

  Server: Docker Engine - Community
   Engine:
    Version:          24.0.5
    API version:      1.43 (minimum version 1.12)
    Go version:       go1.20.6
    Git commit:       a61e2b4
    Built:            Fri Jul 21 20:35:17 2023
    OS/Arch:          linux/amd64
    Experimental:     false
   containerd:
    Version:          1.6.22
    GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
   runc:
    Version:          1.1.8
    GitCommit:        v1.1.8-0-g82f18fe
   docker-init:
    Version:          0.19.0
    GitCommit:        de40ad0

  * Kolla-Ansible version (e.g. `git head or tag or stable branch` or pip package version if using release): 16.1.0 (stable/2023.1)

  * Docker image Install type (source/binary): Default installed by kolla-ansible
  * Docker image distribution: rocky
  * Are you using official images from Docker Hub or self built? official
  * If self built - Kolla version and environment used to build: not applicable
  * Share your inventory file, globals.yml and other configuration files if 

[Yahoo-eng-team] [Bug 1452641] Re: Static Ceph mon IP addresses in connection_info can prevent VM startup

2023-07-11 Thread Sven Kieske
billy-olsen: from the Launchpad bug status description, "Invalid" means
"the report describes the software's normal behaviour, or is unsuitable
for any other reason."

I'd argue that this is not normal behaviour but a missing feature or a
bug, depending on how you look at it.

IMHO it's clearly a Nova shortcoming.

If Nova doesn't want this to be fixed, please close it as "Won't Fix"
instead.

Thanks.

** Changed in: nova
   Status: Invalid => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1452641

Title:
  Static Ceph mon IP addresses in connection_info can prevent VM startup

Status in OpenStack Compute (nova):
  Confirmed
Status in nova package in Ubuntu:
  Triaged

Bug description:
  The Cinder rbd driver extracts the IP addresses of the Ceph mon servers from the Ceph mon map when the instance/volume connection is established. This info is then stored in nova's block-device-mapping table and is never re-validated down the line.
  Changing the Ceph mon servers' IP addresses will prevent the instance from booting, as the stale connection info will enter the instance's XML. One idea to fix this would be to use the information from ceph.conf directly, which should be an alias or a load balancer.
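
  A rough sketch of the proposed direction: resolve the monitor
  endpoints from ceph.conf at boot/attach time instead of trusting the
  addresses frozen into connection_info. The paths, keys, and function
  names here are illustrative assumptions, not Nova's actual code:

  ```
  import configparser

  def current_mon_hosts(conf_path="/etc/ceph/ceph.conf"):
      """Read the monitor list (or a DNS alias / load balancer) from ceph.conf."""
      cfg = configparser.ConfigParser()
      cfg.read(conf_path)
      return [h.strip() for h in cfg.get("global", "mon_host").split(",")]

  def refresh_connection_info(connection_info):
      # Swap the statically stored monitor addresses for current ones
      # before they are rendered into the instance's libvirt XML.
      connection_info["data"]["hosts"] = current_mon_hosts()
      return connection_info
  ```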

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1452641/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp