Public bug reported:

Description
===========
We run the nova-compute service with the IronicDriver in a k8s cluster as a 
StatefulSet pod, with a 1 GiB memory limit and only this service in the pod. 
There are about 40 nodes in our test environment. Most of them have instances 
and are in the active provision state. 
Some nodes fail to connect to the IPMI. As a result, their power status cannot 
be obtained.
Within about 12 hours, the memory limit is exceeded and the pod is restarted.

Steps to reproduce
==================
Nothing special needs to be done.
Note the following:
1. The more nodes there are, the faster the memory grows and the sooner the 
memory limit is exceeded.
2. Even with only one node, the memory limit will eventually be exceeded, but 
it takes much longer.
3. In our environment, memory grows roughly every 10 minutes, so we suspect 
this is caused by a periodic task, perhaps the `_sync_power_states` task.
4. I am not sure whether the failed IPMI connections have any impact.
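
To check whether a periodic task such as `_sync_power_states` is responsible, 
one diagnostic option is the standard-library `tracemalloc` module. This is 
only a minimal sketch (the function name and snapshot points are illustrative, 
not nova code); it would be wired into a debug hook around the suspected task:

```python
import tracemalloc

# Start tracing allocations; keep up to 25 frames per allocation site.
tracemalloc.start(25)
baseline = tracemalloc.take_snapshot()

def report_growth(limit=10):
    """Print and return the biggest allocation deltas since the baseline.

    Calling this after each run of the suspected periodic task and
    watching whether the same call sites keep growing points at the
    leaking code path.
    """
    current = tracemalloc.take_snapshot()
    stats = current.compare_to(baseline, "lineno")
    for stat in stats[:limit]:
        print(stat)
    return stats
```

If the same source lines dominate the deltas after every ~10 min cycle, that 
is strong evidence for where the leak lives.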


Expected result
===============
The pod's memory usage should remain stable when we are not performing 
operations on nodes/instances.

Actual result
=============
Memory keeps increasing until the limit is exceeded and the pod is restarted.
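
The growth itself can be confirmed from inside the pod without external 
tooling; a minimal sketch using only the standard library (the sampling 
interval and count here are illustrative):

```python
import resource
import time

def sample_rss_kib(samples=3, interval=1.0):
    """Sample this process's peak RSS a few times (KiB on Linux).

    A series that keeps rising while the service is otherwise idle
    supports the leak hypothesis.
    """
    readings = []
    for _ in range(samples):
        readings.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
        time.sleep(interval)
    return readings
```

Note that `ru_maxrss` is a high-water mark, so it will never decrease; a 
steady climb over hours matches the behaviour we observed.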

Environment
===========
OpenStack versions:
   - nova: 22.0.1
   - ironic: 16.0.1

Logs & Configs
==============

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1949051

Title:
  nova-compute service running the IronicDriver may leak memory

Status in OpenStack Compute (nova):
  New



-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

Reply via email to