[Yahoo-eng-team] [Bug 2052718] [NEW] Nova Compute Service status goes up and down abnormally

VO LE HUY Thu, 08 Feb 2024 08:53:35 -0800

Public bug reported:

Hi community,


When you type:
$ watch openstack nova compute service list

You will see the status of components such as "nova-conductor", "nova-
scheduler" and "nova-compute" continuously jump between "up" and "down",
this is not normal and it has absolutely no end. This happens and a
chain of functions are affected such as live-migrate, creating virtual
machines,... while checking, the compute server still works.

After some debugging I discovered that the current time "now" retrieved
from the system is smaller than the past time "last_heartbeat"!!! And
the mistake of not pointing this out in the source code cost me a lot of
time to understand. In fact, as soon as the current time is received,
there should be a comparison calculation with the last heartbeat. If it
is small, we need to log the warning to easily configure time
synchronization for the cluster.

Around the abs(elapsed) line of code ->
https://github.com/openstack/nova/blob/stable/2023.2/nova/servicegroup/drivers/db.py

...
...
def is_up(self, service_ref):
        ...
        ...
        # Timestamps in DB are UTC.
        elapsed = timeutils.delta_seconds(last_heartbeat, timeutils.utcnow())
        is_up = abs(elapsed) <= self.service_down_time
        if not is_up:
            LOG.debug('Seems service %(binary)s on host %(host)s is down. '
                      'Last heartbeat was %(lhb)s. Elapsed time is %(el)s',
                      {'binary': service_ref.get('binary'),
                       'host': service_ref.get('host'),
                       'lhb': str(last_heartbeat), 'el': str(elapsed)})
        return is_up
...
...

** Affects: nova
     Importance: Undecided
         Status: New

** Attachment added: "Up/down continuously"
   
https://bugs.launchpad.net/bugs/2052718/+attachment/5745271/+files/MicrosoftTeams-image.png

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2052718

Title:
  Nova Compute Service status goes up and down abnormally

Status in OpenStack Compute (nova):
  New

Bug description:
  Hi community,

  When you type:
  $ watch openstack nova compute service list

  You will see the status of components such as "nova-conductor", "nova-
  scheduler" and "nova-compute" continuously jump between "up" and
  "down", this is not normal and it has absolutely no end. This happens
  and a chain of functions are affected such as live-migrate, creating
  virtual machines,... while checking, the compute server still works.

  After some debugging I discovered that the current time "now"
  retrieved from the system is smaller than the past time
  "last_heartbeat"!!! And the mistake of not pointing this out in the
  source code cost me a lot of time to understand. In fact, as soon as
  the current time is received, there should be a comparison calculation
  with the last heartbeat. If it is small, we need to log the warning to
  easily configure time synchronization for the cluster.

  Around the abs(elapsed) line of code ->
  
https://github.com/openstack/nova/blob/stable/2023.2/nova/servicegroup/drivers/db.py

  ...
  ...
  def is_up(self, service_ref):
          ...
          ...
          # Timestamps in DB are UTC.
          elapsed = timeutils.delta_seconds(last_heartbeat, timeutils.utcnow())
          is_up = abs(elapsed) <= self.service_down_time
          if not is_up:
              LOG.debug('Seems service %(binary)s on host %(host)s is down. '
                        'Last heartbeat was %(lhb)s. Elapsed time is %(el)s',
                        {'binary': service_ref.get('binary'),
                         'host': service_ref.get('host'),
                         'lhb': str(last_heartbeat), 'el': str(elapsed)})
          return is_up
  ...
  ...

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2052718/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

[Yahoo-eng-team] [Bug 2052718] [NEW] Nova Compute Service status goes up and down abnormally

Reply via email to