[Yahoo-eng-team] [Bug 1368989] [NEW] service_update() should not set an RPC timeout longer than service.report_interval

Chris Friesen Fri, 12 Sep 2014 16:11:23 -0700

Public bug reported:

nova.servicegroup.drivers.db.DbDriver._report_state() is called every
service.report_interval seconds from a timer in order to periodically
report the service state.  It calls self.conductor_api.service_update().


If this ends up calling
nova.conductor.rpcapi.ConductorAPI.service_update(), it will do an RPC
call() to nova-conductor.

If anything happens to the RPC server (failover, switchover, etc.), by
default the RPC code will wait 60 seconds for a response (blocking the
timer-based calling of _report_state() in the meantime).  This is long
enough for the status in the database to become stale enough that other
services consider this service to be "down".
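
To make the timing concrete, here is a rough sketch of the arithmetic.
The 60 second RPC wait is from the description above; the
report_interval of 10 seconds and the service_down_time threshold of 60
seconds are values assumed for illustration, and the helper names are
made up for this example rather than taken from nova.

```python
# Assumed values for illustration only; check your own nova.conf.
REPORT_INTERVAL = 10     # seconds between _report_state() runs (assumed)
SERVICE_DOWN_TIME = 60   # max heartbeat age before "down" (assumed)
RPC_TIMEOUT = 60         # how long a blocked call() waits (from the report)


def worst_case_heartbeat_age(report_interval, rpc_timeout):
    """Age of the last successful report when a blocked call gives up.

    The last good report lands at t=0, the next attempt starts one
    report_interval later and then blocks for the full rpc_timeout.
    """
    return report_interval + rpc_timeout


def considered_up(heartbeat_age, service_down_time):
    """Hypothetical liveness check: the heartbeat must be recent enough."""
    return heartbeat_age <= service_down_time


age_default = worst_case_heartbeat_age(REPORT_INTERVAL, RPC_TIMEOUT)
age_short = worst_case_heartbeat_age(REPORT_INTERVAL, REPORT_INTERVAL - 1)

print(age_default, considered_up(age_default, SERVICE_DOWN_TIME))
print(age_short, considered_up(age_short, SERVICE_DOWN_TIME))
```

Under these assumptions a single blocked call leaves the heartbeat 70
seconds old, which is past the threshold, while a timeout just under the
report interval leaves room for several retries before that happens.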

Arguably, since we're going to call service_update() again in
service.report_interval seconds, there's no reason to wait the full 60
seconds.  Instead, it would make sense to set the RPC timeout for the
service_update() call to something slightly less than
service.report_interval seconds.
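
A minimal sketch of what this could look like, not an actual patch.  It
assumes the RPC client offers a prepare(timeout=...) method that returns
a call context, which is the shape oslo.messaging's RPCClient has.
FakeRPCClient, service_update_timeout() and the one second margin are
hypothetical names and values used only so the example runs on its own.

```python
def service_update_timeout(report_interval, margin=1, minimum=1):
    """Pick a timeout slightly below report_interval (margin is assumed)."""
    return max(minimum, report_interval - margin)


class FakeRPCClient:
    """Hypothetical stand-in for an RPC client; performs no real RPC."""

    def __init__(self, timeout=60):
        self.timeout = timeout

    def prepare(self, timeout=None):
        # Return a new context carrying the overridden timeout, if any.
        return FakeRPCClient(self.timeout if timeout is None else timeout)

    def call(self, ctxt, method, **kwargs):
        # Echo back what would have been sent, including the timeout.
        return {"method": method, "timeout": self.timeout, "args": kwargs}


def service_update(client, ctxt, service, values, report_interval):
    """Issue service_update with a timeout tied to the report interval."""
    timeout = service_update_timeout(report_interval)
    cctxt = client.prepare(timeout=timeout)
    return cctxt.call(ctxt, "service_update", service=service, values=values)


result = service_update(FakeRPCClient(), None, {"id": 1},
                        {"report_count": 5}, report_interval=10)
print(result["timeout"])
```

The floor of one second is there only so that a very small
report_interval does not produce a zero or negative timeout.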

I've also submitted a related bug report
(https://bugs.launchpad.net/bugs/1368917) to improve the handling of RPC
connection loss in general, but I expect that'll take a while to deal
with, whereas this particular case can be handled much more easily.

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: compute

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1368989

Title:
  service_update() should not set an RPC timeout longer than
  service.report_interval

Status in OpenStack Compute (Nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1368989/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
