[Yahoo-eng-team] [Bug 1989232] Re: MultiAttachVolumeSwap fails or takes a long time to detach volume

2022-10-27 Thread Balazs Gibizer
Thanks Aboubacar for the well written bug report.

I agree that we have a race between the disconnect and the swap
operation. Both uses a lock but they use different locks so they can
overlap.

Failed case:

Sep 30 02:51:47  Lock "connect_volume" "released" by 
"os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume" :: held 
0.156s {{(pid=2571444) inner 
/usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}
Sep 30 02:51:55  Lock "0d711b7b-4693-4a7e-9a94-ca4186b4a670" "released" by 
"nova.compute.manager.ComputeManager.swap_volume.._do_locked_swap_volume"
 :: held 153.400s {{(pid=2571444) inner 
/usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}

Successful case:

Sep 29 17:12:21 Lock "4aeaeb5d-295f-4149-9330-a016d9da1730" "released" by 
"nova.compute.manager.ComputeManager.swap_volume.._do_locked_swap_volume"
 :: held 632.783s {{(pid=2571444) inner 
/usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}
Sep 29 17:12:25 Lock "connect_volume" "released" by 
"os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume" :: held 
0.142s {{(pid=2571444) inner 
/usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}


** Also affects: os-brick
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Triaged

** Changed in: nova
   Importance: Undecided => Medium

** Tags added: volumes

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1989232

Title:
  MultiAttachVolumeSwap fails or takes a long time to detach volume

Status in OpenStack Compute (nova):
  Triaged
Status in os-brick:
  New
Status in tempest:
  New

Bug description:
  ERROR:
  
tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach
  fails during tempest iSCSI tests due to volume taking a long time to
  detach or failing to detach from instance.  The logs herein show an
  example of a failure to detach.

  EXEPCTED BEHAVIOR: Volume successfully detaches and test passes.

  HOW TO DUPLICATE:
  Run: tox -e all -- 
tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach
 | tee -a console.log.out

  CONFIG:
  - DevStack Zed Release
  - Single node using iSCSI
  - Host OS: Ubuntu 20.04
  Distributor ID: Ubuntu
  Description:Ubuntu 20.04.3 LTS
  Release:20.04
  Codename:   focal

  From tempest console.log:

  
tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach[id-e8f
  8f9d1-d7b7-4cd2-8213-ab85ef697b6e,slow,volume]
  
-
  --

  Captured traceback:
  ~~~
  Traceback (most recent call last):

    File "/opt/stack/tempest/tempest/lib/decorators.py", line 81, in wrapper
  return f(*func_args, **func_kwargs)

    File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in 
wrapper
  return f(*func_args, **func_kwargs)

    File 
"/opt/stack/tempest/tempest/api/compute/admin/test_volume_swap.py", line 245, 
in test_volume_swap_
  with_multiattach
  waiters.wait_for_volume_resource_status(self.volumes_client,

    File "/opt/stack/tempest/tempest/common/waiters.py", line 301, in 
wait_for_volume_resource_status
  time.sleep(client.build_interval)

    File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/fixtures/_fixtures/timeout.py",
 line 52, in signal_handler
  raise TimeoutException()

  fixtures._fixtures.timeout.TimeoutException

  Captured traceback-1:
  ~
  Traceback (most recent call last):

    File "/opt/stack/tempest/tempest/common/waiters.py", line 385, in 
wait_for_volume_attachment_remove_from_server
  raise lib_exc.TimeoutException(message)

  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: Volume a54c67b7-786e-4ba7-94ea-d1e0a722424a failed to detach from 
server 986b2dd5-542a-4344-a929-9ac7bbf35d7c within the required time (3600 s) 
from the compute API perspective

  In waiters.py:

  373 while any(volume for volume in volumes if volume['volumeId'] == 
volume_id):
  374 time.sleep(client.build_interval)
  375
  376 timed_out = int(time.time()) - start >= client.build_timeout
  377 if timed_out:
  378 console_output = 
client.get_console_output(server_id)['output']
  379 LOG.debug('Console output for %s\nbody=\n%s',
  380   server_id, console_output)
  381 message = ('Volume %s failed to detach from server %s 
within '
  382'the required time (%s s) from the compute API 
'
  383 

[Yahoo-eng-team] [Bug 1989232] Re: MultiAttachVolumeSwap fails or takes a long time to detach volume

2022-10-02 Thread Aboubacar Diare
** Also affects: nova
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1989232

Title:
  MultiAttachVolumeSwap fails or takes a long time to detach volume

Status in OpenStack Compute (nova):
  New
Status in tempest:
  New

Bug description:
  ERROR:
  
tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach
  fails during tempest iSCSI tests due to volume taking a long time to
  detach or failing to detach from instance.  The logs herein show an
  example of a failure to detach.

  EXEPCTED BEHAVIOR: Volume successfully detaches and test passes.

  HOW TO DUPLICATE:
  Run: tox -e all -- 
tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach
 | tee -a console.log.out

  CONFIG:
  - DevStack Zed Release
  - Single node using iSCSI
  - Host OS: Ubuntu 20.04
  Distributor ID: Ubuntu
  Description:Ubuntu 20.04.3 LTS
  Release:20.04
  Codename:   focal

  From tempest console.log:

  
tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach[id-e8f
  8f9d1-d7b7-4cd2-8213-ab85ef697b6e,slow,volume]
  
-
  --

  Captured traceback:
  ~~~
  Traceback (most recent call last):

    File "/opt/stack/tempest/tempest/lib/decorators.py", line 81, in wrapper
  return f(*func_args, **func_kwargs)

    File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in 
wrapper
  return f(*func_args, **func_kwargs)

    File 
"/opt/stack/tempest/tempest/api/compute/admin/test_volume_swap.py", line 245, 
in test_volume_swap_
  with_multiattach
  waiters.wait_for_volume_resource_status(self.volumes_client,

    File "/opt/stack/tempest/tempest/common/waiters.py", line 301, in 
wait_for_volume_resource_status
  time.sleep(client.build_interval)

    File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/fixtures/_fixtures/timeout.py",
 line 52, in signal_handler
  raise TimeoutException()

  fixtures._fixtures.timeout.TimeoutException

  Captured traceback-1:
  ~
  Traceback (most recent call last):

    File "/opt/stack/tempest/tempest/common/waiters.py", line 385, in 
wait_for_volume_attachment_remove_from_server
  raise lib_exc.TimeoutException(message)

  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: Volume a54c67b7-786e-4ba7-94ea-d1e0a722424a failed to detach from 
server 986b2dd5-542a-4344-a929-9ac7bbf35d7c within the required time (3600 s) 
from the compute API perspective

  In waiters.py:

  373 while any(volume for volume in volumes if volume['volumeId'] == 
volume_id):
  374 time.sleep(client.build_interval)
  375
  376 timed_out = int(time.time()) - start >= client.build_timeout
  377 if timed_out:
  378 console_output = 
client.get_console_output(server_id)['output']
  379 LOG.debug('Console output for %s\nbody=\n%s',
  380   server_id, console_output)
  381 message = ('Volume %s failed to detach from server %s 
within '
  382'the required time (%s s) from the compute API 
'
  383'perspective' %
  384(volume_id, server_id, client.build_timeout))
  385 raise lib_exc.TimeoutException(message)
  386 try:
  387 volumes = client.list_volume_attachments(
  388 server_id)['volumeAttachments']
  389 except lib_exc.NotFound:
  390 # Ignore 404s on detach in case the server is deleted or 
the volume
  391 # is already detached.
  392 return
  393 return

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1989232/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp