Public bug reported:

Description
===========

There are two functions in the code which (indirectly) call RetrievePropertiesEx
with at most vmware.maximum_objects per result, but do not iterate over the
results if there are more, nor does it
By default the value is 100, which is usually sufficient, but if someone sets
the value lower, it will cause the following problems.
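For context, the vSphere property collector pages its results: RetrievePropertiesEx returns at most the requested number of objects plus a continuation token when more remain, ContinueRetrievePropertiesEx fetches the next page for that token, and CancelRetrievePropertiesEx discards whatever is left. Below is a minimal, self-contained Python sketch of a loop that honours that contract. PagedCollector and iter_all_objects are hypothetical names, and the fake collector only models the token behaviour (it uses the next page's offset as the token); this is not Nova or oslo.vmware code.

```python
from typing import Iterator, List, Optional, Set, Tuple

Page = Tuple[List[str], Optional[int]]


class PagedCollector:
    """Hypothetical stand-in for a vSphere property collector.

    Returns at most max_objects items per page, plus a token while more
    pages remain. open_tokens models the retrievals the server still holds.
    """

    def __init__(self, items: List[str], max_objects: int) -> None:
        self.items = items
        self.max_objects = max_objects
        self.open_tokens: Set[int] = set()

    def _page(self, offset: int) -> Page:
        end = offset + self.max_objects
        token = end if end < len(self.items) else None
        if token is not None:
            # Hypothetical token: simply the offset of the next page.
            self.open_tokens.add(token)
        return self.items[offset:end], token

    def retrieve(self) -> Page:  # roughly RetrievePropertiesEx
        return self._page(0)

    def continue_retrieval(self, token: int) -> Page:  # ~ ContinueRetrievePropertiesEx
        self.open_tokens.discard(token)
        return self._page(token)

    def cancel_retrieval(self, token: int) -> None:  # ~ CancelRetrievePropertiesEx
        self.open_tokens.discard(token)


def iter_all_objects(collector: PagedCollector) -> Iterator[str]:
    """Yield the objects of every page, not just the first one."""
    objects, token = collector.retrieve()
    try:
        while True:
            yield from objects
            if token is None:
                return  # last page: nothing left on the server
            objects, token = collector.continue_retrieval(token)
    finally:
        # A token is only still set here if the caller stopped early (or an
        # error occurred); release the server-side retrieval in that case.
        if token is not None:
            collector.cancel_retrieval(token)
```

Because the cancellation sits in a `finally` clause, a caller that stops iterating early (for example after finding what it was looking for) still releases the server-side retrieval once the generator is closed.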

Both are in the module nova.virt.vmwareapi.vm_util:
1. get_all_cluster_mors
   The function is called to find the cluster reference for the named cluster.
   If there are more clusters than maximum_objects, then there is a chance that
   the cluster won't be found by the driver despite being configured in vCenter.

2. get_stats_from_cluster
   The function gets the statistics for all the hosts in a cluster. Since
   vSphere 7.0U1, the limit is 96 hosts per cluster, just below the default of
   100, and quite likely the limit will increase past it, leaving the stats
   inaccurate. On top of that, the function does not call `cancel_retrieval`,
   causing a leak in vCenter.
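For item 1, the difference between searching only the first page and searching every page can be sketched under the same assumptions. PagedCollector, find_cluster_first_page_only and find_cluster are hypothetical names, not the real get_all_cluster_mors, and the sketch assumes the clusters come back in creation order as (managed object reference, name) pairs. With maximum_objects=1 and clusters created as cluster-a then cluster-b, the first-page-only search never sees cluster-b.

```python
from typing import List, Optional, Set, Tuple

Cluster = Tuple[str, str]  # (managed object reference, cluster name)
Page = Tuple[List[Cluster], Optional[int]]


class PagedCollector:
    """Hypothetical paging collector: at most max_objects clusters per page."""

    def __init__(self, clusters: List[Cluster], max_objects: int) -> None:
        self.clusters = clusters
        self.max_objects = max_objects
        self.open_tokens: Set[int] = set()  # models server-side state

    def _page(self, offset: int) -> Page:
        end = offset + self.max_objects
        token = end if end < len(self.clusters) else None
        if token is not None:
            self.open_tokens.add(token)
        return self.clusters[offset:end], token

    def retrieve(self) -> Page:
        return self._page(0)

    def continue_retrieval(self, token: int) -> Page:
        self.open_tokens.discard(token)
        return self._page(token)

    def cancel_retrieval(self, token: int) -> None:
        self.open_tokens.discard(token)


def _match(page: List[Cluster], name: str) -> Optional[str]:
    """Return the reference of the cluster called `name` on this page."""
    for moref, cluster_name in page:
        if cluster_name == name:
            return moref
    return None


def find_cluster_first_page_only(collector: PagedCollector,
                                 name: str) -> Optional[str]:
    """Mirrors the reported behaviour: only the first page is searched."""
    objects, token = collector.retrieve()
    if token is not None:
        collector.cancel_retrieval(token)
    return _match(objects, name)


def find_cluster(collector: PagedCollector, name: str) -> Optional[str]:
    """Search page by page; cancel the retrieval if we stop early."""
    objects, token = collector.retrieve()
    try:
        while True:
            moref = _match(objects, name)
            if moref is not None or token is None:
                return moref
            objects, token = collector.continue_retrieval(token)
    finally:
        if token is not None:
            collector.cancel_retrieval(token)
```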


Steps to reproduce
==================

Take any vCenter version with more than one cluster and more than one host (I
would suggest three or more), and any release of Nova configured to run against
that vCenter.
I am not 100% sure in which order the clusters are returned, possibly in the
order they were created, or alphabetically. I would suggest first creating
cluster-a, then cluster-b, and adding the hosts to cluster-b. That reflects how
our clusters are created (chronological order matches alphabetical order).

* Additionally configure:
  [vmware]
  maximum_objects=1
* Try to start nova-compute.

Expected result
===============
The nova-compute service should start up and get the stats from all the hosts in
the configured cluster.

Actual result
=============

nova-compute fails to start with the error message:
> The specified cluster '<clustername>' was not found in vCenter

If you have two clusters and three hosts, then increasing
maximum_objects to two will get you around that failure, but will
trigger the second problem.

You can verify that by checking the resources of the nova-compute node,
which will report only the resources (CPUs, RAM, ...) of two of the ESXi
hosts in the cluster.
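The second problem can be modelled the same way: with three hosts and maximum_objects=2, summing only the first page counts two hosts and, since `cancel_retrieval` is never called, leaves the retrieval open. This is a hypothetical sketch, not the real get_stats_from_cluster: PagedCollector, stats_first_page_only, stats_all_pages and the vcpus/memory_mb keys are made up for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Host = Dict[str, int]  # hypothetical per-host stats, e.g. {"vcpus": 16, ...}
Page = Tuple[List[Host], Optional[int]]


class PagedCollector:
    """Hypothetical paging collector: at most max_objects hosts per page."""

    def __init__(self, hosts: List[Host], max_objects: int) -> None:
        self.hosts = hosts
        self.max_objects = max_objects
        self.open_tokens: Set[int] = set()  # models server-side state

    def _page(self, offset: int) -> Page:
        end = offset + self.max_objects
        token = end if end < len(self.hosts) else None
        if token is not None:
            self.open_tokens.add(token)
        return self.hosts[offset:end], token

    def retrieve(self) -> Page:
        return self._page(0)

    def continue_retrieval(self, token: int) -> Page:
        self.open_tokens.discard(token)
        return self._page(token)

    def cancel_retrieval(self, token: int) -> None:
        self.open_tokens.discard(token)


def _add(totals: Dict[str, int], hosts: List[Host]) -> None:
    """Accumulate each host's values into the running totals."""
    for host in hosts:
        for key in totals:
            totals[key] += host[key]


def stats_first_page_only(collector: PagedCollector) -> Dict[str, int]:
    """Mirrors the report: first page only, and no cancel_retrieval call."""
    totals = {"vcpus": 0, "memory_mb": 0}
    objects, _token = collector.retrieve()
    _add(totals, objects)
    return totals


def stats_all_pages(collector: PagedCollector) -> Dict[str, int]:
    """Sum every page; cancel the retrieval if an error interrupts us."""
    totals = {"vcpus": 0, "memory_mb": 0}
    objects, token = collector.retrieve()
    try:
        while True:
            _add(totals, objects)
            if token is None:
                return totals
            objects, token = collector.continue_retrieval(token)
    finally:
        if token is not None:
            collector.cancel_retrieval(token)
```

After `stats_first_page_only`, the collector's `open_tokens` is still non-empty, which is the kind of leftover server-side retrieval the report describes; after `stats_all_pages` it is empty.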


Environment
===========
1. Exact version of OpenStack you are running. 
   370830e944 Merge "libvirt: Enable 'vmcoreinfo' feature by default"

   As far as I can see, all Nova releases are affected by this behavior.

2. Which hypervisor did you use?
   VMware vSphere
   What's the version of that?
   7.0.1-17327586

3. Which networking type did you use?
   Neutron with NSX-T (https://github.com/sapcc/networking-nsx-t)

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1940399

Title:
  vmware driver fails when setting vmware.maximum_objects to small value

Status in OpenStack Compute (nova):
  New




-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
