Hello Adam Litke, Dan Kenigsberg, Allon Mureinik,

I'd like you to do a code review.  Please visit

    https://gerrit.ovirt.org/41127

to review the following change.

Change subject: fileSD: Optimize getAllVolumes on file storage
......................................................................

fileSD: Optimize getAllVolumes on file storage

The previous implementation was using O(N^2) search to detect template
images and volumes. When working with storage domain with large number
of disks created from the same template, getAllVolumes becomes very
slow, delaying flows such as vm startup and recovery.

This patch reimplements getAllVolumes using O(N) search, cutting the
time to process 5000 volume paths from 9 to 0.065 seconds.

Profiling on scale setup show that getAllVolume cpu time per call
dropped from 44 to 1.3 seconds, and the total cpu time used for
preparing 20 images drop from 884 to 28 seconds.

Before this patch:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       ...
       20    0.153    0.008  884.109   44.205 hsm.py:3188(HSM.prepareImage)
       20    7.242    0.362  883.014   44.151 
fileSD.py:414(NfsStorageDomain.getAllVolumes)
   214840  866.080    0.004  866.080    0.004 fileSD.py:444(<genexpr>)
      ...

With this patch:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      ...
       20    0.128    0.006   28.302    1.415 hsm.py:3188(HSM.prepareImage)
       20   10.391    0.520   26.579    1.329 
fileSD.py:415(NfsStorageDomain.getAllVolumes)
     ...

Change-Id: Idc19ce272b7e0b9c91d2f90aa46b5ddd21d69ece
Bug-Url: https://bugzilla.redhat.com/1223053
Signed-off-by: Nir Soffer <nsof...@redhat.com>
Reviewed-on: http://gerrit.ovirt.org/36593
Reviewed-by: Allon Mureinik <amure...@redhat.com>
Reviewed-by: Adam Litke <ali...@redhat.com>
Reviewed-by: Federico Simoncelli <fsimo...@redhat.com>
Reviewed-by: Dan Kenigsberg <dan...@redhat.com>
---
A tests/fileSDTests.py
M vdsm/storage/fileSD.py
2 files changed, 205 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/27/41127/1

diff --git a/tests/fileSDTests.py b/tests/fileSDTests.py
new file mode 100644
index 0000000..39df65a
--- /dev/null
+++ b/tests/fileSDTests.py
@@ -0,0 +1,157 @@
+#
+# Copyright 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301 USA
+#
+# Refer to the README and COPYING files for full details of the license
+#
+
+import fnmatch
+import os
+import time
+import uuid
+
+from testrunner import VdsmTestCase as TestCaseBase
+
+from storage import fileSD
+from storage import sd
+
+
+class TestingFileStorageDomain(fileSD.FileStorageDomain):
+
+    stat = None  # Accessed in __del__
+
+    def __init__(self, uuid, mountpoint, oop):
+        self.sdUUID = uuid
+        self.mountpoint = mountpoint
+        self._oop = oop
+
+    @property
+    def oop(self):
+        return self._oop
+
+
+class FakeGlob(object):
+
+    def __init__(self, files):
+        self.files = files
+
+    def glob(self, pattern):
+        return fnmatch.filter(self.files, pattern)
+
+
+class FakeOOP(object):
+
+    def __init__(self, glob=None):
+        self.glob = glob
+
+
+class GetAllVolumesTests(TestCaseBase):
+
+    MOUNTPOINT = "/rhev/data-center/%s" % uuid.uuid4()
+    SD_UUID = str(uuid.uuid4())
+    IMAGES_DIR = os.path.join(MOUNTPOINT, SD_UUID, sd.DOMAIN_IMAGES)
+
+    def test_no_volumes(self):
+        oop = FakeOOP(FakeGlob([]))
+        dom = TestingFileStorageDomain(self.SD_UUID, self.MOUNTPOINT, oop)
+        res = dom.getAllVolumes()
+        self.assertEqual(res, {})
+
+    def test_no_templates(self):
+        oop = FakeOOP(FakeGlob([
+            os.path.join(self.IMAGES_DIR, "image-1", "volume-1.meta"),
+            os.path.join(self.IMAGES_DIR, "image-1", "volume-2.meta"),
+            os.path.join(self.IMAGES_DIR, "image-1", "volume-3.meta"),
+            os.path.join(self.IMAGES_DIR, "image-2", "volume-4.meta"),
+            os.path.join(self.IMAGES_DIR, "image-2", "volume-5.meta"),
+            os.path.join(self.IMAGES_DIR, "image-3", "volume-6.meta"),
+        ]))
+        dom = TestingFileStorageDomain(self.SD_UUID, self.MOUNTPOINT, oop)
+        res = dom.getAllVolumes()
+
+        # These volumes should have parent uuid, but the implementation does
+        # not read the meta data files, so this info is not available (None).
+        self.assertEqual(res, {
+            "volume-1": (("image-1",), None),
+            "volume-2": (("image-1",), None),
+            "volume-3": (("image-1",), None),
+            "volume-4": (("image-2",), None),
+            "volume-5": (("image-2",), None),
+            "volume-6": (("image-3",), None),
+        })
+
+    def test_with_template(self):
+        oop = FakeOOP(FakeGlob([
+            os.path.join(self.IMAGES_DIR, "template-1", "volume-1.meta"),
+            os.path.join(self.IMAGES_DIR, "image-1", "volume-1.meta"),
+            os.path.join(self.IMAGES_DIR, "image-1", "volume-2.meta"),
+            os.path.join(self.IMAGES_DIR, "image-1", "volume-3.meta"),
+            os.path.join(self.IMAGES_DIR, "image-2", "volume-1.meta"),
+            os.path.join(self.IMAGES_DIR, "image-2", "volume-4.meta"),
+            os.path.join(self.IMAGES_DIR, "image-3", "volume-5.meta"),
+        ]))
+        dom = TestingFileStorageDomain(self.SD_UUID, self.MOUNTPOINT, oop)
+        res = dom.getAllVolumes()
+
+        self.assertEqual(len(res), 5)
+
+        # The template image must be first - we have code assuming this.
+        self.assertEqual(res["volume-1"].imgs[0], "template-1")
+
+        # The rest of the images have random order.
+        self.assertEqual(sorted(res["volume-1"].imgs[1:]),
+                         ["image-1", "image-2"])
+
+        # For template volumes we have parent info.
+        self.assertEqual(res["volume-1"].parent, sd.BLANK_UUID)
+
+        self.assertEqual(res["volume-2"], (("image-1",), None))
+        self.assertEqual(res["volume-3"], (("image-1",), None))
+        self.assertEqual(res["volume-4"], (("image-2",), None))
+        self.assertEqual(res["volume-5"], (("image-3",), None))
+
+    def test_scale(self):
+        # For this test we want real world strings
+        images_count = 5000
+        template_image_uuid = str(uuid.uuid4())
+        template_volume_uuid = str(uuid.uuid4())
+
+        files = []
+        template_volume = os.path.join(self.IMAGES_DIR, template_image_uuid,
+                                       template_volume_uuid + ".meta")
+        files.append(template_volume)
+
+        for i in range(images_count):
+            image_uuid = str(uuid.uuid4())
+            volume_uuid = str(uuid.uuid4())
+            template_volume = os.path.join(self.IMAGES_DIR, image_uuid,
+                                           template_volume_uuid + ".meta")
+            files.append(template_volume)
+            new_volume = os.path.join(self.IMAGES_DIR, image_uuid,
+                                      volume_uuid + ".meta")
+            files.append(new_volume)
+
+        oop = FakeOOP(FakeGlob(files))
+        dom = TestingFileStorageDomain(self.SD_UUID, self.MOUNTPOINT, oop)
+
+        start = time.time()
+        dom.getAllVolumes()
+        elapsed = time.time() - start
+        print "%f seconds" % elapsed
+
+        # This takes 0.065 seconds on my laptop, 1 second should be enough even
+        # on overloaded jenkins slave.
+        self.assertTrue(elapsed < 1.0, "Elapsed time: %f seconds" % elapsed)
diff --git a/vdsm/storage/fileSD.py b/vdsm/storage/fileSD.py
index b917301..9daf456 100644
--- a/vdsm/storage/fileSD.py
+++ b/vdsm/storage/fileSD.py
@@ -18,6 +18,7 @@
 # Refer to the README and COPYING files for full details of the license
 #
 
+import collections
 import os
 import errno
 import logging
@@ -415,41 +416,62 @@
         """
         Return dict {volUUID: ((imgUUIDs,), parentUUID)} of the domain.
 
-        Template self image is the 1st term in template volume entry images.
-        The parent can't be determined in file domain without reading the
-        metadata.
-        Setting parent = None for compatibility with block version.
+        (imgUUIDs,) is a tuple of all the images that contain a certain
+        volUUID.  For non-templates volumes, this tuple consists of a single
+        image.  For template volume it consists of all the images that are
+        based on the template volume. In that case, the first imgUUID in the
+        tuple is the self-image of the template.
+
+        The parent of a non-template volume cannot be determined in file domain
+        without reading  the metadata. However, in order to have an output
+        compatible to block domain, we report parent as None.
+
+        Template volumes have no parent, and thus we report BLANK_UUID as their
+        parentUUID.
         """
         volMetaPattern = os.path.join(self.mountpoint, self.sdUUID,
                                       sd.DOMAIN_IMAGES, "*", "*.meta")
         volMetaPaths = self.oop.glob.glob(volMetaPattern)
-        volumes = {}
+
+        # First create mapping from images to volumes
+        images = collections.defaultdict(list)
         for metaPath in volMetaPaths:
             head, tail = os.path.split(metaPath)
             volUUID, volExt = os.path.splitext(tail)
             imgUUID = os.path.basename(head)
-            if volUUID in volumes:
-                # Templates have no parents
-                volumes[volUUID]['parent'] = sd.BLANK_UUID
-                # Template volumes are hard linked in every image directory
-                # which is derived from that template, therefore:
-                # 1. a template volume which is in use will appear at least
-                # twice (in the template image dir and in the derived image
-                # dir)
-                # 2. Any volume which appears more than once in the dir tree is
-                # by definition a template volume.
-                # 3. Any image which has more than 1 volume is not a template
-                # image. Therefore if imgUUID appears in more than one path
-                # then it is not a template.
-                if len(tuple(vPath for vPath in volMetaPaths
-                             if imgUUID in vPath)) > 1:
-                    # Add template additonal image
-                    volumes[volUUID]['imgs'].append(imgUUID)
+            images[imgUUID].append(volUUID)
+
+        # Using images to volumes mapping, we can create volumes to images
+        # mapping, detecting template volumes and template images, based on
+        # these rules:
+        #
+        # Template volumes are hard linked in every image directory
+        # which is derived from that template, therefore:
+        #
+        # 1. A template volume which is in use will appear at least twice
+        #    (in the template image dir and in the derived image dir)
+        #
+        # 2. Any volume which appears more than once in the dir tree is
+        #    by definition a template volume.
+        #
+        # 3. Any image which has more than 1 volume is not a template
+        #    image.
+
+        volumes = {}
+        for imgUUID, volUUIDs in images.iteritems():
+            for volUUID in volUUIDs:
+                if volUUID in volumes:
+                    # This must be a template volume (rule 2)
+                    volumes[volUUID]['parent'] = sd.BLANK_UUID
+                    if len(volUUIDs) > 1:
+                        # This image is not a template (rule 3)
+                        volumes[volUUID]['imgs'].append(imgUUID)
+                    else:
+                        # This image is a template (rule 3)
+                        volumes[volUUID]['imgs'].insert(0, imgUUID)
                 else:
-                    # Insert at head the template self image
-                    volumes[volUUID]['imgs'].insert(0, imgUUID)
-            else:
-                volumes[volUUID] = {'imgs': [imgUUID], 'parent': None}
+                    volumes[volUUID] = {'imgs': [imgUUID], 'parent': None}
+
         return dict((k, sd.ImgsPar(tuple(v['imgs']), v['parent']))
                     for k, v in volumes.iteritems())
 


-- 
To view, visit https://gerrit.ovirt.org/41127
To unsubscribe, visit https://gerrit.ovirt.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Idc19ce272b7e0b9c91d2f90aa46b5ddd21d69ece
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: ovirt-3.5
Gerrit-Owner: Nir Soffer <nsof...@redhat.com>
Gerrit-Reviewer: Adam Litke <ali...@redhat.com>
Gerrit-Reviewer: Allon Mureinik <amure...@redhat.com>
Gerrit-Reviewer: Dan Kenigsberg <dan...@redhat.com>
_______________________________________________
vdsm-patches mailing list
vdsm-patches@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches

Reply via email to