Hello Dan Kenigsberg, Francesco Romani, Martin Polednik,

I'd like you to do a code review.  Please visit

    https://gerrit.ovirt.org/59996

to review the following change.

Change subject: sampling: periodic: improve logging
......................................................................

sampling: periodic: improve logging

We received reports of incorrect monitoring, e.g. VMs set unresponsive,
or periodic monitoring being slow to get back to normal after a
disruption. Analysis of post-mortem logs provided nothing conclusive;
actually Vdsm seems to have reacted well under the disruption.
Still, we want to make sure everything is right.

This patch refactors the bulk stats sampling logging, to add more
information and to move it outside the critical section.

We know logging is a controversial matter.
Too much logging is unpractical and taxing over the host.
Too little logging makes troubleshooting very hard, especially for
monitoring issues that are most often detected and reported much after
the incident happen.

We still log once per operation, so four entries per minute, which is
definitely bearable.

Change-Id: I1e52a9c356be212ec72aace0f8d8aa5f1580ccb3
Bug-Url: https://bugzilla.redhat.com/1351247
Backport-To: 4.0
Backport-To: 3.6
Signed-off-by: Francesco Romani <from...@redhat.com>
Reviewed-on: https://gerrit.ovirt.org/58972
Reviewed-by: Milan Zamazal <mzama...@redhat.com>
Reviewed-by: Martin Polednik <mpoled...@redhat.com>
Continuous-Integration: Jenkins CI
Reviewed-by: Dan Kenigsberg <dan...@redhat.com>
---
M lib/vdsm/virt/sampling.py
1 file changed, 8 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/96/59996/1

diff --git a/lib/vdsm/virt/sampling.py b/lib/vdsm/virt/sampling.py
index b49666b..4cfd3b1 100644
--- a/lib/vdsm/virt/sampling.py
+++ b/lib/vdsm/virt/sampling.py
@@ -485,11 +485,13 @@
         self._log = logging.getLogger("virt.sampling.VMBulkSampler")
 
     def __call__(self):
+        log_status = True
         timestamp = self._stats_cache.clock()
         acquired = self._sampling.acquire(blocking=False)
         # we are deep in the hot path. bool(ExpiringCache)
         # *is* costly so we should avoid it if we can.
         fast_path = acquired and not self._skip_doms
+        doms = []  # whitelist, meaningful only in the slow path
         try:
             if fast_path:
                 # This is expected to be the common case.
@@ -499,7 +501,6 @@
                 # A previous call got stuck, or not every domain
                 # has properly recovered. Thus we must whitelist domains.
                 doms = self._get_responsive_doms()
-                self._log.debug('sampling %d domains', len(doms))
                 if doms:
                     bulk_stats = self._conn.domainListGetStats(
                         doms, self._stats_flags)
@@ -507,11 +508,17 @@
                     bulk_stats = []
         except Exception:
             self._log.exception("vm sampling failed")
+            log_status = False
         else:
             self._stats_cache.put(_translate(bulk_stats), timestamp)
         finally:
             if acquired:
                 self._sampling.release()
+        if log_status:
+            self._log.debug(
+                'sampled timestamp %r elapsed %.3f acquired %r domains %s',
+                timestamp,  self._stats_cache.clock() - timestamp, acquired,
+                'all' if fast_path else len(doms))
 
     def _get_responsive_doms(self):
         vms = self._get_vms()


-- 
To view, visit https://gerrit.ovirt.org/59996
To unsubscribe, visit https://gerrit.ovirt.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I1e52a9c356be212ec72aace0f8d8aa5f1580ccb3
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: ovirt-4.0
Gerrit-Owner: Milan Zamazal <mzama...@redhat.com>
Gerrit-Reviewer: Dan Kenigsberg <dan...@redhat.com>
Gerrit-Reviewer: Francesco Romani <from...@redhat.com>
Gerrit-Reviewer: Martin Polednik <mpoled...@redhat.com>
_______________________________________________
vdsm-patches mailing list
vdsm-patches@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/vdsm-patches@lists.fedorahosted.org

Reply via email to