Nir Soffer has posted comments on this change.

Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter
......................................................................


Patch Set 2: Code-Review-1

(10 comments)

https://gerrit.ovirt.org/#/c/48542/2//COMMIT_MSG
Commit Message:

Line 8: 
Line 9: The values and messages passed to SimpleLogAdapter can be mixtures of
Line 10: strings and unicodes containing various non-ASCII characters.  If they
Line 11: are not properly handled, concatenation of incompatible strings raises
Line 12: UnicodeDecodeError.
Please show the failure in the log - it is not clear which code lead to this 
error.

Looks like you are not fixing the real issue.
Line 13: 
Line 14: We avoid that problem in SimpleLogAdapter by converting all the
Line 15: processed texts to unicodes before joining them.  We assume non-ASCII
Line 16: strings are in UTF-8 but if they are not then we shouldn't crash at


Line 11: are not properly handled, concatenation of incompatible strings raises
Line 12: UnicodeDecodeError.
Line 13: 
Line 14: We avoid that problem in SimpleLogAdapter by converting all the
Line 15: processed texts to unicodes before joining them.  We assume non-ASCII
This is not correct any more, please update the description.
Line 16: strings are in UTF-8 but if they are not then we shouldn't crash at
Line 17: least.
Line 18: 
Line 19: Change-Id: I1930817ade1799b218913348ffa099aab2211ef7


Line 17: least.
Line 18: 
Line 19: Change-Id: I1930817ade1799b218913348ffa099aab2211ef7
Line 20: Bug-Url: https://bugzilla.redhat.com/1276906
Line 21: Bug-Url: https://bugzilla.redhat.com/1260131
These issues should be fixed in the engine side. Why should vdsm support input 
that is not supported by libvirt?

If vdsm get non-ascii string from engine using unknown encoding, it should 
decode the string on input, not delay the decoding to the last minute during 
logging. Please find the place when we get invalid input and do the decoding 
there.


https://gerrit.ovirt.org/#/c/48542/2/tests/utilsTests.py
File tests/utilsTests.py:

Line 1085:         invalid = chr(254) + chr(0)
Line 1086:         u_invalid = unichr(253) + unichr(254)
Line 1087:         extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', 
e4='你好',
Line 1088:                      e5=invalid, e6=u_invalid)
Line 1089:         log = logUtils.SimpleLogAdapter(self._log, extra)
You should test this:

        log = logUtils.SimpleLogAdapter(self._log, {"key": value})
        msg, kwargs = log.process("msg", {"k": "v"})

        self.assertEqual(msg, "key=%s::msg" % encoded_value)
        self.assertEqual(kwargs, {"k": "v"})

value and encoded_value should come from permutations list, failure to handle 
unicode will fail one test, but if ascii string is handled correct, its test 
will pass.

    @permutations([
        # value, encoded_value
        (unicode("\xd7\90", "utf8"), "\xd7\90"),
        ("ascii", "ascii"),
        # add more inputs if needed
    ])
    def test_process(self, value, encoded_value):
        ...
Line 1090:         log.debug("Dobrý večer")
Line 1091:         log.debug(u"Dobrý večer")
Line 1092:         log.debug(invalid)


Line 1086:         u_invalid = unichr(253) + unichr(254)
Line 1087:         extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', 
e4='你好',
Line 1088:                      e5=invalid, e6=u_invalid)
Line 1089:         log = logUtils.SimpleLogAdapter(self._log, extra)
Line 1090:         log.debug("Dobrý večer")
Again, this is invalid test. Format strings are part of the code and using 
non-ascci format strings is not allowed.
Line 1091:         log.debug(u"Dobrý večer")
Line 1092:         log.debug(invalid)


Line 1087:         extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', 
e4='你好',
Line 1088:                      e5=invalid, e6=u_invalid)
Line 1089:         log = logUtils.SimpleLogAdapter(self._log, extra)
Line 1090:         log.debug("Dobrý večer")
Line 1091:         log.debug(u"Dobrý večer")
Same
Line 1092:         log.debug(invalid)


Line 1088:                      e5=invalid, e6=u_invalid)
Line 1089:         log = logUtils.SimpleLogAdapter(self._log, extra)
Line 1090:         log.debug("Dobrý večer")
Line 1091:         log.debug(u"Dobrý večer")
Line 1092:         log.debug(invalid)
Same


Line 1089:         log = logUtils.SimpleLogAdapter(self._log, extra)
Line 1090:         log.debug("Dobrý večer")
Line 1091:         log.debug(u"Dobrý večer")
Line 1092:         log.debug(invalid)
Line 1093:         log.debug(u_invalid)
Same


https://gerrit.ovirt.org/#/c/48542/2/vdsm/logUtils.py
File vdsm/logUtils.py:

Line 92:     def process(self, msg, kwargs):
Line 93:         result = ''
Line 94:         for key, value in self.extra.iteritems():
Line 95:             if isinstance(value, unicode):
Line 96:                 value = value.encode('utf-8', 'replace')
No need to replace, all unicode code points can be encoded as utt8 (or any 
other "utf" encoding).
See https://en.wikipedia.org/wiki/UTF-8

Searching for usage of extra dict, there is no code using non-ascii values:

    $ git grep -A2 'SimpleLogAdapter('
    vdsm/logUtils.py:class SimpleLogAdapter(logging.LoggerAdapter):
    vdsm/logUtils.py-    # Because of how python implements the fact that 
warning
    vdsm/logUtils.py-    # and warn are the same. I need to reimplement it 
here. :(
    --
    vdsm/storage/resourceManager.py:        self._log = 
SimpleLogAdapter(self._log, {"ResName": self.fullName,
    vdsm/storage/resourceManager.py-                                            
     "ReqID": self.reqID})
    vdsm/storage/resourceManager.py-
    --
    vdsm/storage/resourceManager.py:        self._log = 
SimpleLogAdapter(self._log, {"ResName": self.fullName,
    vdsm/storage/resourceManager.py-                                            
     "ResRefID": resRefID})
    vdsm/storage/resourceManager.py-
    --
    vdsm/storage/task.py:        self.log = SimpleLogAdapter(self.log, {"Task": 
self.id})
    vdsm/storage/task.py-
    vdsm/storage/task.py-    def __del__(self):
    --
    vdsm/virt/vm.py:        self.log = SimpleLogAdapter(self.log, {"vmId": 
self.conf['vmId']})
    vdsm/virt/vm.py-        self._destroyed = False
    vdsm/virt/vm.py-        self._recoveryFile = constants.P_VDSM_RUN + \

So supporting unicode values in the extra dict is simply not needed with 
current code.
Line 97:             result += '%s=`%s`' % (key, value)
Line 98:         if isinstance(msg, unicode):
Line 99:             msg = msg.encode('utf-8', 'replace')
Line 100:         result += '::%s' % msg


Line 95:             if isinstance(value, unicode):
Line 96:                 value = value.encode('utf-8', 'replace')
Line 97:             result += '%s=`%s`' % (key, value)
Line 98:         if isinstance(msg, unicode):
Line 99:             msg = msg.encode('utf-8', 'replace')
Please show a log example where logging fails because logging message contains 
non-ascii characters.
Line 100:         result += '::%s' % msg
Line 101:         return (result, kwargs)
Line 102: 
Line 103: 


-- 
To view, visit https://gerrit.ovirt.org/48542
To unsubscribe, visit https://gerrit.ovirt.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1930817ade1799b218913348ffa099aab2211ef7
Gerrit-PatchSet: 2
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Milan Zamazal <mzama...@redhat.com>
Gerrit-Reviewer: Francesco Romani <from...@redhat.com>
Gerrit-Reviewer: Jenkins CI
Gerrit-Reviewer: Milan Zamazal <mzama...@redhat.com>
Gerrit-Reviewer: Nir Soffer <nsof...@redhat.com>
Gerrit-Reviewer: Vinzenz Feenstra <vfeen...@redhat.com>
Gerrit-Reviewer: Yaniv Bronhaim <ybron...@redhat.com>
Gerrit-Reviewer: gerrit-hooks <automat...@ovirt.org>
Gerrit-HasComments: Yes
_______________________________________________
vdsm-patches mailing list
vdsm-patches@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches

Reply via email to