Nir Soffer has posted comments on this change. Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
Patch Set 2: Code-Review-1 (10 comments) https://gerrit.ovirt.org/#/c/48542/2//COMMIT_MSG Commit Message: Line 8: Line 9: The values and messages passed to SimpleLogAdapter can be mixtures of Line 10: strings and unicodes containing various non-ASCII characters. If they Line 11: are not properly handled, concatenation of incompatible strings raises Line 12: UnicodeDecodeError. Please show the failure in the log - it is not clear which code lead to this error. Looks like you are not fixing the real issue. Line 13: Line 14: We avoid that problem in SimpleLogAdapter by converting all the Line 15: processed texts to unicodes before joining them. We assume non-ASCII Line 16: strings are in UTF-8 but if they are not then we shouldn't crash at Line 11: are not properly handled, concatenation of incompatible strings raises Line 12: UnicodeDecodeError. Line 13: Line 14: We avoid that problem in SimpleLogAdapter by converting all the Line 15: processed texts to unicodes before joining them. We assume non-ASCII This is not correct any more, please update the description. Line 16: strings are in UTF-8 but if they are not then we shouldn't crash at Line 17: least. Line 18: Line 19: Change-Id: I1930817ade1799b218913348ffa099aab2211ef7 Line 17: least. Line 18: Line 19: Change-Id: I1930817ade1799b218913348ffa099aab2211ef7 Line 20: Bug-Url: https://bugzilla.redhat.com/1276906 Line 21: Bug-Url: https://bugzilla.redhat.com/1260131 These issues should be fixed in the engine side. Why should vdsm support input that is not supported by libvirt? If vdsm get non-ascii string from engine using unknown encoding, it should decode the string on input, not delay the decoding to the last minute during logging. Please find the place when we get invalid input and do the decoding there. https://gerrit.ovirt.org/#/c/48542/2/tests/utilsTests.py File tests/utilsTests.py: Line 1085: invalid = chr(254) + chr(0) Line 1086: u_invalid = unichr(253) + unichr(254) Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid) Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) You should test this: log = logUtils.SimpleLogAdapter(self._log, {"key": value}) msg, kwargs = log.process("msg", {"k": "v"}) self.assertEqual(msg, "key=%s::msg" % encoded_value) self.assertEqual(kwargs, {"k": "v"}) value and encoded_value should come from permutations list, failure to handle unicode will fail one test, but if ascii string is handled correct, its test will pass. @permutations([ # value, encoded_value (unicode("\xd7\90", "utf8"), "\xd7\90"), ("ascii", "ascii"), # add more inputs if needed ]) def test_process(self, value, encoded_value): ... Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") Line 1092: log.debug(invalid) Line 1086: u_invalid = unichr(253) + unichr(254) Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid) Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Again, this is invalid test. Format strings are part of the code and using non-ascci format strings is not allowed. Line 1091: log.debug(u"Dobrý večer") Line 1092: log.debug(invalid) Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid) Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") Same Line 1092: log.debug(invalid) Line 1088: e5=invalid, e6=u_invalid) Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") Line 1092: log.debug(invalid) Same Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") Line 1092: log.debug(invalid) Line 1093: log.debug(u_invalid) Same https://gerrit.ovirt.org/#/c/48542/2/vdsm/logUtils.py File vdsm/logUtils.py: Line 92: def process(self, msg, kwargs): Line 93: result = '' Line 94: for key, value in self.extra.iteritems(): Line 95: if isinstance(value, unicode): Line 96: value = value.encode('utf-8', 'replace') No need to replace, all unicode code points can be encoded as utt8 (or any other "utf" encoding). See https://en.wikipedia.org/wiki/UTF-8 Searching for usage of extra dict, there is no code using non-ascii values: $ git grep -A2 'SimpleLogAdapter(' vdsm/logUtils.py:class SimpleLogAdapter(logging.LoggerAdapter): vdsm/logUtils.py- # Because of how python implements the fact that warning vdsm/logUtils.py- # and warn are the same. I need to reimplement it here. :( -- vdsm/storage/resourceManager.py: self._log = SimpleLogAdapter(self._log, {"ResName": self.fullName, vdsm/storage/resourceManager.py- "ReqID": self.reqID}) vdsm/storage/resourceManager.py- -- vdsm/storage/resourceManager.py: self._log = SimpleLogAdapter(self._log, {"ResName": self.fullName, vdsm/storage/resourceManager.py- "ResRefID": resRefID}) vdsm/storage/resourceManager.py- -- vdsm/storage/task.py: self.log = SimpleLogAdapter(self.log, {"Task": self.id}) vdsm/storage/task.py- vdsm/storage/task.py- def __del__(self): -- vdsm/virt/vm.py: self.log = SimpleLogAdapter(self.log, {"vmId": self.conf['vmId']}) vdsm/virt/vm.py- self._destroyed = False vdsm/virt/vm.py- self._recoveryFile = constants.P_VDSM_RUN + \ So supporting unicode values in the extra dict is simply not needed with current code. Line 97: result += '%s=`%s`' % (key, value) Line 98: if isinstance(msg, unicode): Line 99: msg = msg.encode('utf-8', 'replace') Line 100: result += '::%s' % msg Line 95: if isinstance(value, unicode): Line 96: value = value.encode('utf-8', 'replace') Line 97: result += '%s=`%s`' % (key, value) Line 98: if isinstance(msg, unicode): Line 99: msg = msg.encode('utf-8', 'replace') Please show a log example where logging fails because logging message contains non-ascii characters. Line 100: result += '::%s' % msg Line 101: return (result, kwargs) Line 102: Line 103: -- To view, visit https://gerrit.ovirt.org/48542 To unsubscribe, visit https://gerrit.ovirt.org/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1930817ade1799b218913348ffa099aab2211ef7 Gerrit-PatchSet: 2 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Milan Zamazal <mzama...@redhat.com> Gerrit-Reviewer: Francesco Romani <from...@redhat.com> Gerrit-Reviewer: Jenkins CI Gerrit-Reviewer: Milan Zamazal <mzama...@redhat.com> Gerrit-Reviewer: Nir Soffer <nsof...@redhat.com> Gerrit-Reviewer: Vinzenz Feenstra <vfeen...@redhat.com> Gerrit-Reviewer: Yaniv Bronhaim <ybron...@redhat.com> Gerrit-Reviewer: gerrit-hooks <automat...@ovirt.org> Gerrit-HasComments: Yes _______________________________________________ vdsm-patches mailing list vdsm-patches@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches