Hello Piotr Kliczewski, Francesco Romani, Milan Zamazal,

I'd like you to do a code review.  Please visit

    https://gerrit.ovirt.org/51273

to review the following change.

Change subject: startup: Change system default encoding to utf8
......................................................................

startup: Change system default encoding to utf8

In Python 2, the system default encoding is 'ascii'. This causes mixing
of unicode and non-ascii strings (e.g. utf8 encoded) to fail with
UnicodeDecodeError or UnicodeEncodeError. The trigger for this failures
is starting using the built-in json library, that returns all values as
unicode strings, even if the value is ascii.

For example:

>>> u'ascii' + '\xd7\x90'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 0:
ordinal not in range(128)

Python tries to decode the second value implicitly, and fails since this
is a utf8 encoded string.

To avoid such issues, the entire application must be changed to use only
strings or only unicode internally, and never mix these types.  In the
distant future, when we run on Python 3, this will be true. For now, the
only way to prevent these issues systematically is to change the system
default encoding to 'utf8'.

Changing the default encoding is done with sys.setdefaultencoding(), but
this function exists only during startup, and can be called only from
sitecustomize module.

We keep now sitecustomize.py module, and change PYTHONPATH so it will be
loaded on startup.

This change is not needed in Python 3 since the default encoding is
already utf8, and Python does not do any implicit decoding or encoding
when mixing 'str' and 'bytes' types.

Change-Id: Icc3f072a499ba4034bdbedd09eb60d7a3a32f9c8
Bug-Url: https://bugzilla.redhat.com/1281940
Signed-off-by: Nir Soffer <[email protected]>
Reviewed-on: https://gerrit.ovirt.org/48661
Continuous-Integration: Jenkins CI
Reviewed-by: Milan Zamazal <[email protected]>
Reviewed-by: Francesco Romani <[email protected]>
Reviewed-by: Piotr Kliczewski <[email protected]>
---
M debian/vdsm.install
M init/daemonAdapter
M tests/Makefile.am
A tests/unicode_test.py
M vdsm.spec.in
M vdsm/Makefile.am
A vdsm/sitecustomize.py
7 files changed, 84 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/73/51273/1

diff --git a/debian/vdsm.install b/debian/vdsm.install
index 17ec403..8a659d6 100644
--- a/debian/vdsm.install
+++ b/debian/vdsm.install
@@ -134,6 +134,7 @@
 ./usr/share/vdsm/storage/threadPool.py
 ./usr/share/vdsm/storage/volume.py
 ./usr/share/vdsm/supervdsm.py
+./usr/share/vdsm/sitecustomize.py
 ./usr/share/vdsm/supervdsmServer
 ./usr/share/vdsm/v2v.py
 ./usr/share/vdsm/vdsm
diff --git a/init/daemonAdapter b/init/daemonAdapter
index a112af0..c67ff57 100755
--- a/init/daemonAdapter
+++ b/init/daemonAdapter
@@ -69,12 +69,20 @@
             os.nice(config.getint('vars', 'vdsm_nice'))
 
             env = os.environ.copy()
+
+            # Python path is needed to get sitecustomize.py imported from
+            # target directory.
+            pythonpath = os.path.dirname(self._args.target[0])
+            if 'PYTHONPATH' in env:
+                pythonpath += ':' + env['PYTHONPATH']
+
             env.update({
                 'LIBVIRT_LOG_FILTERS': config.get(
                     'vars', 'libvirt_env_variable_log_filters'),
                 'LIBVIRT_LOG_OUTPUTS': config.get(
                     'vars', 'libvirt_env_variable_log_outputs'),
                 'LC_ALL': 'C',
+                'PYTHONPATH': pythonpath,
             })
 
             cmd = [self._args.target[0]] + self._args.targetOptions
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 1b33ef8..a9836a1 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -104,6 +104,7 @@
        testlibTests.py \
        toolTests.py \
        transportWrapperTests.py \
+       unicode_test.py \
        utilsTests.py \
        vdscliTests.py \
        vdsClientTests.py \
diff --git a/tests/unicode_test.py b/tests/unicode_test.py
new file mode 100644
index 0000000..e6f6f35
--- /dev/null
+++ b/tests/unicode_test.py
@@ -0,0 +1,67 @@
+#
+# Copyright 2015 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301 USA
+#
+# Refer to the README and COPYING files for full details of the license
+#
+
+from testlib import VdsmTestCase
+from testlib import permutations, expandPermutations
+
+ENCODE = [
+    # value, encoded (utf8)
+    (u'\u05d0', '\xd7\x90'),
+    ('\xd7\x90', '\xd7\x90'),
+    (u'ascii', 'ascii'),
+    ('ascii', 'ascii'),
+]
+
+DECODE = [
+    # value (utf8), decoded
+    ('\xd7\x90', u'\u05d0'),
+    (u'\u05d0', u'\u05d0'),
+    ('ascii', u'ascii'),
+    (u'ascii', u'ascii'),
+]
+
+
+@expandPermutations
+class TestUnicode(VdsmTestCase):
+
+    @permutations(ENCODE)
+    def test_encode(self, value, encoded):
+        self.assertEquals(value.encode("utf8"), encoded)
+
+    @permutations(ENCODE)
+    def test_str(self, value, encoded):
+        self.assertEqual(str(value), encoded)
+
+    @permutations(DECODE)
+    def test_decode(self, value, decoded):
+        self.assertEquals(value.decode("utf8"), decoded)
+
+    @permutations(DECODE)
+    def test_unicode(self, value, decoded):
+        self.assertEquals(unicode(value), decoded)
+
+    def test_mix_add(self):
+        self.assertEquals(u'\u05d0' + '\xd7\x91', u'\u05d0\u05d1')
+
+    def test_mix_format_str(self):
+        self.assertEquals(u'\u05d0%s' % '\xd7\x91', u'\u05d0\u05d1')
+
+    def test_mix_format_unicode(self):
+        self.assertEquals('\xd7\x90%s' % u'\u05d1', u'\u05d0\u05d1')
diff --git a/vdsm.spec.in b/vdsm.spec.in
index 2c0325d..0dc6533 100644
--- a/vdsm.spec.in
+++ b/vdsm.spec.in
@@ -839,6 +839,7 @@
 %{_datadir}/%{vdsm_name}/ppc64HardwareInfo.py*
 %{_datadir}/%{vdsm_name}/protocoldetector.py*
 %{_datadir}/%{vdsm_name}/supervdsm.py*
+%{_datadir}/%{vdsm_name}/sitecustomize.py*
 %{_datadir}/%{vdsm_name}/supervdsmServer
 %{_datadir}/%{vdsm_name}/v2v.py*
 %{_datadir}/%{vdsm_name}/vdsm
diff --git a/vdsm/Makefile.am b/vdsm/Makefile.am
index 4c0578e..30fc15f 100644
--- a/vdsm/Makefile.am
+++ b/vdsm/Makefile.am
@@ -42,6 +42,7 @@
        ppc64HardwareInfo.py \
        protocoldetector.py \
        supervdsm.py \
+       sitecustomize.py \
        v2v.py \
        vdsmDebugPlugin.py \
        $(NULL)
diff --git a/vdsm/sitecustomize.py b/vdsm/sitecustomize.py
new file mode 100644
index 0000000..d698a8b
--- /dev/null
+++ b/vdsm/sitecustomize.py
@@ -0,0 +1,5 @@
+import sys
+
+if sys.version_info[0] == 2:
+    # Allow mixing of unicode objects and strings encoded in utf8.
+    sys.setdefaultencoding('utf8')


-- 
To view, visit https://gerrit.ovirt.org/51273
To unsubscribe, visit https://gerrit.ovirt.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Icc3f072a499ba4034bdbedd09eb60d7a3a32f9c8
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: ovirt-3.6
Gerrit-Owner: Nir Soffer <[email protected]>
Gerrit-Reviewer: Francesco Romani <[email protected]>
Gerrit-Reviewer: Milan Zamazal <[email protected]>
Gerrit-Reviewer: Piotr Kliczewski <[email protected]>
_______________________________________________
vdsm-patches mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches

Reply via email to