On 09:19 am, wolfgang....@rohdewald.de wrote:
This does not seem to be supported by Python yet.

Should that be enabled at all?
If one process with PY3 sends such identifiers to
a separate process with PY2, that will fail. I am not
sure if that would be a problem, whoever uses this must
make sure PY3 is used everywhere.

This is why we will *not* change the PB wire protocol as part of the porting work. The wire protocol will remain the same whether you are using Python 2 or Python 3 to run your program.

This is the point of a protocol, after all: it lets two programs communicate with each other.

If this should be forbidden, I will add a test to
test_pb for this. And of course somebody should document that
somewhere. The more PEP 3131 is used, the more users will
fall into this trap.

I'm not exactly sure what you mean here. Using unicode where only bytes are allowed is probably already forbidden throughout PB.

If this should be enabled (which I think is not difficult,
at least for pb):

At least the patch below will be needed (only for PY3),
maybe it is already sufficient. Given that nativeString
and networkString are always used (done that for pb).

networkString may then return bytes with the high bit set

Definitely not.
But since networkString is called in many places I want to ask and
make sure that it may really be changed this way.


https://twistedmatrix.com/documents/14.0.0/core/specifications/banana.html
does not speak against it, so I wonder why networkString has that limitation
to 7bit.

That is the sole purpose of `networkString`. It is a work-around for the inconvenient fact that Python changed the meaning of the string literal syntax from bytes to unicode.
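
To make that concrete, here is a minimal sketch of the behaviour in question. It assumes only the Python 3 branch shown on the pre-patch side of the diff quoted in this thread; `network_string` is a stand-in name used for illustration, not the Twisted helper itself.

```python
# Python 3 only. A stand-in for the ASCII-only helper discussed here;
# the name network_string is hypothetical.
def network_string(s):
    if not isinstance(s, str):
        raise TypeError(
            "Can only convert text to bytes on Python 3, I got %r" % (s,))
    # Encoding as ASCII is the check: anything outside 7-bit raises.
    return s.encode("ascii")

# A native string literal can be combined with bytes once converted.
method = network_string("getSimple")
path = b"/prefix/" + method

# A non-ASCII identifier is rejected instead of being silently encoded.
try:
    network_string("getSimple\u00e4")
except UnicodeEncodeError:
    rejected = True
else:
    rejected = False
```

Seen this way, the ASCII limit looks like the check itself rather than an accident of the implementation: the helper only vouches for text that would have been a valid byte string literal on Python 2.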

concrete banana-encoded example, from modified test_pb: (the method name is getSimpleä)
test_pb still passes with patched nativeString/networkString (but I
only have one test for this so far, test_refcount).

b'\x07\x80\x07\x82message\x01\x81\x03\x82foo\x0b\x82getSimple\xc3\xa4\x01\x81\x01\x80\x05\x82tuple\x01\x80\n\x82dictionary'
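
For readers who want to check what is in that byte string, the following is a rough decoding sketch. It assumes the type bytes from the banana specification linked above (0x80 list, 0x81 integer, 0x82 string) and a little-endian base-128 length prefix, and it handles only those three types. The function name `decode` is made up for this example and is not part of Twisted.

```python
# Python 3 only. Handles just the three banana types that occur in
# the example; the type byte values are assumed from the banana spec.
LIST, INT, STRING = 0x80, 0x81, 0x82

def decode(data, pos=0):
    """Decode one element starting at pos; return (value, next_pos)."""
    n = 0
    shift = 0
    # Prefix bytes have the high bit clear; least significant first.
    while data[pos] < 0x80:
        n |= data[pos] << shift
        shift += 7
        pos += 1
    kind = data[pos]
    pos += 1
    if kind == INT:
        return n, pos
    if kind == STRING:
        # n is the length in bytes, not in characters.
        return data[pos:pos + n], pos + n
    if kind == LIST:
        items = []
        for _ in range(n):
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos
    raise ValueError("unhandled type byte 0x%02x" % kind)

wire = (b'\x07\x80\x07\x82message\x01\x81\x03\x82foo\x0b\x82getSimple'
        b'\xc3\xa4\x01\x81\x01\x80\x05\x82tuple\x01\x80\n\x82dictionary')
decoded, end = decode(wire)
method_name = decoded[3]
```

Under those assumptions the method name arrives as the 11-byte string `b'getSimple\xc3\xa4'`, that is, the UTF-8 encoding of getSimpleä with two bytes that have the high bit set.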


diff --git twisted/python/compat.py twisted/python/compat.py
index 6f76c39..6919cf6 100644
--- twisted/python/compat.py
+++ twisted/python/compat.py
@@ -348,10 +348,9 @@ def nativeString(s):
        raise TypeError("%r is neither bytes nor unicode" % s)
    if _PY3:
        if isinstance(s, bytes):
-            return s.decode("ascii")
+            return s.decode("utf-8")
        else:
-            # Ensure we're limited to ASCII subset:
-            s.encode("ascii")
+            return s
    else:
        if isinstance(s, unicode):
            return s.encode("ascii")
@@ -428,7 +427,7 @@ if _PY3:
    def networkString(s):
        if not isinstance(s, unicode):
            raise TypeError("Can only convert text to bytes on Python 3, I got %r" % (s,))
-        return s.encode('ascii')
+        return s.encode('utf-8')

    def networkChar(integer):
        """

This change definitely won't be acceptable. It completely removes the feature `networkString` exists to provide: verifying that strings that might be either unicode or bytes can still be implicitly combined into bytes.

Can you point out the specific places where you think PB needs to start using UTF-8 instead of ASCII? Those are the places that need to be fixed, not the underlying porting helpers they happen to use.

Jean-Paul

_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
