On Wed, Sep 04, 2013 at 06:30:12AM -0700, Albert-Jan Roskam wrote:
> Hi,
> 
> I am trying to make my app work in Python 2.7 and Python 3.3 (one 
> codebase) and I might later also try to make it work on Python 2.6 and 
> Python 3.2 (if I am not too fed up with it ;-). I was very happy to 
> notice that the 'b' prefix for bytes objects is also supported for 
> byte strings in Python 2.7. Likewise, I just found out that the "u" 
> has been re-introduced in Python 3.3 (I believe this is the first 
> Python 3 version where this re-appears).
> 
> I am now cursing myself for having used doctest for my tests. 

Well that's just silly :-)

Doc tests and unit tests have completely different purposes, and 
besides, in general doc tests are easier to make version independent. 
Many of your doc tests will continue to work, and those that don't 
can almost always be adapted to work across versions.
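
As a rough illustration of adapting a doc test, here is a minimal sketch. The function greeting() is hypothetical, not something from your package. The idea is to compare against the expected value rather than display the repr, because the repr is what differs between versions:

```python
import doctest


def greeting():
    """Return a short text string.

    Showing the repr would print u'lalala' for a Python 2 unicode string
    and 'lalala' on Python 3, so compare against the expected value:

    >>> greeting() == "lalala"
    True
    """
    return "lalala"


# Run only this function's doc test and collect the counts.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(greeting, "greeting", globs={"greeting": greeting}):
    runner.run(test)
results = runner.summarize(verbose=False)
```

For ASCII-only values the comparison holds on Python 2 whether the function returns a byte string or a unicode string.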


> So I am planning to rewrite everything in unittest.
> Is the try-except block below the best way to make this test work in 
> Python 2.6 through 3.3?
> 
> import unittest
> import blah  # my package
> 
> 
> class test_blah(unittest.TestCase):
>     def test_someTest(self):
>         try:
>             expected = [u"lalala", 1] # Python 2.6>= & Python 3.3>=
>         except SyntaxError:
>             expected = ["lalala", 1] # Python 3.0, 3.1, 3.2

That cannot work. try...except catches *run time* exceptions. 
SyntaxError occurs at *compile time*, before the try...except gets a 
chance to run.
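
To see the difference, here is a small sketch in which the compile step is made explicit. A SyntaxError can only be caught when the faulty source is compiled separately from the code doing the catching, for example by passing it as a string to compile(). The source string is a deliberately broken example of my own:

```python
# Deliberately malformed source: the closing bracket is missing.
source = "expected = [1, 2"

try:
    # The error is raised here, while compiling, before anything runs.
    code = compile(source, "<demo>", "exec")
except SyntaxError:
    caught = True
else:
    caught = False
```

In your test, the u"lalala" literal sits in the same module as the try block, so on 3.0 to 3.2 the whole module fails to compile and the except clause never gets to run.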

Unfortunately, there is no good way to write version-independent code 
involving strings across Python 2.x and 3.x. If you just support 3.3 and 
better, it is simple, but otherwise you're stuck with various nasty work 
arounds, none of which are ideal.

Probably the least worst for your purposes is to create a helper 
function in your unit test:

import sys

if sys.version_info[0] < 3:
    def u(astr):
        return unicode(astr)
else:
    def u(astr):
        return astr

Then, everywhere you want a Unicode string, use:

u("something")

The two problems with this are:

1) It is slightly slower, but for testing purposes that doesn't really 
matter; and

2) You cannot write non-ASCII literals in your strings, or at least not 
safely, because under Python 2 unicode(astr) decodes the byte string 
with the default ASCII codec.
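
One possible way around the second problem, offered as a sketch rather than a recommendation, is to keep the source ASCII-only and write non-ASCII characters as \u escapes. The Python 2 branch below decodes those escapes, which is a variation of my own on the helper, not the only way to do it:

```python
import sys

if sys.version_info[0] < 3:
    def u(astr):
        # Python 2: "\u00e9" in a byte string is six literal characters,
        # so decode the escape sequences to get real Unicode text.
        return astr.decode("unicode_escape")
else:
    def u(astr):
        # Python 3: the literal is already Unicode text.
        return astr

name = u("caf\u00e9")
```

Either way, name ends up as four characters, the last being U+00E9.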


> Another, though related question. We have Python 2.7 in the office and 
> eventually we will move to some Python3 version. The code does not 
> generally need to remain Python2 compatible. What is the best 
> strategy: [a] write forward compatible code when using Python 2.7. 
> (using a 'b' prefix for byte strings, parentheses for the print 
> *statement*, sys.exc_info()[1] for error messages, etc.). [b] totally 
> rely on 2to3 script and don't make the Python2 code less readable and 
> less idiomatic before the upgrade.

Option a, but not the way you say it. Start by putting 

from __future__ import division, print_function

at the top of your 2.x code. You can add unicode_literals to that import 
to make unprefixed string literals unicode, but it changes the meaning of 
every plain literal in the module, so you may prefer to get used to 
writing u"" and b"" strings by hand.

You can also do:

from future_builtins import *
range = xrange

which will replace a bunch of builtins with Python3 compatible versions.
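
If the same file also has to import cleanly under Python 3, where future_builtins does not exist, the import can be guarded by a version check. This is a sketch under that assumption, and it imports names explicitly instead of using the star import:

```python
import sys

if sys.version_info[0] < 3:
    # Python 2 only: swap in the iterator-returning versions.
    from future_builtins import filter, map, zip
    range = xrange

squares = map(lambda n: n * n, range(4))
is_lazy = not isinstance(squares, list)  # an iterator, not a list
result = list(squares)
```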

Be prepared for a torrid time getting used to Unicode strings. Not 
because Unicode is hard, it isn't, but because you'll have to unlearn a 
lot of things that you thought you knew. The first thing to unlearn is 
this: there is no such thing as "plain text".

Unfortunately Python 2 tries to be helpful when dealing with text versus 
bytes, and that actually teaches bad habits. This is the sort of thing 
I'm talking about:

[steve@ando ~]$ python2.7 -c "print 'a' + u'b'"
ab


That sort of implicit conversion of bytes and strings is actually a bad 
idea, and Python 3 prohibits it:

[steve@ando ~]$ python3.3 -c "print(b'a' + u'b')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: can't concat bytes to str
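
The habit to learn instead is converting explicitly, naming the encoding yourself. A minimal sketch that behaves the same on 2.7 and 3.3, assuming the bytes are known to be ASCII:

```python
data = b"a"    # bytes, e.g. read from a file or socket
text = u"b"    # Unicode text

# Decode the bytes explicitly before combining them with text.
combined = data.decode("ascii") + text

# Or go the other way and encode the text to get bytes.
combined_bytes = data + text.encode("ascii")
```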


The other bad thing about Unicode is that, unless you are lucky enough 
to be generating all your own textual data, you'll eventually have to 
deal with cross-platform text issues, and text generated by people who 
didn't understand Unicode and therefore produce rubbish data containing 
mojibake and worse.

But the good thing is, the Unicode model actually isn't hard to 
understand, and once you learn the language of "encodings", "code 
points" etc. it makes great sense.
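
A short sketch of that vocabulary in action, using an example string of my own: one code point can become different bytes under different encodings, and decoding with the wrong encoding is where mojibake comes from.

```python
text = u"caf\u00e9"                # four code points, the last is U+00E9

utf8 = text.encode("utf-8")        # five bytes: U+00E9 becomes 0xC3 0xA9
latin1 = text.encode("latin-1")    # four bytes: U+00E9 becomes 0xE9

round_trip = utf8.decode("utf-8")  # right encoding: the original text
mojibake = utf8.decode("latin-1")  # wrong encoding: five garbled characters
```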

Unless you're working with binary data, you are much better off learning 
how to use Unicode u"" strings now. Just be aware that unless you are 
careful, Python 2 will try to be helpful, and you don't want that.


-- 
Steven
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
