On Wed, Sep 04, 2013 at 06:30:12AM -0700, Albert-Jan Roskam wrote:
> Hi,
>
> I am trying to make my app work in Python 2.7 and Python 3.3 (one
> codebase) and I might later also try to make it work on Python 2.6 and
> Python 3.2 (if I am not too fed up with it ;-). I was very happy to
> notice that the 'b' prefix for bytes objects is also supported for
> byte strings in Python 2.7. Likewise, I just found out that the "u"
> has been re-introduced in Python 3.3 (I believe this is the first
> Python 3 version where this re-appears).
>
> I am now cursing myself for having used doctest for my tests.

Well, that's just silly :-) Doc tests and unit tests have completely
different purposes, and besides, in general doc tests are easier to make
version independent. Many of your doc tests will still continue to work,
and those that don't can almost always be adapted to work across
versions.

> So I am planning to rewrite everything in unittest.
> Is the try-except block below the best way to make this test work in
> Python 2.6 through 3.3?
>
> import unittest
> import blah  # my package
>
>
> class test_blah(unittest.TestCase):
>     def test_someTest(self):
>         try:
>             expected = [u"lalala", 1]  # Python 2.6>= & Python 3.3>=
>         except SyntaxError:
>             expected = ["lalala", 1]  # Python 3.0, 3.1, 3.2

That cannot work. try...except catches *run time* exceptions.
SyntaxError occurs at *compile time*, before the try...except gets a
chance to run. On Python 3.0 through 3.2 the u"lalala" literal stops the
whole module from compiling, so none of the code in it ever runs.

Unfortunately, there is no good way to write version-independent code
involving strings across Python 2.x and 3.x. If you just support 3.3 and
better, it is simple, but otherwise you're stuck with various nasty
workarounds, none of which are ideal.

Probably the least worst for your purposes is to create a helper
function in your unit test:

    import sys

    if sys.version < '3':
        def u(astr):
            return unicode(astr)
    else:
        def u(astr):
            return astr

Then, everywhere you want a Unicode string, use:

    u("something")

The two problems with this are:

1) It is slightly slower, but for testing purposes that doesn't really
   matter; and

2) You cannot write non-ASCII literals in your strings. Or at least not
   safely (in Python 2, unicode(astr) decodes using the default
   encoding, which is normally ASCII).

> Another, though related question. We have Python 2.7 in the office and
> eventually we will move to some Python3 version. The code does not
> generally need to remain Python2 compatible. What is the best
> strategy: [a] write forward compatible code when using Python 2.7.
> (using a 'b' prefix for byte strings, parentheses for the print
> *statement*, sys.exc_info()[1] for error messages, etc.). [b] totally
> rely on 2to3 script and don't make the Python2 code less readable and
> less idiomatic before the upgrade.

Option a, but not the way you say it. Start by putting

    from __future__ import division, print_function

at the top of your 2.x code. You can add unicode_literals to that import
to make unprefixed string literals Unicode throughout the module;
otherwise you just have to get used to writing u"" and b"" strings by
hand.

You can also do:

    from future_builtins import *
    range = xrange

which will replace a bunch of builtins with Python3 compatible versions.

Be prepared for a torrid time getting used to Unicode strings. Not
because Unicode is hard, it isn't, but because you'll have to unlearn a
lot of things that you thought you knew. The first thing to unlearn is
this: there is no such thing as "plain text".

Unfortunately Python 2 tries to be helpful when dealing with text versus
bytes, and that actually teaches bad habits. This is the sort of thing
I'm talking about:

    [steve@ando ~]$ python2.7 -c "print 'a' + u'b'"
    ab

That sort of implicit conversion of bytes and strings is actually a bad
idea, and Python 3 prohibits it:

    [steve@ando ~]$ python3.3 -c "print(b'a' + u'b')"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    TypeError: can't concat bytes to str

The other bad thing about Unicode is that, unless you are lucky enough
to be generating all your own textual data, you'll eventually have to
deal with cross-platform text issues, and text generated by people who
didn't understand Unicode and therefore produce rubbish data containing
mojibake and worse.

But the good thing is, the Unicode model actually isn't hard to
understand, and once you learn the language of "encodings", "code
points" etc. it makes great sense.

Unless you're working with binary data, you are much better off learning
how to use Unicode u"" strings now. Just be aware that unless you are
careful, Python 2 will try to be helpful, and you don't want that.

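To make the helper idea above concrete, here is a minimal sketch, not a definitive implementation. It assumes ASCII-only literals, checks sys.version_info rather than comparing version strings (my substitution), and the names text_type, expected and joined are made up purely for illustration. It also shows the explicit decode that replaces Python 2's implicit bytes-to-text conversion.

```python
import sys

if sys.version_info[0] < 3:
    def u(astr):
        # Python 2: turn an ASCII-only byte-string literal into unicode.
        return unicode(astr)  # noqa: F821 (this name only exists on Python 2)
else:
    def u(astr):
        # Python 3: string literals are already Unicode text.
        return astr

# Hypothetical name: the Unicode text type on whichever version is running.
text_type = type(u(""))

# The expected value from the test, written without a u"" prefix so the
# module also compiles on Python 3.0 through 3.2.
expected = [u("lalala"), 1]

# Mixing bytes and text: decode explicitly instead of relying on
# Python 2's implicit conversion.
joined = b"a".decode("ascii") + u("b")
```

Inside a TestCase method, expected can then be passed to self.assertEqual as usual.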
-- 
Steven
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor