On 8/5/2012 7:40 PM, Robert Hunt wrote:
On 06/08/12 14:20, Chris Little wrote:
Linux packagers apparently go the UCS-4 route, so I didn't notice any
issue with using the Language Tags. But trying the above on Windows
shows that the cygwin build and the builds from python.org (2.7 & 3.2)
all use UCS-2. So my script won't work correctly on Windows.
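For anyone who wants to check which kind of build they are running, something like the following should work. It is only a minimal sketch: the function name is made up for illustration and is not part of the script under discussion.

```python
import sys


def is_narrow_build():
    """Return True on a UCS-2 (narrow) build, False on a UCS-4 (wide) build."""
    # sys.maxunicode is 0xFFFF on narrow builds and 0x10FFFF on wide builds.
    return sys.maxunicode == 0xFFFF


if __name__ == "__main__":
    kind = "UCS-2 (narrow)" if is_narrow_build() else "UCS-4 (wide)"
    print("This interpreter is a %s build" % kind)
```

On a narrow build, any character above U+FFFF (the Language Tag characters included) is stored as a surrogate pair, so len() and indexing count it as two units.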
Not to worry, though. I'll just replace the Language Tags with
Noncharacters in the range U+FDD0-U+FDEF. They're UCS-2-safe since
they're BMP codepoints and they're specifically designated as
"intended for process-internal uses, but are not permitted for
interchange." So in the unlikely event that they appear in input, it's
the fault of the USFM-encoder if anything goes awry.
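If one wanted to be defensive about it anyway, a check along these lines could flag input that already contains those noncharacters. This is a rough sketch, assuming the input has already been decoded to a unicode string; the constant and function names are hypothetical and not taken from the actual script.

```python
# Contiguous block of 32 noncharacters in the BMP (hypothetical names).
NONCHAR_FIRST = 0xFDD0
NONCHAR_LAST = 0xFDEF


def find_reserved_noncharacters(text):
    """Return a list of (index, code point) for each U+FDD0-U+FDEF hit."""
    return [
        (i, ord(ch))
        for i, ch in enumerate(text)
        if NONCHAR_FIRST <= ord(ch) <= NONCHAR_LAST
    ]


if __name__ == "__main__":
    sample = u"\\id GEN\uFDD0 text"
    for index, cp in find_reserved_noncharacters(sample):
        print("noncharacter U+%04X at index %d" % (cp, index))
```

Since every code point in the range is below U+10000, the check behaves the same on UCS-2 and UCS-4 builds.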
We'll have to watch for input outside of the BMP on UCS-2 Python,
though, as that could cause problems.
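One possible way to watch for that is sketched below. It assumes the input is a decoded unicode string, and the function name is invented for illustration. On a UCS-4 build a non-BMP character shows up as a single code point above U+FFFF; on a UCS-2 build it shows up as two surrogate code units, so the sketch tests for both.

```python
def has_non_bmp(text):
    """Return True if text contains anything outside the BMP.

    Surrogate code units (U+D800-U+DFFF) are treated as evidence of
    non-BMP content, which is how such characters appear on narrow builds.
    """
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            return True
    return False


if __name__ == "__main__":
    # U+E0001 LANGUAGE TAG lies outside the BMP.
    print(has_non_bmp(u"\U000E0001"))
    print(has_non_bmp(u"In the beginning"))
```

Note that a lone surrogate in malformed input is flagged as well, which is probably the desired behaviour for this purpose.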
I guess I'm quite surprised that you wrote a new Python program using
Python2 when its development is basically coming to an end (and the next
Ubuntu will no longer have it installed by default).
Python 2.x is better supported than 3 by libraries, including some I may
elect to use at a later date. I know Python 2.x well and have never seen
a need to learn 3, and if Python 2.x suits my needs, there's no reason
to jump to 3. 2to3 might work fine on my app, for all I know.
Python 2.7 not being on the Ubuntu desktop CD doesn't really matter.
Python 2.7 will still be available via apt-get, and 'python' will still
refer to Python 2.7.
I also wonder if Python3 would handle Unicode better.
Yes and no, but as far as this specific issue goes, no. UCS-2 is still
the default internal representation in Python 3 and hence is what
everyone will have available to them in Python 3 on Windows (as I
mentioned in the first quoted paragraph above).
--Chris