Re: [Tutor] why is unichr(sys.maxunicode) blank?

Steven D'Aprano Sat, 18 May 2013 18:42:15 -0700

On 19/05/13 02:45, Albert-Jan Roskam wrote about locales:

It is pretty sick that all these things can be adjusted separately (what is the 
use of having Danish collation, Russian case conversion, an English decimal sign 
and a Japanese codepage? ;-)


Well, obviously there is no point to such a mess, but the ability to make a mess 
comes from having the flexibility to choose less silly combinations.
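Python's standard `locale` module exposes these categories separately. The sketch below is minimal and assumes a POSIX-style locale name, "de_DE.UTF-8", which may not be installed. The helper `set_category` is made up for illustration and returns None when the locale is missing. It shows that changing LC_NUMERIC leaves LC_COLLATE alone:

```python
import locale

def set_category(category, name):
    """Hypothetical helper: set one locale category, or return None if unavailable."""
    try:
        return locale.setlocale(category, name)
    except locale.Error:
        # The requested locale (an assumption here) is not installed on this machine.
        return None

# Calling setlocale() with only a category queries it without changing it.
collate_before = locale.setlocale(locale.LC_COLLATE)
numeric_before = locale.setlocale(locale.LC_NUMERIC)

# Change only LC_NUMERIC, which controls the decimal sign.
numeric_now = set_category(locale.LC_NUMERIC, "de_DE.UTF-8")
print("LC_NUMERIC:", numeric_now)
print("decimal point:", locale.localeconv()["decimal_point"])

# LC_COLLATE is untouched by the change above.
collate_after = locale.setlocale(locale.LC_COLLATE)
print("LC_COLLATE unchanged:", collate_after == collate_before)

# Put LC_NUMERIC back the way we found it.
locale.setlocale(locale.LC_NUMERIC, numeric_before)
```

Each of LC_CTYPE, LC_COLLATE, LC_NUMERIC and the other categories can be mixed and matched this way. That independence is what makes the silly combinations possible.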

By the way, I'm not sure what you mean by "pretty sick", since in Australian slang "sick" can mean 
"fantastic, excellent", as in "Mate, that's a pretty sick sub-woofer!".

See http://www.youtube.com/watch?v=iRv7IE6T4gQ

(warning: ethnic stereotypes, low-brow humour)



[...]

  Isn't UCS-2 the internal Unicode encoding for CPython (narrow builds)?


Narrow builds create UTF-16 surrogate pairs from \U literals, but
these aren't treated as an atomic unit for slicing, iteration, or
string length.
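For a code point above U+FFFF, the surrogate pair is computed by subtracting 0x10000 and splitting the remaining 20 bits into two 10-bit halves. Here is a minimal sketch, assuming Python 3.3+. The function name `to_surrogate_pair` is made up for illustration:

```python
def to_surrogate_pair(cp):
    """Split a supplementary-plane code point into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000            # a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go in the low surrogate
    return high, low

high, low = to_surrogate_pair(0x101001)
print(hex(high), hex(low))  # 0xdbc4 0xdc01

# Cross-check against Python's own UTF-16 encoder.
encoded = chr(0x101001).encode("utf-16-be")
assert encoded == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

A narrow build stores exactly these two 16-bit units, and it counts, slices and iterates over them as two separate characters.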


That is a nice way of putting it. So if you slice a multibyte char "mb", mb[0] 
will return the first byte? That is annoying.


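Correct, except that mb[0] returns the first 16-bit code unit (the high 
surrogate), not a byte. You can easily break apart surrogate pairs in Python 
narrow builds, which leaves lone surrogates and hence invalid strings. The 
solution is to either use a wide build, or upgrade to Python 3.3 (PEP 393), 
which no longer has this problem: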


# Python 3.2, narrow build:
py> len(chr(0x101001))
2

# Python 3.3
py> len(chr(0x101001))
1
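On Python 3.3 or later you can still see what a narrow build did by looking at the string's UTF-16 code units. Here is a minimal sketch, assuming Python 3.3+. The list `code_units` is a stand-in for a narrow build's internal storage, and taking element 0 reproduces the lone high surrogate that mb[0] returned:

```python
import sys

# sys.maxunicode is 0xFFFF on a narrow build and 0x10FFFF on wide builds and 3.3+.
print("narrow build" if sys.maxunicode == 0xFFFF else "wide build or Python 3.3+")

s = chr(0x101001)

# Simulate a narrow build's storage: the string's UTF-16 code units.
raw = s.encode("utf-16-be")
code_units = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
print(len(s), len(code_units))  # 1 2

# mb[0] on a narrow build gave the high surrogate on its own.
lone = chr(code_units[0])
try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError as exc:
    encodable = False
    print("lone surrogate cannot be encoded:", exc.reason)
```

The lone surrogate is a perfectly legal `str` object in Python 3.3+, but it is not valid text. That is why splitting a pair produces strings that blow up later, when you try to encode or print them.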


--
Steven
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
