On Wed, Aug 15, 2012 at 9:20 AM, Manuel Kaufmann <humi...@gmail.com> wrote:
> Oh, it's OK. I agree with the result. Now, let's check what Python say
> if I use my default encoding (UTF8) for this simple task:
>
> >>> len("camión")
> 7

CAREFUL HERE. You don't understand what is happening -- it is not as simple as you think it is.

When you say len("camión"), you are typing that into a terminal (Gnome's Terminal, Sugar Terminal, xterm) that is set to use UTF-8. However, in Python 2 the sequence between " characters is a plain byte string, not Unicode text: Python stores whatever bytes the terminal sends, and len() counts bytes, not characters. In UTF-8 the "ó" takes two bytes, so your terminal IS sending Python 7 bytes for what looks like 6 chars.

However, there is an 8-bit representation of "camión" that has 6 bytes: Latin-1 (ISO-8859-1), where "ó" is a single byte. (Plain ASCII cannot represent "ó" at all.) In fact, install an old Linux system, open an xterm or a VT set to Latin-1, retry your example and you'll probably see that camión has 6 bytes.

I agree we should all use Unicode, specifically UTF-8, everywhere. We should also make an effort to understand the mechanics of what is actually happening behind the scenes.

cheers,

m
-- 
 martin.langh...@gmail.com
 mar...@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff
_______________________________________________
Sugar-devel mailing list
Sugar-devel@lists.sugarlabs.org
http://lists.sugarlabs.org/listinfo/sugar-devel
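
P.S. A minimal sketch of the byte counts discussed above, in case it helps. It assumes a Python that accepts u"..." literals (2.x, or 3.3 and later), and the variable names are only illustrative. The text is written with an escape (\u00f3 is "ó") so the result does not depend on how the terminal or the source file is encoded.

```python
# Sketch only: names below are illustrative, not from the thread.
word = u"cami\u00f3n"  # Unicode text, 6 characters; \u00f3 is "ó"

utf8_bytes = word.encode("utf-8")      # "ó" becomes two bytes: 0xC3 0xB3
latin1_bytes = word.encode("latin-1")  # "ó" becomes one byte: 0xF3

print(len(word))          # 6 (characters)
print(len(utf8_bytes))    # 7 (bytes, what a UTF-8 terminal sends)
print(len(latin1_bytes))  # 6 (bytes, what a Latin-1 terminal sends)

# Plain 7-bit ASCII has no code for "ó", so this encoding fails.
try:
    word.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)           # False
```

The point is only that the character count stays at 6, while the byte count depends on which encoding produced the bytes.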