Re: [Sugar-devel] Unicode strings in translations

Martin Langhoff Wed, 15 Aug 2012 06:41:14 -0700

On Wed, Aug 15, 2012 at 9:20 AM, Manuel Kaufmann <humi...@gmail.com> wrote:
> Oh, it's OK. I agree with the result. Now, let's check what Python says
> if I use my default encoding (UTF-8) for this simple task:
>
> >>> len("camión")
> 7


CAREFUL HERE. You don't understand what is happening -- it is not as
simple as you think it is.

When you say len("camión"), you are typing that into a terminal
(GNOME Terminal, Sugar Terminal, xterm) that is set to use UTF-8.

However, Python 2 does not treat the sequence between " characters as
text: it is a plain byte string, stored exactly as your terminal sent
it. Your terminal IS sending Python 7 bytes, because in UTF-8 the "ó"
takes two bytes, and len() counts bytes, not characters.
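
A minimal sketch of that byte count, meant to run on both Python 2 and
Python 3. The variable names are mine, and the bytes are spelled out as
escapes so the result does not depend on your terminal's encoding:

```python
# The 7 bytes a UTF-8 terminal sends for "camión";
# the "ó" is the two bytes 0xC3 0xB3.
raw = b"cami\xc3\xb3n"

# Decoding with the matching encoding yields the 6 characters typed.
text = raw.decode("utf-8")

print(len(raw))   # 7 (bytes)
print(len(text))  # 6 (characters)
```

The difference between the two numbers is the difference between
counting bytes and counting characters.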

However, there is a single-byte representation of "camión" that has 6
bytes: the Latin-1 (ISO-8859-1) encoding, an 8-bit superset of ASCII.
In fact, install an old Linux system, open an xterm or a VT set to
Latin-1, retry your example and you'll probably see that camión has 6
bytes.
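
You can see the same thing without the old system by encoding the word
both ways. This is only an illustration, with names I made up; it
should behave the same on Python 2 and Python 3:

```python
# "camión" as a Unicode string; the escape avoids depending on the
# encoding of this file or of your terminal.
word = u"cami\u00f3n"

as_utf8 = word.encode("utf-8")      # "ó" becomes two bytes: C3 B3
as_latin1 = word.encode("latin-1")  # "ó" becomes one byte: F3

print(len(as_utf8))    # 7
print(len(as_latin1))  # 6

# Reading the UTF-8 bytes as if they were Latin-1 does not fail; it
# silently produces 7 characters instead of 6.
wrong = as_utf8.decode("latin-1")
print(len(wrong))      # 7
```

Same word, same characters, different byte counts: the number you get
from len() on a byte string depends on the encoding that produced it.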

I agree we should all use Unicode, specifically UTF-8, everywhere. We
should also make an effort to understand the mechanics of what is
actually happening behind the scenes.

cheers,



m
--
 martin.langh...@gmail.com
 mar...@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff
_______________________________________________
Sugar-devel mailing list
Sugar-devel@lists.sugarlabs.org
http://lists.sugarlabs.org/listinfo/sugar-devel
