Re: utf - string translation

2006-11-29 Thread Frederic Rentsch
Dan wrote: On 22 nov, 22:59, John Machin [EMAIL PROTECTED] wrote: processes (Vigenère) So why do you want to strip off accents? The history of communication has several examples of significant difference in meaning caused by minute differences in punctuation or accents including

Re: utf - string translation

2006-11-29 Thread John Machin
Frederic Rentsch wrote: Try this: from_characters = '\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xff\xe7\xe8\xe9\xea\xeb'

Re: utf - string translation

2006-11-29 Thread Fredrik Lundh
John Machin wrote: 3. ... and to check for missing maps. The OP may be working only with French text, and may not care about Icelandic and German letters, but other readers who stumble on this (and miss past thread(s) on this topic) may like something done with \xde (capital thorn), \xfe
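One way to run the "check for missing maps" suggested above is to list the Latin-1 letters that have no canonical decomposition to an ASCII base letter. This is only a sketch, assuming Python 3 (the thread predates it); the function name `letters_without_ascii_base` is made up for illustration and is not from the thread.

```python
import unicodedata


def letters_without_ascii_base(start=0xC0, stop=0x100):
    """Return letters in [start, stop) whose NFD form does not begin with ASCII.

    Hypothetical helper: these are the characters a decomposition-based
    accent stripper would leave untouched unless given an explicit mapping.
    """
    missing = []
    for code_point in range(start, stop):
        char = chr(code_point)
        if not char.isalpha():
            continue  # skip non-letters such as the multiplication sign
        base = unicodedata.normalize("NFD", char)[0]
        if ord(base) > 0x7F:
            missing.append(char)
    return missing


for char in letters_without_ascii_base():
    # Print each unmapped letter as a Latin-1 escape with its Unicode name.
    print(f"\\x{ord(char):02x}", unicodedata.name(char))
```

The letters it reports (thorn, eth, sharp s and similar) need a hand-written replacement, and what that replacement should be depends on the language of the text.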

Re: utf - string translation

2006-11-29 Thread John Machin
Fredrik Lundh wrote: John Machin wrote: 3. ... and to check for missing maps. The OP may be working only with French text, and may not care about Icelandic and German letters, but other readers who stumble on this (and miss past thread(s) on this topic) may like something done with \xde

Re: utf - string translation

2006-11-29 Thread Fredrik Lundh
John Machin wrote: Another point: there are many non-latin1 characters that could be mapped to ASCII. For example: u"\u0141ukasziewicz".translate(unaccented_map()) doesn't work unless an entry is added to the no-decomposition table: 0x0141: u"L", # LATIN CAPITAL LETTER L WITH STROKE
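The idea of a no-decomposition table can be sketched roughly as follows. This is not the `unaccented_map()` recipe discussed in the thread, whose code is not shown here; it is a simplified stand-in, assuming Python 3. The names `NO_DECOMPOSITION` and `unaccent` and the table entries other than 0x0141 are assumptions.

```python
import unicodedata

# Hypothetical table for characters that have no canonical decomposition.
# Only the 0x0141 entry comes from the thread; the others are illustrative.
NO_DECOMPOSITION = {
    0x0141: "L",  # LATIN CAPITAL LETTER L WITH STROKE
    0x0142: "l",  # LATIN SMALL LETTER L WITH STROKE
    0x00D8: "O",  # LATIN CAPITAL LETTER O WITH STROKE
    0x00F8: "o",  # LATIN SMALL LETTER O WITH STROKE
}


def unaccent(text):
    """Strip combining marks, falling back to the table for special cases."""
    result = []
    for char in text:
        if ord(char) in NO_DECOMPOSITION:
            result.append(NO_DECOMPOSITION[ord(char)])
            continue
        # NFD splits e.g. e-acute into "e" plus a combining acute accent.
        decomposed = unicodedata.normalize("NFD", char)
        result.append(
            "".join(c for c in decomposed if not unicodedata.combining(c))
        )
    return "".join(result)


print(unaccent("\u0141ukasziewicz"))
print(unaccent("caf\u00e9"))
```

Without the table entry, the L with stroke would pass through unchanged, because normalization has nothing to split it into.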

Re: utf - string translation

2006-11-29 Thread John Machin
Fredrik Lundh wrote: John Machin wrote: Another point: there are many non-latin1 characters that could be mapped to ASCII. For example: u"\u0141ukasziewicz".translate(unaccented_map()) doesn't work unless an entry is added to the no-decomposition table: 0x0141: u"L", # LATIN

Re: utf - string translation

2006-11-26 Thread Dan
On 22 nov, 22:59, John Machin [EMAIL PROTECTED] wrote: processes (Vigenère) So why do you want to strip off accents? The history of communication has several examples of significant difference in meaning caused by minute differences in punctuation or accents including one of which you may

Re: utf - string translation

2006-11-23 Thread Fredrik Lundh
Klaas wrote: It's not too hard to imagine an accentual difference, eg: especially in languages where certain combinations really are distinct letters, not just letters with accents or silly marks. I have a Swedish children's book somewhere, in which some characters are harassed by a big ugly

Re: utf - string translation

2006-11-23 Thread Eric Brunel
On Wed, 22 Nov 2006 22:59:01 +0100, John Machin [EMAIL PROTECTED] wrote: [snip] So why do you want to strip off accents? The history of communication has several examples of significant difference in meaning caused by minute differences in punctuation or accents including one of which you

utf - string translation

2006-11-22 Thread hg
Hi, I'm bringing over a thread that's going on on f.c.l.python. The point was to get rid of French accents from words. We noticed that len('à') != len('a') and I found the hack below to fix the problem ... yet I do not understand - especially since 'à' is included in the extended ASCII table,

Re: utf - string translation

2006-11-22 Thread Fredrik Lundh
hg wrote: We noticed that len('à') != len('a') sounds odd. >>> len('à') == len('a') True are you perhaps using a UTF-8 editor? to keep your sanity, no matter what editor you're using, I recommend adding a coding directive to the source file, and using *only* Unicode string literals for
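The length mismatch is what you would expect if the literal is stored as UTF-8 bytes rather than as a character. A small sketch, assuming Python 3, where text and bytes are separate types (in the Python 2 of the thread, a plain '...' literal held bytes):

```python
# U+00E0 is "a with grave"; the escape avoids depending on the editor encoding.
text = "\u00e0"

utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA0
latin1_bytes = text.encode("latin-1")  # one byte: 0xE0

print(len(text))          # number of characters
print(len(utf8_bytes))    # number of bytes in UTF-8
print(len(latin1_bytes))  # number of bytes in Latin-1
```

So whether the two lengths agree depends on whether you are counting characters or bytes, and on which encoding produced the bytes.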

Re: utf - string translation

2006-11-22 Thread hg
Fredrik Lundh wrote: hg wrote: We noticed that len('à') != len('a') sounds odd. >>> len('à') == len('a') True are you perhaps using a UTF-8 editor? to keep your sanity, no matter what editor you're using, I recommend adding a coding directive to the source file, and using *only*

Re: utf - string translation

2006-11-22 Thread hg
hg wrote: Fredrik Lundh wrote: hg wrote: We noticed that len('à') != len('a') sounds odd. >>> len('à') == len('a') True are you perhaps using a UTF-8 editor? to keep your sanity, no matter what editor you're using, I recommend adding a coding directive to the source file, and using

Re: utf - string translation

2006-11-22 Thread Duncan Booth
hg [EMAIL PROTECTED] wrote: or in other words, put this at the top of your file (where utf-8 is whatever your editor/system is using): # -*- coding: utf-8 -*- and use u'text' for all non-ASCII literals. /F Hi, The problem is that: # -*- coding: utf-8 -*- import

Re: utf - string translation

2006-11-22 Thread hg
Duncan Booth wrote: hg [EMAIL PROTECTED] wrote: or in other words, put this at the top of your file (where utf-8 is whatever your editor/system is using): # -*- coding: utf-8 -*- and use u'text' for all non-ASCII literals. /F Hi, The problem is that: # -*- coding: utf-8

Re: utf - string translation

2006-11-22 Thread Fredrik Lundh
hg wrote: How would you handle the string.maketrans then ? maketrans works on bytes, not characters. what makes you think that you can use maketrans if you haven't gotten the slightest idea what's in the string? if you want to get rid of accents in a Unicode string, you can do the
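To illustrate why a byte-level table only helps when you know the encoding, here is a sketch assuming Python 3, where the byte version of maketrans is `bytes.maketrans`. The three-letter table and the name `strip_accents_latin1` are made up for the example.

```python
# Accented letters a-grave, e-acute, e-grave, written as escapes.
ACCENTED = "\u00e0\u00e9\u00e8"

# The table maps single bytes to single bytes, so it is only meaningful
# for a single-byte encoding such as Latin-1.
TABLE = bytes.maketrans(ACCENTED.encode("latin-1"), b"aee")


def strip_accents_latin1(data: bytes) -> bytes:
    """Translate Latin-1 encoded bytes through the table (hypothetical helper)."""
    return data.translate(TABLE)


word = "\u00e9t\u00e9"  # French "ete" with acute accents

# Latin-1 input: each accented letter is one byte, so the table applies.
print(strip_accents_latin1(word.encode("latin-1")))

# UTF-8 input: each accented letter is two bytes that the table does not
# contain, so the data comes back unchanged.
print(strip_accents_latin1(word.encode("utf-8")))
```

Decoding to a Unicode string first and working on characters avoids the problem altogether.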

Re: utf - string translation

2006-11-22 Thread hg
Fredrik Lundh wrote: hg wrote: How would you handle the string.maketrans then ? maketrans works on bytes, not characters. what makes you think that you can use maketrans if you haven't gotten the slightest idea what's in the string? if you want to get rid of accents in a Unicode

Re: utf - string translation

2006-11-22 Thread John Machin
hg wrote: Duncan Booth wrote: hg [EMAIL PROTECTED] wrote: or in other words, put this at the top of your file (where utf-8 is whatever your editor/system is using): # -*- coding: utf-8 -*- and use u'text' for all non-ASCII literals. /F Hi, The problem is

Re: utf - string translation

2006-11-22 Thread Dan
Thank you for your answers. In fact, I'm just getting started with Python. I was looking to transform a text through elementary cryptographic processes (Vigenère). The initial text is in a file, and my system uses UTF-8 by default (Ubuntu)

Re: utf - string translation

2006-11-22 Thread John Machin
Dan wrote: Thank you for your answers. In fact, I'm just getting started with Python. That was a good decision. Welcome! I was looking to transform a text through elementary cryptographic processes (Vigenère). So why do you want to strip off accents? The history of communication has several

Re: utf - string translation

2006-11-22 Thread David H Wild
In article [EMAIL PROTECTED], John Machin [EMAIL PROTECTED] wrote: So why do you want to strip off accents? The history of communication has several examples of significant difference in meaning caused by minute differences in punctuation or accents including one of which you may have

Re: utf - string translation

2006-11-22 Thread John Machin
David H Wild wrote: In article [EMAIL PROTECTED], John Machin [EMAIL PROTECTED] wrote: So why do you want to strip off accents? The history of communication has several examples of significant difference in meaning caused by minute differences in punctuation or accents including one of

Re: utf - string translation

2006-11-22 Thread Klaas
David H Wild wrote: In article [EMAIL PROTECTED], John Machin [EMAIL PROTECTED] wrote: So why do you want to strip off accents? The history of communication has several examples of significant difference in meaning caused by minute differences in punctuation or accents including one of