Re: Least-lossy string.encode to us-ascii?

2012-09-15 Thread wxjmfauth
On Friday, 14 September 2012 22:45:05 UTC+2, Terry Reedy wrote: > On 9/14/2012 12:15 PM, wxjmfa...@gmail.com wrote: > > PS Avoid Py3.3 :-) > pps Start using 3.3 as soon as possible. It has Python's first fully portable non-buggy Unicode implementation. The second releas

Re: Least-lossy string.encode to us-ascii?

2012-09-14 Thread Terry Reedy
On 9/13/2012 10:09 PM, Mark Tolonen wrote: On Thursday, September 13, 2012 4:53:13 PM UTC-7, Tim Chase wrote: On 09/13/12 18:36, Terry Reedy wrote: 'keep as much information as possible' would mean an effectively lossless transliteration, which you could do with a dict. {: 'o', : 'c,' (or pic

Re: Least-lossy string.encode to us-ascii?

2012-09-14 Thread Terry Reedy
On 9/14/2012 12:15 PM, wxjmfa...@gmail.com wrote: PS Avoid Py3.3 :-) pps Start using 3.3 as soon as possible. It has Python's first fully portable non-buggy Unicode implementation. The second release candidate is already out. -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/
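Terry's remark presumably refers to PEP 393 (the flexible string representation introduced in Python 3.3), which removed the old split between "narrow" and "wide" builds. A minimal sketch of the observable difference, assuming Python 3.3 or later:

```python
import sys

# Since Python 3.3 (PEP 393) there are no "narrow" or "wide" builds:
# every str stores full code points, whatever the platform.
smiley = "\U0001F600"  # a character outside the Basic Multilingual Plane

print(sys.maxunicode == 0x10FFFF)  # True on every 3.3+ build
print(len(smiley))                 # 1 (narrow 3.2 builds reported 2)
print(smiley[0] == smiley)         # True: indexing yields the whole character
```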

Re: Least-lossy string.encode to us-ascii?

2012-09-14 Thread wxjmfauth
On Thursday, 13 September 2012 23:25:27 UTC+2, Tim Chase wrote: > I've got a bunch of text in Portuguese and to transmit them, need to > have them in us-ascii (7-bit). I'd like to keep as much information > as possible, just stripping accents, cedillas, tildes, etc. So > "serviço móvil" b

Re: [SOLVED] Least-lossy string.encode to us-ascii?

2012-09-14 Thread Vlastimil Brom
2012/9/14 Tim Chase : > On 09/13/12 16:44, Vlastimil Brom wrote: >> >>> import unicodedata >> >>> unicodedata.normalize("NFD", u"serviço móvil").encode("ascii", >> >>> "ignore").decode("ascii") >> u'servico movil' > > Works well for all the test-cases I threw at it. Thanks! > > -tkc > > Hi, I am

Re: Least-lossy string.encode to us-ascii?

2012-09-13 Thread Steven D'Aprano
On Thu, 13 Sep 2012 21:34:52 -0500, Tim Chase wrote: > On 09/13/12 21:09, Mark Tolonen wrote: >> On Thursday, September 13, 2012 4:53:13 PM UTC-7, Tim Chase wrote: >>> Vlastimil's solution kept the characters but stripped them of their >>> accents/tildes/cedillas/etc, doing just what I wanted, all

Re: Least-lossy string.encode to us-ascii?

2012-09-13 Thread Steven D'Aprano
On Thu, 13 Sep 2012 16:26:07 -0500, Tim Chase wrote: > I've got a bunch of text in Portuguese and to transmit them, need to > have them in us-ascii (7-bit). That could mean two things: 1) "The receiver is incapable of dealing with Unicode in 2012, which is frankly appalling, but what can I do a

Re: Least-lossy string.encode to us-ascii?

2012-09-13 Thread Tim Chase
On 09/13/12 21:09, Mark Tolonen wrote: > On Thursday, September 13, 2012 4:53:13 PM UTC-7, Tim Chase wrote: >> Vlastimil's solution kept the characters but stripped them of their >> accents/tildes/cedillas/etc, doing just what I wanted, all using the >> stdlib. Hard to do better than that :-) > >
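The NFD-plus-`ignore` trick works for Portuguese, but it only helps letters that have a canonical decomposition; anything else disappears without warning. A small sketch, assuming Python 3 (the helper names `strip_accents` and `dropped_chars` are hypothetical), that reports which characters would be silently discarded:

```python
import unicodedata

def strip_accents(text: str) -> str:
    # Decompose to base letter + combining marks, then drop every non-ASCII char.
    return unicodedata.normalize("NFD", text).encode("ascii", "ignore").decode("ascii")

def dropped_chars(text: str) -> list:
    # A character vanishes entirely when nothing ASCII is left after NFD.
    return [ch for ch in text if not strip_accents(ch)]

sample = "a\u00e7\u00e3o \u00f8l stra\u00dfe"  # "ação øl straße"
print(strip_accents(sample))  # acao l strae
print(dropped_chars(sample))  # ['ø', 'ß']
```

Letters such as ø, ß or æ have no decomposition, so they need an explicit mapping rather than accent stripping.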

Re: Least-lossy string.encode to us-ascii?

2012-09-13 Thread Mark Tolonen
On Thursday, September 13, 2012 4:53:13 PM UTC-7, Tim Chase wrote: > On 09/13/12 18:36, Terry Reedy wrote: > > > On 9/13/2012 5:26 PM, Tim Chase wrote: > > >> I've got a bunch of text in Portuguese and to transmit them, need to > > >> have them in us-ascii (7-bit). I'd like to keep as much info

Re: Least-lossy string.encode to us-ascii?

2012-09-13 Thread Tim Chase
On 09/13/12 18:36, Terry Reedy wrote: > On 9/13/2012 5:26 PM, Tim Chase wrote: >> I've got a bunch of text in Portuguese and to transmit them, need to >> have them in us-ascii (7-bit). I'd like to keep as much information >> as possible, just stripping accents, cedillas, tildes, etc. > > 'keep as

Re: Least-lossy string.encode to us-ascii?

2012-09-13 Thread Terry Reedy
On 9/13/2012 5:26 PM, Tim Chase wrote: I've got a bunch of text in Portuguese and to transmit them, need to have them in us-ascii (7-bit). I'd like to keep as much information as possible, just stripping accents, cedillas, tildes, etc. 'keep as much information as possible' would mean an effect
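The dict-based transliteration Terry mentions maps naturally onto `str.translate`. The sketch below is an assumption, not code from the thread: a hypothetical table `TRANSLIT` covers letters that NFD cannot decompose, NFD plus `unicodedata.combining` then strips the accents from the rest, and anything still non-ASCII becomes `?` so the loss stays visible instead of silent.

```python
import unicodedata

# Hypothetical transliteration table; extend it for your own data.
# Values may be several characters long, which accent stripping cannot give you.
TRANSLIT = str.maketrans({
    "\u00df": "ss",  # ß
    "\u00e6": "ae",  # æ
    "\u00f8": "o",   # ø
    "\u0153": "oe",  # œ
})

def to_ascii(text: str) -> str:
    # 1. Apply the explicit table first.
    text = text.translate(TRANSLIT)
    # 2. Split accented letters into base letter + combining marks, drop the marks.
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 3. Anything still outside ASCII is replaced by "?" rather than dropped.
    return without_marks.encode("ascii", "replace").decode("ascii")

print(to_ascii("servi\u00e7o m\u00f3vil"))  # servico movil
print(to_ascii("stra\u00dfe"))              # strasse
```

Replacing with `?` (instead of `ignore`) is a deliberate choice here: unmapped characters show up in the output, which tells you what to add to the table next.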

Re: Least-lossy string.encode to us-ascii?

2012-09-13 Thread Ethan Furman
[sorry for the direct reply, Tim] Tim Chase wrote: I've got a bunch of text in Portuguese and to transmit them, need to have them in us-ascii (7-bit). I'd like to keep as much information as possible, just stripping accents, cedillas, tildes, etc. So "serviço móvil" becomes "servico movil". I

Re: [SOLVED] Least-lossy string.encode to us-ascii?

2012-09-13 Thread Tim Chase
On 09/13/12 16:44, Vlastimil Brom wrote: > >>> import unicodedata > >>> unicodedata.normalize("NFD", u"serviço móvil").encode("ascii", > >>> "ignore").decode("ascii") > u'servico movil' Works well for all the test-cases I threw at it. Thanks! -tkc
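The quoted one-liner is Python 2 code (note the `u''` literal). A Python 3 sketch of the same NFD-then-`ignore` approach; the function name `strip_accents` is hypothetical:

```python
import unicodedata

def strip_accents(text: str) -> str:
    # NFD turns "ç" into "c" + U+0327 and "ó" into "o" + U+0301 ...
    decomposed = unicodedata.normalize("NFD", text)
    # ... and encoding with errors="ignore" drops the non-ASCII combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(strip_accents("servi\u00e7o m\u00f3vil"))  # servico movil
```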

Re: Least-lossy string.encode to us-ascii?

2012-09-13 Thread Christian Heimes
On 13.09.2012 23:26, Tim Chase wrote: > I've got a bunch of text in Portuguese and to transmit them, need to > have them in us-ascii (7-bit). I'd like to keep as much information > as possible, just stripping accents, cedillas, tildes, etc. So > "serviço móvil" becomes "servico movil". Is ther

Re: Least-lossy string.encode to us-ascii?

2012-09-13 Thread Vlastimil Brom
2012/9/13 Tim Chase : > I've got a bunch of text in Portuguese and to transmit them, need to > have them in us-ascii (7-bit). I'd like to keep as much information > as possible, just stripping accents, cedillas, tildes, etc. So > "serviço móvil" becomes "servico movil". Is there anything stock >

Least-lossy string.encode to us-ascii?

2012-09-13 Thread Tim Chase
I've got a bunch of text in Portuguese and to transmit them, need to have them in us-ascii (7-bit). I'd like to keep as much information as possible, just stripping accents, cedillas, tildes, etc. So "serviço móvil" becomes "servico movil". Is there anything stock that I've missed? I can do mys
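For comparison, the stock `errors` handlers of `str.encode` either lose the accented letters or keep them only as escapes. A quick Python 3 sketch of what each built-in handler produces for the example string:

```python
# Same text as "serviço móvil", written with escapes to avoid source-encoding issues.
text = "servi\u00e7o m\u00f3vil"

ignored = text.encode("ascii", "ignore")            # drops non-ASCII characters
replaced = text.encode("ascii", "replace")          # substitutes "?"
escaped = text.encode("ascii", "backslashreplace")  # lossless, Python-style escapes
xmlref = text.encode("ascii", "xmlcharrefreplace")  # lossless, XML/HTML references

print(ignored)   # b'servio mvil'
print(replaced)  # b'servi?o m?vil'
print(escaped)   # b'servi\\xe7o m\\xf3vil'
print(xmlref)    # b'servi&#231;o m&#243;vil'
```

`ignore` and `replace` are readable but lossy; `backslashreplace` and `xmlcharrefreplace` preserve everything but only a decoder on the other end makes them readable again.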