Re: [Tutor] Unicode to Ascii

2016-09-26 Thread Steven D'Aprano
On Tue, Sep 27, 2016 at 03:46:25AM +0530, srinivas devaki wrote: > How can I convert Unicode to Ascii by stripping off any non-ASCII characters. > > one way is to filter on s like > > ascii = ''.join(filter(lambda x: 0 <= ord(x) < 256, unicode_string)) > > but are there any other simple ways? S

Re: [Tutor] unicode decode/encode issue

2016-09-26 Thread Steven D'Aprano
I'm sorry, I have misinterpreted your question. On Mon, Sep 26, 2016 at 12:59:04PM -0400, bruce wrote: > I've got a page from a web fetch. I'm simply trying to go from utf-8 to > ascii. Why would you do that? It's 2016, not 1953, and ASCII is well and truly obsolete. (ASCII was even obsolete i

[Tutor] Unicode to Ascii

2016-09-26 Thread srinivas devaki
How can I convert Unicode to Ascii by stripping off any non-ASCII characters. one way is to filter on s like ascii = ''.join(filter(lambda x: 0 <= ord(x) < 256, unicode_string)) but are there any other simple ways? Regards Srinivas Devaki Senior (final yr) student at Indian Institute of Technol
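
ASCII only covers code points 0 to 127, so the `< 256` test above also keeps Latin-1 characters such as `é`. Below is a minimal Python 3 sketch of two equivalent ways to drop everything outside ASCII. The function names are illustrative and do not come from the thread.

```python
def strip_non_ascii(text):
    """Drop every character outside 7-bit ASCII (code points 0-127)."""
    # The 'ignore' error handler silently skips characters the codec cannot encode.
    return text.encode("ascii", "ignore").decode("ascii")


def strip_non_ascii_filter(text):
    """Same result, written as an explicit test on each character's code point."""
    return "".join(ch for ch in text if ord(ch) < 128)


sample = "na\u00efve caf\u00e9 \u2013 r\u00e9sum\u00e9"
print(strip_non_ascii(sample))   # nave caf  rsum
```

Both versions give the same result. The `encode` form is shorter and lets you swap `"ignore"` for `"replace"` if you would rather see a `?` where characters were removed.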

Re: [Tutor] unicode decode/encode issue

2016-09-26 Thread bruce
Hey folks. (peter!) Thanks for the reply. I wound up doing: #s=s.replace('\u2013', '-') #s=s.replace(u'\u2013', '-') #s=s.replace(u"\u2013", "-") #s=re.sub(u"\u2013", "-", s) s=s.encode("ascii", "ignore") s=s.replace(u"\u2013", "-") s=s.replace("–", "-") ##<<< this was actually in
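
If `s` is a Unicode string, the `encode("ascii", "ignore")` call in the snippet above drops the en dash (U+2013) before the `replace` calls run, so those calls have nothing left to replace. The Python 3 sketch below substitutes first and strips second. The `ASCII_STANDINS` table is a hypothetical choice of replacements.

```python
# Hypothetical mapping of common typographic characters to ASCII stand-ins.
ASCII_STANDINS = {
    "\u2013": "-",   # EN DASH
    "\u2014": "-",   # EM DASH
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
}


def to_ascii(text):
    # Step 1: replace the characters we care about while they still exist.
    for char, standin in ASCII_STANDINS.items():
        text = text.replace(char, standin)
    # Step 2: only now discard whatever non-ASCII characters remain.
    return text.encode("ascii", "ignore").decode("ascii")


title = "English 120 Course Syllabus \u2013 Fall \u2013 2006"
print(to_ascii(title))   # English 120 Course Syllabus - Fall - 2006
```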

Re: [Tutor] unicode decode/encode issue

2016-09-26 Thread Peter Otten
bruce wrote: > Hi. > > I've got a "basic" situation that should be simple. So it must be a user > (me) issue! > > > I've got a page from a web fetch. I'm simply trying to go from utf-8 to > ascii. I'm not worried about any cruft that might get stripped out as the > data is generated from a us sit

Re: [Tutor] unicode decode/encode issue

2016-09-26 Thread Steven D'Aprano
On Mon, Sep 26, 2016 at 12:59:04PM -0400, bruce wrote: > When I look at the input content, I have : > > u'English 120 Course Syllabus \u2013 Fall \u2013 2006' > > So, any pointers on replacing the \u2013 with a simple '-' (dash) (or I > could even handle just a ' ' (space) You misinterpret wha

[Tutor] unicode decode/encode issue

2016-09-26 Thread bruce
Hi. I've got a "basic" situation that should be simple. So it must be a user (me) issue! I've got a page from a web fetch. I'm simply trying to go from utf-8 to ascii. I'm not worried about any cruft that might get stripped out as the data is generated from a US site. (It's a college/class dataset

Re: [Tutor] Unicode encoding and raw_input() in Python 2.7 ?

2015-04-17 Thread Dave Angel
On 04/17/2015 04:39 AM, Samuel VISCAPI wrote: Hi, This is my first post to that mailing list if I remember correctly, so hello everyone ! Welcome to the list. I've been stuck on a simple problem for the past few hours. I'd just like raw_input

Re: [Tutor] Unicode encoding and raw_input() in Python 2.7 ?

2015-04-17 Thread Alan Gauld
On 17/04/15 09:39, Samuel VISCAPI wrote: hello everyone ! Hello, and welcome. I've been stuck on a simple problem for the past few hours. I'd just like raw_input to work with accentuated characters. For example: firstname = str.capitalize(raw_input('First name: ')) where firstname could be

[Tutor] Unicode encoding and raw_input() in Python 2.7 ?

2015-04-17 Thread Samuel VISCAPI
Hi, This is my first post to that mailing list if I remember correctly, so hello everyone ! I've been stuck on a simple problem for the past few hours. I'd just like raw_input to work with accentuated characters. For example: firstname = str.capi
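
In Python 2, `raw_input()` returns a byte string, and `str.capitalize` only knows about ASCII letters, so an accented first letter is left alone. Decoding the input into Unicode text first, using the terminal's encoding, avoids this. The Python 3 sketch below shows the same idea and assumes the input arrives as UTF-8 bytes. In Python 3 itself, `input()` already returns decoded text. The helper name is illustrative.

```python
def capitalize_name(raw_bytes, encoding="utf-8"):
    """Decode raw terminal bytes, then capitalize the resulting text."""
    return raw_bytes.decode(encoding).capitalize()


raw = "\u00e9lodie".encode("utf-8")   # what a UTF-8 terminal sends for "élodie"
print(raw.capitalize())               # b'\xc3\xa9lodie' (non-ASCII bytes pass through untouched)
print(capitalize_name(raw))           # Élodie
```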

Re: [Tutor] Unicode Encode Error

2014-06-17 Thread Peter Otten
Aaron Misquith wrote: > I'm trying to obtain the questions present in StackOverflow for a > particular tag. > > Whenever I try to run the program i get this *error:* > > Message File Name Line Position > Traceback > C:\Users\Aaron\Desktop\question.py 20 > UnicodeEncodeError: 'ascii' codec c

[Tutor] Unicode Encode Error

2014-06-17 Thread Aaron Misquith
I'm trying to obtain the questions present in StackOverflow for a particular tag. Whenever I try to run the program i get this *error:* Message File Name Line Position Traceback C:\Users\Aaron\Desktop\question.py 20 UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in positi
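
This traceback means that a Unicode string containing U+201C (a left curly quote) was implicitly encoded with the default `'ascii'` codec. That typically happens in `print` or in a file write. The minimal Python 3 sketch below reproduces the error and then avoids it by choosing the encoding explicitly.

```python
text = "\u201cWhy does this fail?\u201d"

error_reason = None
try:
    text.encode("ascii")             # what an implicit ASCII conversion attempts
except UnicodeEncodeError as exc:
    error_reason = exc.reason
    print(exc.reason, exc.start)     # ordinal not in range(128) 0

utf8_bytes = text.encode("utf-8")    # explicit, lossless encoding
print(utf8_bytes[:3])                # b'\xe2\x80\x9c'
```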

Re: [Tutor] unicode: alpha, whitespaces and digits

2013-12-29 Thread eryksun
On Sun, Dec 29, 2013 at 5:58 PM, Steven D'Aprano wrote: > If you want to test for something that a human reader will recognise as > a "whole number", s.isdigit() is probably the best one to use. isdigit() includes decimal digits plus other characters that have a digit value: >>> print u'\N{s
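
The three Unicode digit predicates differ in how wide a net they cast. The sketch below is in Python 3. In Python 2 the same methods exist on `unicode` objects, but byte strings only have `isdigit`.

```python
samples = {
    "123": "ASCII digits",
    "\u0663": "ARABIC-INDIC DIGIT THREE",
    "\u00b2": "SUPERSCRIPT TWO",
    "\u00bd": "VULGAR FRACTION ONE HALF",
}

for s, name in samples.items():
    print(f"{name:26} decimal={s.isdecimal()!s:5} digit={s.isdigit()!s:5} numeric={s.isnumeric()}")

# isdecimal() strings are safe to pass to int(); "\u00b2" (isdigit only) is not.
print(int("\u0663"))   # 3
```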

Re: [Tutor] unicode: alpha, whitespaces and digits

2013-12-29 Thread Steven D'Aprano
On Mon, Dec 30, 2013 at 09:58:10AM +1100, Steven D'Aprano wrote: > What gives you that impression? isspace works on Unicode strings too. > > py> ' x'.isspace() > False > py> ''.isspace() > True Oops, the above was copied and pasted from Python 3, which is why there are no u' prefixes. But

Re: [Tutor] unicode: alpha, whitespaces and digits

2013-12-29 Thread Steven D'Aprano
On Sun, Dec 29, 2013 at 02:36:32PM +0100, Ulrich Goebel wrote: > Hello, > > I have a unicode string s, for example u"abc", u"äöü", u"123" or > something else, and I have to find out whether > > 1. s is not empty and contains only digits (as in u"123", but not in > u"3.1415") > > or > > 2. s is

Re: [Tutor] unicode: alpha, whitespaces and digits

2013-12-29 Thread Dave Angel
On Sun, 29 Dec 2013 19:20:04 +, Mark Lawrence wrote: > 2. s is empty or contains only whitespaces Call strip() on it. If it's now empty, it was whitespace. -- DaveA

Re: [Tutor] unicode: alpha, whitespaces and digits

2013-12-29 Thread Mark Lawrence
On 29/12/2013 13:36, Ulrich Goebel wrote: Hello, I have a unicode string s, for example u"abc", u"äöü", u"123" or something else, and I have to find out whether 1. s is not empty and contains only digits (as in u"123", but not in u"3.1415") or 2. s is empty or contains only whitespaces For al

[Tutor] unicode: alpha, whitespaces and digits

2013-12-29 Thread Ulrich Goebel
Hello, I have a unicode string s, for example u"abc", u"äöü", u"123" or something else, and I have to find out whether 1. s is not empty and contains only digits (as in u"123", but not in u"3.1415") or 2. s is empty or contains only whitespaces For all other cases I would assume a "normal"
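
The sketch below combines the suggestions in this thread: a digit test for case 1 and Dave Angel's `strip()` test for case 2. It is Python 3. It uses `isdecimal()` rather than `isdigit()`, so superscripts such as `²` do not count as digits; swap it if you want the looser test. The function name and return labels are illustrative.

```python
def classify(s):
    """Return 'digits', 'blank', or 'normal' for a Unicode string."""
    if s and s.isdecimal():
        # Non-empty and every character is a decimal digit (in any script).
        return "digits"
    if not s.strip():
        # Empty, or nothing but whitespace.
        return "blank"
    return "normal"


for s in ["123", "3.1415", "", "   \t", "\u00e4\u00f6\u00fc", "abc"]:
    print(repr(s), classify(s))
```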

Re: [Tutor] unicode help

2012-11-14 Thread Sander Sweers
Marilyn Davis wrote on Wed 14-11-2012 at 13:23 [-0800]: > I found this site: > http://hints.macworld.com/article.php?story=20100713130450549 > > and that fixes it. Short answer: It is not a fix but a workaround. Try: print symbol.encode('utf-8') Longer answer: It is not really a fix, it is a wo
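
Whether `print` succeeds depends on the encoding of the output stream (`sys.stdout.encoding`). That explains why the same script behaved differently on Windows, Ubuntu and the Mac. The Python 3 sketch below shows what the yen sign becomes under a few encodings. Encoding explicitly, as in Sander's `symbol.encode('utf-8')`, yields bytes whose meaning no longer depends on the terminal settings.

```python
import sys

symbol = chr(165)   # YEN SIGN, the same code point as unichr(165) in Python 2

for encoding in ("utf-8", "cp1252", "ascii"):
    try:
        print(encoding, symbol.encode(encoding))
    except UnicodeEncodeError:
        print(encoding, "cannot represent U+00A5")

print("stdout encoding:", sys.stdout.encoding)
```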

Re: [Tutor] unicode help

2012-11-14 Thread Dave Angel
On 11/14/2012 04:07 PM, Marilyn Davis wrote: > > > Goodness! I didn't expect it to be a Mac thing. > > So, on a Windows machine, running Python 2.6.6, sys.stdout.encoding is > 'cp1252', yet the code runs fine. > > On Ubuntu with 2.7, it's 'UTF-8' and it runs just fine. > > I find this most myste

Re: [Tutor] unicode help

2012-11-14 Thread Marilyn Davis
On Wed, November 14, 2012 1:07 pm, Marilyn Davis wrote: > Thank you, Dave, for looking at my problem, and for correcting me on my > top posting. > > See below: > > > On Wed, November 14, 2012 12:34 pm, Dave Angel wrote: > > >> On 11/14/2012 03:10 PM, Marilyn Davis wrote: >> >> >>> Hi, >>> >>> >>>

Re: [Tutor] unicode help

2012-11-14 Thread Marilyn Davis
Thank you, Dave, for looking at my problem, and for correcting me on my top posting. See below: On Wed, November 14, 2012 12:34 pm, Dave Angel wrote: > On 11/14/2012 03:10 PM, Marilyn Davis wrote: > >> Hi, >> >> >> Last year, I was helped so that this ran nicely on my 2.6: >> >> >> #! /usr/bin/e

Re: [Tutor] unicode help

2012-11-14 Thread Dave Angel
On 11/14/2012 03:10 PM, Marilyn Davis wrote: > Hi, > > Last year, I was helped so that this ran nicely on my 2.6: > > #! /usr/bin/env python > # -*- coding: utf-8 -*- > # necessary for python not to complain about "¥" > > symbol = unichr(165) > print unicode(symbol) > > --- end of code --- > > But,

Re: [Tutor] unicode help

2012-11-14 Thread Marilyn Davis
Hi, Last year, I was helped so that this ran nicely on my 2.6: #! /usr/bin/env python # -*- coding: utf-8 -*- # necessary for python not to complain about "¥" symbol = unichr(165) print unicode(symbol) --- end of code --- But, now on my 2.7, and on 2.6 when I tried reinstalling it, I get: bas

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Mark Lawrence
On 05/09/2012 16:18, eryksun wrote: On Wed, Sep 5, 2012 at 10:51 AM, Ray Jones wrote: subprocess.call(['dolphin', '/my_home/testdir/\u044c\u043e\u0432']) Dolphin's error message: 'The file or folder /my_home/testdir/\u044c\u043e\u0432 does not exist' "\u" only codes a BMP character in unico

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Ray Jones
On 09/05/2012 08:18 AM, eryksun wrote: > On Wed, Sep 5, 2012 at 10:51 AM, Ray Jones wrote: >> subprocess.call(['dolphin', '/my_home/testdir/\u044c\u043e\u0432']) >> >> Dolphin's error message: 'The file or folder >> /my_home/testdir/\u044c\u043e\u0432 does not exist' > "\u" only codes a BMP charac

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Steven D'Aprano
On 06/09/12 00:51, Ray Jones wrote: subprocess.call(['dolphin', '/my_home/testdir/\u044c\u043e\u0432']) Dolphin's error message: 'The file or folder /my_home/testdir/\u044c\u043e\u0432 does not exist' That's because you're telling Dolphin to look for a file literally called: BACKSLASH u ZERO

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Steven D'Aprano
On 06/09/12 00:04, Ray Jones wrote: On 09/05/2012 04:52 AM, Peter Otten wrote: Ray Jones wrote: But doesn't that entail knowing in advance which encoding you will be working with? How would you automate the process while reading existing files? If you don't *know* the encoding you *have* to

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread eryksun
On Wed, Sep 5, 2012 at 10:51 AM, Ray Jones wrote: > > subprocess.call(['dolphin', '/my_home/testdir/\u044c\u043e\u0432']) > > Dolphin's error message: 'The file or folder > /my_home/testdir/\u044c\u043e\u0432 does not exist' "\u" only codes a BMP character in unicode literals, i.e. u"unicode lite
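
In Python 2, a `\u` escape is only interpreted inside a `u"..."` literal. In a plain byte string it stays as six literal characters, which is why Dolphin was asked for a file whose name really contains backslashes. The Python 3 sketch below shows the difference. In Python 3 every `"..."` literal is Unicode, so a raw string plays the role of the uninterpreted escapes.

```python
escaped = "\u044c\u043e\u0432"     # escapes interpreted: three Cyrillic letters
literal = r"\u044c\u043e\u0432"    # raw string: backslashes kept, 18 characters

print(len(escaped), len(literal))  # 3 18

# If escaped text arrives from elsewhere (a config file, a pasted log), the
# 'unicode_escape' codec can interpret it; it expects ASCII/Latin-1 input bytes.
decoded = literal.encode("ascii").decode("unicode_escape")
print(decoded == escaped)          # True

# A real path argument would then be built from the decoded name.
path = "/my_home/testdir/" + decoded
```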

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Peter Otten
Ray Jones wrote: > On 09/05/2012 04:52 AM, Peter Otten wrote: >> Ray Jones wrote: >> >>> >>> But doesn't that entail knowing in advance which encoding you will be >>> working with? How would you automate the process while reading existing >>> files? >> If you don't *know* the encoding you *have* t
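
When the encoding is unknown, one common fallback is to try strict decoders in order of likelihood and take the first that succeeds. The sketch below assumes a candidate list tuned for Russian text, which you would adjust to your data. Single-byte codecs such as cp1251 and Latin-1 accept almost any byte sequence, so a successful strict decode is evidence, not proof. Order the list by how likely each encoding is.

```python
def guess_decode(data, candidates=("utf-8", "cp1251", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes strictly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")


print(guess_decode("\u042f".encode("utf-8")))     # ('Я', 'utf-8')
print(guess_decode("\u042f".encode("cp1251")))    # ('Я', 'cp1251')
```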

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Ray Jones
On 09/05/2012 07:51 AM, Ray Jones wrote: > subprocess.call(['dolphin', '/my_home/testdir/\u044c\u043e\u0432']) > > Dolphin's error message: 'The file or folder > /my_home/testdir/\u044c\u043e\u0432 does not exist' > > But if I copy the characters as seen by Bash's shell and paste them into > my sub

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Ray Jones
On 09/05/2012 07:31 AM, eryksun wrote: > On Wed, Sep 5, 2012 at 5:42 AM, Ray Jones wrote: >> I have directory names that contain Russian characters, Romanian >> characters, French characters, et al. When I search for a file using >> glob.glob(), I end up with stuff like \x93\x8c\xd1 in place of th

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread eryksun
On Wed, Sep 5, 2012 at 5:42 AM, Ray Jones wrote: > I have directory names that contain Russian characters, Romanian > characters, French characters, et al. When I search for a file using > glob.glob(), I end up with stuff like \x93\x8c\xd1 in place of the > directory names. I thought simply identi

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Ray Jones
On 09/05/2012 04:52 AM, Peter Otten wrote: > Ray Jones wrote: > >> >> But doesn't that entail knowing in advance which encoding you will be >> working with? How would you automate the process while reading existing >> files? > If you don't *know* the encoding you *have* to guess. For instance you c

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Peter Otten
Ray Jones wrote: >> You can work around that by specifying the appropriate encoding >> explicitly: >> >> $ python tmp2.py iso-8859-5 | cat >> � >> $ python tmp2.py latin1 | cat >> Traceback (most recent call last): >>File "tmp2.py", line 4, in >>print u"Я".encode(encoding) >> UnicodeEncodeError:

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Ray Jones
On 09/05/2012 03:33 AM, Peter Otten wrote: > Ray Jones wrote: > >> I have directory names that contain Russian characters, Romanian >> characters, French characters, et al. When I search for a file using >> glob.glob(), I end up with stuff like \x93\x8c\xd1 in place of the >> directory names. I tho

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Peter Otten
Ray Jones wrote: > I have directory names that contain Russian characters, Romanian > characters, French characters, et al. When I search for a file using > glob.glob(), I end up with stuff like \x93\x8c\xd1 in place of the > directory names. I thought simply identifying them as Unicode would > cl

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Ray Jones
On 09/05/2012 02:57 AM, Walter Prins wrote: > Hi Ray, > > On 5 September 2012 10:42, Ray Jones wrote: >> Can someone point me to a page that will clarify the concepts, not just >> try to show me the Python implementation of what I already don't >> understand? ;) > Try the following 2 links which s

Re: [Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Walter Prins
Hi Ray, On 5 September 2012 10:42, Ray Jones wrote: > Can someone point me to a page that will clarify the concepts, not just > try to show me the Python implementation of what I already don't > understand? ;) Try the following 2 links which should hopefully help: http://www.joelonsoftware.com/

[Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

2012-09-05 Thread Ray Jones
I have directory names that contain Russian characters, Romanian characters, French characters, et al. When I search for a file using glob.glob(), I end up with stuff like \x93\x8c\xd1 in place of the directory names. I thought simply identifying them as Unicode would clear that up. Nope. Now I hav
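
Escapes such as `\x93\x8c\xd1` in the output usually mean that raw filesystem bytes are being printed instead of decoded text. In Python 3, `glob` and `os.listdir` return `str` when given a `str` argument and `bytes` when given `bytes`. `os.fsdecode()` converts bytes using the filesystem encoding. The sketch below uses the UTF-8 bytes of a Cyrillic name as its example and a throwaway temporary directory.

```python
import glob
import os
import tempfile

# UTF-8 bytes of the Cyrillic name "ьов", as they might sit on a Linux filesystem.
raw_name = b"\xd1\x8c\xd0\xbe\xd0\xb2"
print(raw_name)                                          # bytes: escapes, not letters
print(raw_name.decode("utf-8") == "\u044c\u043e\u0432")  # True

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()
    text_hits = glob.glob(os.path.join(tmp, "*"))                  # str pattern -> str results
    byte_hits = glob.glob(os.path.join(os.fsencode(tmp), b"*"))    # bytes pattern -> bytes results
    print(type(text_hits[0]).__name__, type(byte_hits[0]).__name__)  # str bytes
    print(os.fsdecode(byte_hits[0]) == text_hits[0])                 # True
```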

Re: [Tutor] unicode help

2011-05-28 Thread Marilyn Davis
Aye, thank you. I do like that syntax better. Sometimes it's time to just quit and try again later when I'm not so frustrated. That's when I make silly bugs. But, we got it! Thank you again. I think it's a nifty hack. M On Sat, May 28, 2011 3:53 pm, Alexandre Conrad wrote: > Marilyn, > >

Re: [Tutor] unicode help

2011-05-28 Thread Alexandre Conrad
Marilyn, You mistyped the line; it should have a colon right after the word "coding", such as: # coding: utf-8 not # coding utf-8 as you showed from your file. The syntax # -*- coding: utf8 -*- suggested by Martin is equivalent, but I always have a hard time remembering it from t
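
The encoding declaration is recognized by a regular expression that requires a colon or equals sign right after `coding`. That is why `# coding utf-8` is ignored. The Python 3 sketch below checks a few first lines against the pattern given in the Python language reference (PEP 263 uses essentially the same one). It then compiles Latin-1 source bytes that carry a declaration. The helper name is illustrative.

```python
import codecs
import re

# Pattern from the language reference's "Encoding declarations" section.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def declared_encoding(line):
    match = CODING_RE.match(line)
    return match.group(1) if match else None


for line in ["# -*- coding: utf-8 -*-", "# coding utf-8", "# vim: set fileencoding=utf8 :"]:
    print(repr(line), "->", declared_encoding(line))

# 'utf8' is an accepted alias, even if some editors warn about it.
print(codecs.lookup("utf8").name)            # utf-8

# Source bytes containing a Latin-1 yen sign (0xA5), with a declaration:
source = b'# -*- coding: latin-1 -*-\nsymbol = "\xa5"\n'
namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)
print(namespace["symbol"] == "\u00a5")       # True
```

Without the declaration, Python 3 assumes UTF-8 source and rejects the lone 0xA5 byte.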

Re: [Tutor] unicode help

2011-05-28 Thread Marilyn Davis
Thank you Martin, This: #!/usr/bin/env python # -*- coding: utf8 -*- '''Unicode handling for 2.6. ''' [rest of module deleted] produces an emacs warning: Warning (mule): Invalid coding system `utf8' is specified for the current buffer/file by the :coding tag. It is highly recommended to fix it

Re: [Tutor] unicode help

2011-05-28 Thread Marilyn Davis
Thank you Alexandre for your quick reply. I tried your suggestion (again) and I still get: ./uni.py File "./uni.py", line 20 SyntaxError: Non-ASCII character '\xa5' in file ./uni.py on line 21, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Can you suggest a

Re: [Tutor] unicode help

2011-05-28 Thread Martin A. Brown
Hello there, : I'm still on Python 2.6 and I'm trying to work some unicode : handling. : : I've spent some hours on this snippet of code, trying to follow : PEP 0263, since the error tells me to see it. I've tried other : docs too and I am still clueless. OK, so this is PEP 0263. htt

Re: [Tutor] unicode help

2011-05-28 Thread Alexandre Conrad
When Python loads your file from your file system, it assumes all characters in the file are ASCII. But when it hits non-ASCII characters (currency symbols), Python doesn't know how to interpret them. So you can give Python a hint by putting at the top of your file the encoding of your file: After t

[Tutor] unicode help

2011-05-28 Thread Marilyn Davis
Hi, I'm still on Python 2.6 and I'm trying to work some unicode handling. I've spent some hours on this snippet of code, trying to follow PEP 0263, since the error tells me to see it. I've tried other docs too and I am still clueless. The code works, except for the comment at the end. I would

Re: [Tutor] unicode nightmare

2010-11-10 Thread Alan Gauld
"danielle davout" wrote I simplify it to v = u'\u0eb4' X = (1,) gen = ((v ,v) for x in X for y in X) What can be so wrong in this line, around it to give the 1lined file ໄ:ໄ where ໄ "is" not u'\u0eb4' but u'\u0ec4' though a direct printing looks OK The code will produce a one line file wi

[Tutor] unicode nightmare

2010-11-10 Thread danielle davout
Hi, I really badly need any start of explanation .. Please help me ! I have got a list of 64 generators from which I generate files. 58 give what I expected 1 is getting me mad, not the first, not the last the fifth... I simplify it to v = u'\u0eb4' X = (1,) gen = ((v ,v) for x in X for y in X)

Re: [Tutor] unicode ordinals to/from utf8

2009-12-26 Thread Kent Johnson
On Sat, Dec 26, 2009 at 10:41 AM, spir wrote: > OK, I'll answer myself ;-) > Found needed information at http://www1.tip.nl/~t876506/utf8tbl.html > See below new version, I'm not at all sure what you are trying to do here. Is it more than conversion between unicode and utf-8? It looks like you ha
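
As Kent suggests, the built-in codec already does this conversion. For anyone curious how the variable number of meaningful bits in the first byte works, here is a Python 3 sketch of the UTF-8 bit layout. It checks the hand-built bytes against the built-in encoder. The function name is illustrative.

```python
def utf8_encode(cp):
    """Encode one code point by hand, following the UTF-8 bit layout."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value: %#x" % cp)
    if cp < 0x80:          # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    manual = utf8_encode(cp)
    assert manual == chr(cp).encode("utf-8")   # matches the built-in codec
    assert manual.decode("utf-8") == chr(cp)   # and decodes back again
    print(hex(cp), manual)
```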

Re: [Tutor] unicode ordinals to/from utf8

2009-12-26 Thread spir
OK, I'll answer myself ;-) Found needed information at http://www1.tip.nl/~t876506/utf8tbl.html See below new version, Denis la vita e estrany http://spir.wikidot.com/ = # coding: utf8 import sys ; end = sys.exit # constant max_co

[Tutor] unicode ordinals to/from utf8

2009-12-25 Thread spir
Special offer for coders coding on Christmas day! I'm looking for the simplest way to decode/encode unicode ordinals (called 'codes' below) to/from utf8. Find this rather tricky, esp because of variable number of meaningful bits in first octet. Specifically, for encoding, is there a way to avoi

Re: [Tutor] unicode mapping doesn't work

2009-11-27 Thread spir
Lie Ryan wrote: > > funnychars = u"éèêëóòôöáàâäÉÈÊËÓÒÔÖÁÀÂÄ" > > asciichars = "" > > In addition to Lie's reply, you will very probably need diacritic-free chars to be unicode, too. Otherwise prepare for later UnicodeEncodeError/UnicodeDecodeError exceptions. As a rule of thumb, if you work wi

Re: [Tutor] unicode mapping doesn't work

2009-11-27 Thread Lie Ryan
On 11/27/2009 12:06 PM, Alan Gauld wrote: Huh?! Was this to the right place? It doesn't seem to be related to the previous posts in the thread? Confused Alan G. whoops.. wrong thread...

Re: [Tutor] unicode mapping doesn't work

2009-11-26 Thread Alan Gauld
Huh?! Was this to the right place? It doesn't seem to be related to the previous posts in the thread? Confused Alan G. "Lie Ryan" wrote in message news:hen7am$4r...@ger.gmane.org... On 11/27/2009 10:43 AM, The Music Guy wrote: > Next thing is, I can't see logically how the path of the di

Re: [Tutor] unicode mapping doesn't work

2009-11-26 Thread Lie Ryan
On 11/27/2009 10:43 AM, The Music Guy wrote: > Next thing is, I can't see logically how the path of the discussion of > the proposal lead to the proposal being rejected. It looked like a lot > of people really liked the idea--including Guido himself--and several > examples were given about how it

Re: [Tutor] unicode mapping doesn't work

2009-11-26 Thread Albert-Jan Roskam
he face of ambiguity, refuse the temptation to guess. ~~ --- On Thu, 11/26/09, Lie Ryan wrote: From: Lie Ryan Subject: Re: [Tutor] unicode mapping doesn't work To: tutor@python.org Date: Thursday, November 26, 2009, 5:33 PM Al

Re: [Tutor] unicode mapping doesn't work

2009-11-26 Thread Lie Ryan
Albert-Jan Roskam wrote: Hi, I want to substitute some letters with accents with their non-accented equivalents. It should be easy, but it doesn't work. What am I doing wrong? trans = {} funnychars = u"éèêëóòôöáàâäÉÈÊËÓÒÔÖÁÀÂÄ" asciichars = "" for f, a in zip(funnycha

[Tutor] unicode mapping doesn't work

2009-11-26 Thread Albert-Jan Roskam
Hi, I want to substitute some letters with accents with their non-accented equivalents. It should be easy, but it doesn't work. What am I doing wrong? trans = {} funnychars = u"éèêëóòôöáàâäÉÈÊËÓÒÔÖÁÀÂÄ" asciichars = "" for f, a in zip(funnychars, asciichars):     trans.u
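
Here are two ways to do this in Python 3, where every string is Unicode. In Python 2, `unicode.translate` likewise needs a table keyed by code points (`ord(f)`), not by characters. The first way uses `str.maketrans`/`str.translate` with an explicit table; the example strings are illustrative, not the original poster's. The second uses Unicode normalization (NFKD), which splits each accented letter into a base letter plus combining marks, and then drops the marks.

```python
import unicodedata

# Explicit table: every accented character maps to one plain letter.
accented = "\u00e9\u00e8\u00ea\u00eb\u00c9\u00c8"   # é è ê ë É È
plain = "eeeeEE"
table = str.maketrans(accented, plain)   # keys are code points (ints), as translate() expects


def strip_accents(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print("\u00e9l\u00e8ve".translate(table))   # eleve
print(strip_accents("\u00c9l\u00e8ve"))     # Eleve
```

The normalization approach needs no hand-built table. Characters without a decomposition, such as `ø` or `ß`, pass through it unchanged.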

Re: [Tutor] unicode: % & __str__ & str()

2009-10-30 Thread Dave Angel
spir wrote: [back to the list after a rather long break] Hello, I stepped on a unicode issue ;-) (one more) Below an illustration: class U(unicode): def __str__(self): return self # if you can't properly see the string below, # 128 ===

[Tutor] unicode: % & __str__ & str()

2009-10-30 Thread spir
[back to the list after a rather long break] Hello, I stepped on a unicode issue ;-) (one more) Below an illustration: === class U(unicode): def __str__(self): return self # if you can't properly see the string below, # 128 ¶ÿµ ¶ÿµ ¶ÿµ ¶ÿµ ¶ÿ

Re: [Tutor] unicode, utf-8 problem again

2009-06-04 Thread Mark Tolonen
"Dinesh B Vadhia" wrote in message news:col103-ds25bb23a18e216061c32eb1a3...@phx.gbl... Hi! I'm processing a large number of xml files that are all declared as utf-8 encoded in the header ie. My Python environment has been set for 'utf-8' through site.py. It's a bad idea to change th

Re: [Tutor] unicode, utf-8 problem again

2009-06-04 Thread Dinesh B Vadhia
That was very useful - thanks! Hopefully, I'm "all Unicode" now. From: wesley chun Sent: Thursday, June 04, 2009 10:45 AM To: Dinesh B Vadhia ; tutor@python.org Subject: Re: [Tutor] unicode, utf-8 problem again >> But, I still get this error: >> Trace

Re: [Tutor] unicode, utf-8 problem again

2009-06-04 Thread wesley chun
>>  But, I still get this error: >>  Traceback (most recent call last): >> ... >> UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in >> position 76: ordinal not in range(128) >>  What am I missing? > > Take a read through http://evanjones.ca/python-utf8.html which will give you >

Re: [Tutor] unicode, utf-8 problem again

2009-06-04 Thread Dinesh B Vadhia
Okay, I get it now ... reading/writing files with the codecs module and the 'utf-8' option fixes it. Thanks! From: Christian Witts Sent: Thursday, June 04, 2009 7:05 AM To: Dinesh B Vadhia Cc: tutor@python.org Subject: Re: [Tutor] unicode, utf-8 problem again Dinesh B Va
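
Below is a minimal Python 3 sketch of the approach Dinesh settled on: give the file an explicit encoding, so that text is decoded on reading and encoded on writing. `codecs.open` still exists in Python 3, although the built-in `open(..., encoding=...)` is the usual choice there. The file is a throwaway temporary file.

```python
import codecs
import os
import tempfile

text = "\u201cquoted\u201d caf\u00e9"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")

    # Writing: Unicode text in, UTF-8 bytes on disk.
    with codecs.open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading: UTF-8 bytes on disk, Unicode text out.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

    with open(path, "rb") as f:
        raw = f.read()

print(round_trip == text)   # True
print(raw[:3])              # b'\xe2\x80\x9c'
```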

Re: [Tutor] unicode, utf-8 problem again

2009-06-04 Thread Christian Witts
Dinesh B Vadhia wrote: Hi! I'm processing a large number of xml files that are all declared as utf-8 encoded in the header ie. My Python environment has been set for 'utf-8' through site.py. Additionally, the top of each program/module has the declaration: # -*- coding: utf-8 -*- But,

[Tutor] unicode, utf-8 problem again

2009-06-04 Thread Dinesh B Vadhia
Hi! I'm processing a large number of xml files that are all declared as utf-8 encoded in the header ie. My Python environment has been set for 'utf-8' through site.py. Additionally, the top of each program/module has the declaration: # -*- coding: utf-8 -*- But, I still get this error: Tr

Re: [Tutor] unicode to plain text conversion

2009-04-07 Thread Pirritano, Matthew
unces+mpirritano=ochca@python.org [mailto:tutor-bounces+mpirritano=ochca@python.org] On Behalf Of Kent Johnson Sent: Monday, April 06, 2009 5:51 PM To: Pirritano, Matthew Cc: Python Tutor Subject: Re: [Tutor] unicode to plain text conversion On Mon, Apr 6, 2009 at 6:48 PM, Pirritano, Matthew

Re: [Tutor] unicode to plain text conversion

2009-04-07 Thread Kent Johnson
On Tue, Apr 7, 2009 at 10:44 AM, Pirritano, Matthew wrote: > How can I find out the type of coding that was used to create this file? > Is there a way to do this other than just asking the person who created > it? That is possible, but I was just curious. If you can look at the data as hex value
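
Kent's suggestion of looking at the first bytes can be automated when a file starts with a byte order mark (BOM). Windows tools that save "Unicode" text typically write UTF-16 with a BOM. Not every file has one, though, so treat a `None` result from this sketch as "unknown", not as ASCII. The function name is illustrative.

```python
import codecs

# Longest BOMs first: the UTF-32-LE BOM begins with the same two bytes as UTF-16-LE's.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(first_bytes):
    """Return a codec name for a leading BOM, or None if there is none."""
    for bom, encoding in BOMS:
        if first_bytes.startswith(bom):
            return encoding
    return None


with_bom = "hello".encode("utf-16")               # Python prepends a BOM for 'utf-16'
print(with_bom[:2].hex(), sniff_bom(with_bom))    # fffe utf-16 (on a little-endian machine)
print(sniff_bom(b"plain ascii"))                  # None
```

Decoding with the returned name also strips the BOM: `utf-16` and `utf-32` consume it, and `utf-8-sig` is the variant that skips a UTF-8 BOM.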

Re: [Tutor] unicode to plain text conversion

2009-04-07 Thread Pirritano, Matthew
hca@python.org [mailto:tutor-bounces+mpirritano=ochca@python.org] On Behalf Of Alan Gauld Sent: Tuesday, April 07, 2009 1:42 AM To: tutor@python.org Subject: Re: [Tutor] unicode to plain text conversion "Pirritano, Matthew" wrote > I am a total newbie. I have a very large file >

Re: [Tutor] unicode to plain text conversion

2009-04-07 Thread Alan Gauld
"Pirritano, Matthew" wrote I am a total newbie. I have a very large file > 4GB that I need to convert from Unicode to plain text. I used to just use dos when the file was < 4GB but it no longer seems to work. Can anyone point me to some python code that might perform this function? When you s

Re: [Tutor] unicode to plain text conversion

2009-04-06 Thread Kent Johnson
On Mon, Apr 6, 2009 at 6:48 PM, Pirritano, Matthew wrote: > Hello python people, > > I am a total newbie. I have a very large file > 4GB that I need to > convert from Unicode to plain text. I used to just use dos when the file > was < 4GB but it no longer seems to work. Can anyone point me to some

Re: [Tutor] unicode to plain text conversion

2009-04-06 Thread wesley chun
> Previously I was able to convert just by using: > Type Unicode_filename.txt > new_text_file.txt > That's it. wow, if that's all you had to do, i'm not sure it's worthwhile learning a new programming language just to process it with an application when your original solution was so dead simpl

Re: [Tutor] unicode to plain text conversion

2009-04-06 Thread Pirritano, Matthew
ilto:wes...@gmail.com] Sent: Monday, April 06, 2009 4:40 PM To: Pirritano, Matthew Cc: Python Tutor Subject: Re: [Tutor] unicode to plain text conversion > I am a total newbie. I have a very large file > 4GB that I need to > convert from Unicode to plain text. I used to just use dos when the

Re: [Tutor] unicode to plain text conversion

2009-04-06 Thread wesley chun
> I am a total newbie. I have a very large file > 4GB that I need to > convert from Unicode to plain text. I used to just use dos when the file > was < 4GB but it no longer seems to work. Can anyone point me to some > python code that might perform this function? can you elaborate on your convers

[Tutor] unicode to plain text conversion

2009-04-06 Thread Pirritano, Matthew
Hello python people, I am a total newbie. I have a very large file > 4GB that I need to convert from Unicode to plain text. I used to just use dos when the file was < 4GB but it no longer seems to work. Can anyone point me to some python code that might perform this function? Thanks Matt Matthew
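
Below is a sketch of a streaming conversion in Python 3. It assumes the source is UTF-16 with a BOM, as is common for files Windows calls "Unicode", and that replacing unrepresentable characters with `?` is acceptable. Reading fixed-size blocks of text keeps memory use flat no matter how large the file is. The function and parameter names are illustrative.

```python
import os
import tempfile


def convert_file(src, dst, src_encoding="utf-16", dst_encoding="ascii",
                 errors="replace", block_chars=1 << 20):
    """Re-encode src into dst one block at a time, so file size doesn't matter."""
    # newline="" on both files leaves \r\n line endings untouched.
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding=dst_encoding, errors=errors, newline="") as fout:
        while True:
            block = fin.read(block_chars)   # counts characters, not bytes
            if not block:
                break
            fout.write(block)               # with errors="replace", unencodable chars become "?"


# Tiny demonstration with a deliberately small block size.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "unicode.txt")
    dst = os.path.join(tmp, "plain.txt")
    with open(src, "w", encoding="utf-16", newline="") as f:
        f.write("caf\u00e9\r\nok\r\n")      # UTF-16 with a BOM
    convert_file(src, dst, block_chars=3)
    with open(dst, "rb") as f:
        result = f.read()

print(result)   # b'caf?\r\nok\r\n'
```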

Re: [Tutor] Unicode strings

2008-08-22 Thread Kent Johnson
On Fri, Aug 22, 2008 at 2:23 PM, eShopping <[EMAIL PROTECTED]> wrote: > Hi > > I am trying to read in non-ASCII data from file using Unicode, with this > test app: > > vocab=[("abends","in the evening"), > ("die Auff\xFCrung","performance (of a play)"), > ("der Au\xDFenhandel","foreign trade") The
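
In a Python 2 byte string, `\xFC` is just the byte 0xFC. It only becomes `ü` once it is decoded with an encoding in which 0xFC means `ü`, such as Latin-1 or cp1252. The Python 3 sketch below shows the round trip, using an illustrative word rather than the vocabulary list above.

```python
latin1_bytes = b"\xfcber"               # 0xFC is LATIN SMALL LETTER U WITH DIAERESIS in Latin-1
word = latin1_bytes.decode("latin-1")   # decode: bytes -> text
print(word == "\u00fcber")              # True

utf8_bytes = word.encode("utf-8")       # encode: text -> bytes, e.g. for a UTF-8 file
print(utf8_bytes)                       # b'\xc3\xbcber'

# The same bytes are not valid UTF-8, so decoding with the wrong codec fails here.
bad_byte = None
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    bad_byte = latin1_bytes[exc.start]
print(hex(bad_byte))                    # 0xfc
```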

[Tutor] Unicode strings

2008-08-22 Thread eShopping
Hi I am trying to read in non-ASCII data from file using Unicode, with this test app: vocab=[("abends","in the evening"), ("aber","but"), ("die abflughalle","departure lounge"), ("abhauen","to beat it/leave"), ("abholen","to collect/pick up"), ("das Abitur","A-levels"), ("abmachen","to take of

[Tutor] Unicode and fonts in Idle and Tkinter

2008-06-12 Thread Emmanuel Ruellan
Hi all, I'd like to display symbols from the international phonetic alphabet (IPA) in a Tkinter window, but I do not manage to have them displayed correctly, neither in a Tkinter window nor in Idle. For example, I'd like to print the following string and get the result below: *string* my_text =

Re: [Tutor] unicode problem

2007-09-23 Thread Emad Nawfal
Hi Tutors, I've just realized that I forgot to thank Kent Johnson for his advice on Unicode. Thank you Kent. Best, Emad On 9/18/07, Kent Johnson <[EMAIL PROTECTED]> wrote: > > Emad Nawfal wrote: > > *Hi All Tutors,* > > *I'm new and I'm trying to use unicode strings in my code (specifically > > A

Re: [Tutor] unicode problem

2007-09-18 Thread Kent Johnson
Emad Nawfal wrote: > *Hi All Tutors,* > *I'm new and I'm trying to use unicode strings in my code (specifically > Arabic), but I get this:* > > IDLE 1.2.1 text = ur'المصريون' > Unsupported characters in input This seems to be a problem with IDLE rather than Python itself. This message: http

[Tutor] unicode problem

2007-09-18 Thread Emad Nawfal
*Hi All Tutors,* *I'm new and I'm trying to use unicode strings in my code (specifically Arabic), but I get this:* IDLE 1.2.1 >>> text = ur'المصريون' Unsupported characters in input >>> for letter in text: print letter Traceback (most recent call last): File "", line 1, in for letter i

Re: [Tutor] Unicode question

2007-09-12 Thread János Juhász
Dear Kent, thanks for your response. It is clear now. > As a mnemonic I think of Unicode as pure unencoded data. (This is *not* > accurate, it is a memory aid!) Then it's easy to remember that decode() > removes encoding == convert to Unicode, encode() adds encoding == > convert from Unicode. So
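
Here is Kent's mnemonic in code: `decode()` turns bytes in a known encoding into Unicode text, and `encode()` turns text back into bytes. The Python 3 sketch below uses a Hungarian phrase and the DOS Central European code page mentioned in the thread (cp852).

```python
phrase = "\u00c1rv\u00edzt\u0171r\u0151 t\u00fck\u00f6rf\u00far\u00f3g\u00e9p"  # Árvíztűrő tükörfúrógép

dos_bytes = phrase.encode("cp852")        # text -> bytes, as stored in the DOS file
text = dos_bytes.decode("cp852")          # bytes -> text, using the same code page
print(text == phrase)                     # True

# Decoding the same bytes with the wrong code page gives mojibake instead of an error.
wrong = dos_bytes.decode("latin-1")
print(wrong == phrase)                    # False
```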

Re: [Tutor] Unicode question

2007-09-11 Thread Kent Johnson
János Juhász wrote: > Dear All, > > I would like to convert my DOS txt file into pdf with reportlab. > The file can be seen correctly in Central European (DOS) encoding in > Explorer. > > My winxp uses cp852 as default codepage. > > When I open the txt file in notepad and set OEM/DOS script for

[Tutor] Unicode question

2007-09-11 Thread János Juhász
Dear All, I would like to convert my DOS txt file into pdf with reportlab. The file can be seen correctly in Central European (DOS) encoding in Explorer. My winxp uses cp852 as default codepage. When I open the txt file in notepad and set OEM/DOS script for terminal fonts, it shows the file co

Re: [Tutor] unicode encoding hell

2007-09-06 Thread Kent Johnson
David Bear wrote: > feedp.entry.title.decode('utf-8', 'xmlcharrefreplace') > > I assume it would take any unicode character and 'do the right thing', > including replacing higher ordinal chars with xml entity refs. But I still > get > > UnicodeEncodeError: 'ascii' codec can't encode character u'
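
`'xmlcharrefreplace'` is an encoding error handler: it applies when going from Unicode text to bytes, so it belongs on `encode()`, not `decode()`. The Python 3 sketch below uses an illustrative feed title and checks the round trip with `html.unescape`.

```python
import html

title = "Caf\u00e9 \u201cnews\u201d"     # text as a feed parser would return it

ascii_bytes = title.encode("ascii", "xmlcharrefreplace")
print(ascii_bytes)        # b'Caf&#233; &#8220;news&#8221;'

# The result is plain ASCII; HTML/XML consumers read the references back as characters.
print(html.unescape(ascii_bytes.decode("ascii")) == title)   # True
```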

[Tutor] unicode encoding hell

2007-09-05 Thread David Bear
I'm using universal feed parser to grab an rss feed. I'm careful not to use any sys.out, print, file write ops, etc, UNLESS I use a decode('utf-8') to convert the unicode string I get from feed parser to utf-8. However, I'm still getting the blasted decode error stating that one of the items in t

Re: [Tutor] unicode and character sets

2007-08-16 Thread tpc247
On 8/16/07, Kent Johnson <[EMAIL PROTECTED]> wrote: > > [EMAIL PROTECTED] wrote: > > Good start! thanks, one of the good folks at metafiler provided the link to an excellent introductory article I don't think this is necessary. Did it actually fix anything? Changing > the default encoding is not

Re: [Tutor] unicode and character sets

2007-08-16 Thread tpc247
On 8/16/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > > > thanks, one of the good folks at metafiler provided the link to an > excellent introductory article > > correction: metafilter

Re: [Tutor] unicode and character sets

2007-08-16 Thread Kent Johnson
[EMAIL PROTECTED] wrote: > http://www.joelonsoftware.com/articles/Unicode.html > > I realize the following: It does not make sense to have a string without > knowing what encoding it uses. There is no such thing as plain text. Good start! > > Ok. Fine. In Mozilla, by clicking on View, Charac
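The "no such thing as plain text" point can be seen directly in a small Python 3 sketch: the same five bytes give different text depending on which encoding you assume.

```python
data = b"caf\xc3\xa9"

# The same bytes read under two different assumptions:
as_utf8 = data.decode("utf-8")      # café
as_latin1 = data.decode("latin-1")  # cafÃ©  (classic mojibake)
print(as_utf8, as_latin1)
```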

[Tutor] unicode and character sets

2007-08-16 Thread tpc247
dear fellow Python enthusiasts, I recently wrote a script that grabs a file containing a list of ISO defined countries and creates an html select element. That's all well and good, and everything seems to work fine, except for one little nagging problem: http://en.wikipedia.org/wiki/Aland_Island
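A minimal Python 3 sketch of building such a select element while keeping names like "Åland Islands" as Unicode until output. The function `build_select` and the two-entry country list are hypothetical; the page would still need to be saved or served as UTF-8 and declared as such (e.g. `<meta charset="utf-8">`).

```python
import html


def build_select(name, countries):
    """Return an HTML <select> element as a str, one <option> per country."""
    options = "\n".join(
        '  <option value="{0}">{1}</option>'.format(html.escape(code), html.escape(label))
        for code, label in countries
    )
    return '<select name="{0}">\n{1}\n</select>'.format(html.escape(name), options)


countries = [("AX", "\u00c5land Islands"), ("CI", "C\u00f4te d'Ivoire")]
page = build_select("country", countries)
print(page)

# Encode once, when the page is written out.
page_bytes = page.encode("utf-8")
```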

Re: [Tutor] Unicode in List Object

2007-03-26 Thread Tim Golden
Roman Kreuzhuber wrote: > Thanks for the quick response! > I see! Oh I didn't realize that it's not the list which raises an error. > For a test I tried to insert a string containing a unicode character as > follows: > > ListObject = [] > ListObject.insert(0,u"Möälasdji") By the way, aside from

Re: [Tutor] Unicode in List Object

2007-03-26 Thread Tim Golden
Andre Engels wrote: > 2007/3/26, Roman Kreuzhuber <[EMAIL PROTECTED]>: >> >> Thanks for the quick response! >> I see! Oh I didn't realize that it's not the list which raises an error. >> For a test I tried to insert a string containing a unicode character as >> follows: >> >> ListObject = [] >> Lis

Re: [Tutor] Unicode in List Object

2007-03-26 Thread Kent Johnson
Roman Kreuzhuber wrote: > Thanks for the quick response! > I see! Oh I didn't realize that it's not the list which raises an error. > For a test I tried to insert a string containing a unicode character as > follows: > > ListObject = [] > ListObject.insert(0,u"Möälasdji") > > which raises: "Synt
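The SyntaxError comes from the source file, not the list: Python needs to know how the source bytes are encoded. Python 2 assumed ASCII unless the file declared otherwise (PEP 263); Python 3 assumes UTF-8. The sketch below (Python 3, with made-up source text) compiles Latin-1 source bytes with and without a declaration.

```python
# Source code saved as Latin-1 bytes, with a PEP 263 declaration on line 1.
declared = (
    "# -*- coding: latin-1 -*-\n"
    "ListObject = []\n"
    'ListObject.insert(0, "M\u00f6\u00e4lasdji")\n'
).encode("latin-1")

namespace = {}
exec(compile(declared, "<declared>", "exec"), namespace)
print(namespace["ListObject"])  # ['Möälasdji']

# The same bytes without the declaration: Python 3 assumes UTF-8 source,
# and the Latin-1 byte for "ö" is not valid UTF-8, so compiling fails.
undeclared = declared.split(b"\n", 1)[1]
try:
    compile(undeclared, "<undeclared>", "exec")
    compile_failed = False
except SyntaxError as exc:
    compile_failed = True
    print("SyntaxError:", exc)
```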

Re: [Tutor] Unicode in List Object

2007-03-26 Thread Andre Engels
2007/3/26, Roman Kreuzhuber <[EMAIL PROTECTED]>: Thanks for the quick response! I see! Oh I didn't realize that it's not the list which raises an error. For a test I tried to insert a string containing a unicode character as follows: ListObject = [] ListObject.insert(0,u"Möälasdji") which rais

Re: [Tutor] Unicode in List Object

2007-03-26 Thread Roman Kreuzhuber
ld this error have been raised too if this was an input from a GUI-text-object? I'm sorry for this silly question but I'm more or less completely new to python and never encountered similar errors with different languages roman >From: Tim Golden <[EMAIL PROTECTED]> >T

Re: [Tutor] Unicode in List Object

2007-03-26 Thread Tim Golden
Roman Kreuzhuber wrote: > I want to store multiple inputs from text fields in a list-object, which > works as a very small databank. The problem is that this data will contain > unicode characters I'm not sure why you think this is a problem. A Python list can hold anything, including unicode
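To illustrate that the list itself needs nothing special, here is a Python 3 sketch of a tiny "databank" list that is saved and reloaded. Using JSON with UTF-8 for storage is an assumption for illustration, not something proposed in the thread.

```python
import json
import os
import tempfile

# A list holds str values with any characters; nothing special is needed.
records = []
records.append("Grüße aus Österreich")  # e.g. text typed into a GUI field
records.append("Maß")

fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)

# Only when the list is saved does an encoding have to be chosen.
with open(path, "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False)

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
os.remove(path)
print(loaded == records)  # True
```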

[Tutor] Unicode in List Object

2007-03-26 Thread Roman Kreuzhuber
Hello! I need some help: I want to store multiple inputs from text fields in a list-object, which works as a very small databank. The problem is that this data will contain unicode characters as I live in a German-speaking country. I've searched through the internet for days but without any succ

[Tutor] UNICODE BEST RESPONSE

2006-09-23 Thread anil maran
Hum, I don't have any problems, and I don't do anything special ... - I use PostgreSQL, so I created my database using UTF-8 encoding. - my Python modules start with "# -*- coding: utf-8 -*-". - all my modules and my templates are utf-8 encoded (I use Vim, so I use ":set encoding=utf-8", but it sh
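The "UTF-8 everywhere" approach described above boils down to reading each file with the encoding it was saved in and encoding once on output. A Python 3 sketch, using `string.Template` and a temporary template file as hypothetical stand-ins for the poster's actual templating setup:

```python
import os
import string
import tempfile

# Hypothetical template file, saved as UTF-8.
fd, path = tempfile.mkstemp(suffix=".tmpl")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("<p>Bonjour $name, votre café est prêt.</p>\n")

# Read with the encoding the file was saved in, fill in str values,
# and encode once when sending the result out.
with open(path, encoding="utf-8") as f:
    template = string.Template(f.read())
os.remove(path)

page = template.substitute(name="Zoë")
body = page.encode("utf-8")
print(page)
```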

Re: [Tutor] Unicode problems

2006-08-31 Thread Kent Johnson
Ed Singleton wrote: > On 8/29/06, Kent Johnson <[EMAIL PROTECTED]> wrote: >>> The main problem I am having is in getting python not to give an >>> error when it encounters a sterling currency sign (£, pound sign here >>> in UK), which I suspect might be some wider problem on the mac as when >>> I
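For the pound sign specifically, a short Python 3 sketch shows why it trips up ASCII-only code: it is U+00A3, which takes two bytes in UTF-8, one in Latin-1, and has no ASCII representation at all.

```python
pound = "\u00a3"  # POUND SIGN

utf8_bytes = pound.encode("utf-8")     # b'\xc2\xa3'  (two bytes)
latin1_bytes = pound.encode("latin-1")  # b'\xa3'      (one byte)
print(utf8_bytes, latin1_bytes)

try:
    pound.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("ascii cannot represent it:", exc.reason)
```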
