Mark Lawrence: Yes, I did... I kept encountering errors when trying to post the first time. I didn't think my question went through, so I tried this one. Even if I were to purposefully ask the question in multiple places, why does that concern you? I wasn't aware that asking for help in multiple places is forbidden. I'm sorry that it offended you so much that you felt the need to respond in that manner instead of providing assistance...
Cheers

tutor-requ...@python.org wrote:

Send Tutor mailing list submissions to
    tutor@python.org

To subscribe or unsubscribe via the World Wide Web, visit
    http://mail.python.org/mailman/listinfo/tutor
or, via email, send a message with subject or body 'help' to
    tutor-requ...@python.org

You can reach the person managing the list at
    tutor-ow...@python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Tutor digest..."

Today's Topics:

   1. Re: Encoding error when reading text files in Python 3 (Steven D'Aprano)
   2. Re: Search and replace text in XML file? (Mark Lawrence)
   3. Re: Encoding error when reading text files in Python 3 (Dat Huynh)
   4. Re: Flatten a list in tuples and remove doubles (Francesco Loffredo)
   5. Re: Flatten a list in tuples and remove doubles (Francesco Loffredo)

----------------------------------------------------------------------

Message: 1
Date: Sat, 28 Jul 2012 20:09:28 +1000
From: Steven D'Aprano <st...@pearwood.info>
To: tutor@python.org
Subject: Re: [Tutor] Encoding error when reading text files in Python 3
Message-ID: <5013ba58.1040...@pearwood.info>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dat Huynh wrote:
> Dear all,
>
> I have written a simple application in Python to read data from text files.
>
> Currently I have both Python version 2.7.2 and Python 3.2.3 on my laptop.
> I don't know why it does not run on Python version 3 while it runs
> well on Python 2.

Python 2 is more forgiving of beginner errors when dealing with text and
bytes, but makes it harder to deal with text correctly.

Python 3 makes it easier to deal with text correctly, but is less forgiving.

When you read from a file in Python 2, it will give you *something*, even if
it is the wrong thing. It will not give a decoding error, even if the text
you are reading is not valid text. It will just give you junk bytes,
sometimes known as mojibake.

Python 3 no longer does that.
It tells you when there is a problem, so you can fix it.

> Could you please tell me how I can run it on python 3?
> Following is my Python code.
>
> ------------------------------
> for subdir, dirs, files in os.walk(rootdir):
>     for file in files:
>         print("Processing [" +file +"]...\n" )
>         f = open(rootdir+file, 'r')
>         data = f.read()
>         f.close()
>         print(data)
> ------------------------------
>
> This is the error message:
[...]
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position
> 4980: ordinal not in range(128)

This tells you that you are reading a non-ASCII file but haven't told Python
what encoding to use, so Python falls back on its default encoding, which on
your system is ASCII.

Do you know what encoding the file is?

Do you understand about Unicode text and bytes? If not, I suggest you read
this article:

http://www.joelonsoftware.com/articles/Unicode.html

In Python 3, you can either tell Python what encoding to use:

    f = open(rootdir+file, 'r', encoding='utf8')  # for example

or you can set an error handler:

    f = open(rootdir+file, 'r', errors='ignore')  # for example

or both:

    f = open(rootdir+file, 'r', encoding='ascii', errors='replace')

You can see the list of encodings and error handlers here:

http://docs.python.org/py3k/library/codecs.html

Unfortunately, Python 2 does not support this using the built-in open
function. Instead, you have to use codecs.open in place of the built-in
open, like this:

    import codecs
    f = codecs.open(rootdir+file, 'r', encoding='utf8')  # for example

which fortunately works in both Python 2 and 3.

Or you can read the file in binary mode, and then decode it into text:

    f = open(rootdir+file, 'rb')
    data = f.read()
    f.close()
    text = data.decode('cp866', 'replace')
    print(text)

If you don't know the encoding, you can try opening the file in Firefox or
Internet Explorer and see if they can guess it, or you can use the chardet
library in Python.
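The standard-library options described so far can be tried side by side. What follows is only a minimal Python 3 sketch, not code from the original thread: the helper names read_text and read_binary_then_decode, the file name sample.txt and the sample word are all made up for illustration, and cp866 is used simply because it is the encoding in the example above. It also builds each path with os.path.join(subdir, name) rather than rootdir+file.

```python
import os
import tempfile


def read_text(path, encoding, errors="strict"):
    """Open in text mode with an explicit encoding and error handler."""
    with open(path, "r", encoding=encoding, errors=errors) as f:
        return f.read()


def read_binary_then_decode(path, encoding, errors="replace"):
    """Read raw bytes first, then decode them into text."""
    with open(path, "rb") as f:
        return f.read().decode(encoding, errors)


# Hypothetical sample data: a Russian word stored as cp866 bytes,
# so every byte in the file is non-ASCII.
word = "\u041f\u0440\u0438\u0432\u0435\u0442"
rootdir = tempfile.mkdtemp()
with open(os.path.join(rootdir, "sample.txt"), "wb") as f:
    f.write(word.encode("cp866"))

for subdir, dirs, files in os.walk(rootdir):
    for name in files:
        # rootdir+file only works for files directly in rootdir, and only
        # if rootdir ends with a path separator; joining subdir and name
        # works at any depth.
        path = os.path.join(subdir, name)
        decoded = read_text(path, encoding="cp866")
        replaced = read_text(path, encoding="ascii", errors="replace")
        ignored = read_text(path, encoding="ascii", errors="ignore")
        from_bytes = read_binary_then_decode(path, "cp866")
        try:
            read_text(path, encoding="ascii")
            strict_failed = False
        except UnicodeDecodeError:
            strict_failed = True
        print(name, repr(decoded), repr(replaced), repr(ignored), strict_failed)
```

With the right encoding the text comes back intact; with the wrong one, 'replace' substitutes U+FFFD for each bad byte, 'ignore' silently drops it, and the default 'strict' raises UnicodeDecodeError. The chardet package mentioned just above is published at: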
http://pypi.python.org/pypi/chardet

Or if you don't care about getting mojibake, you can pretend that the file
is encoded using Latin-1. That will read pretty much anything, although what
it gives you may be junk.

-- 
Steven

------------------------------

Message: 2
Date: Sat, 28 Jul 2012 11:25:30 +0100
From: Mark Lawrence <breamore...@yahoo.co.uk>
To: tutor@python.org
Subject: Re: [Tutor] Search and replace text in XML file?
Message-ID: <jv0emn$eda$1...@dough.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 28/07/2012 02:38, Todd Tabern wrote:
> I'm looking to search an entire XML file for specific text and replace that
> text, while maintaining the structure of the XML file. The text occurs within
> multiple nodes throughout the file.
> I basically need to replace every occurrence of C:\Program Files with
> C:\Program Files (x86), regardless of location. For example, that text
> appears within:
> <URL>C:\Program Files\\Map Data\Road_Centerlines.shp</URL>
> and also within:
> <RoutingIndexPathName>C:\Program
> Files\Templates\RoadNetwork.rtx</RoutingIndexPathName>
> ...among others.
> I've tried some non-python methods and they all ruined the XML structure.
> I've been Google searching all day and can only seem to find solutions that
> look for a specific node and replace the whole string between the tags.
> I've been looking at using minidom to achieve this but I just can't seem to
> figure out the right method.
> My end goal, once I have working code, is to compile an exe that can work on
> machines without python, so that a user can click it to perform the
> XML modification.
> Thanks in advance.

Did you really have to ask the same question on two separate Python mailing
lists and only 15 minutes apart?

-- 
Cheers.

Mark Lawrence.
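
For the XML question quoted in Message 2, one possible approach is sketched below. It is not something posted in the thread: it uses xml.etree.ElementTree from the standard library rather than the minidom module Todd mentions, and the Config root element, the sample document and the function names fix_paths and fix_tree are invented for illustration. The idea is to walk every element and rewrite only its text, tail and attribute values, so the tag structure itself is never touched.

```python
import re
import xml.etree.ElementTree as ET

# Match "C:\Program Files" only when it is not already followed by " (x86)",
# so running the fix twice does not produce "(x86) (x86)".
OLD = re.compile(r"C:\\Program Files(?! \(x86\))")
NEW = r"C:\Program Files (x86)"


def fix_paths(text):
    """Return text with the old prefix replaced; None passes through."""
    if text is None:
        return None
    # A function is used as the replacement so that the backslash in NEW
    # is not interpreted as a regex escape.
    return OLD.sub(lambda match: NEW, text)


def fix_tree(root):
    """Rewrite text, tail and attribute values of every element in place."""
    for elem in root.iter():
        elem.text = fix_paths(elem.text)
        elem.tail = fix_paths(elem.tail)
        for key, value in list(elem.attrib.items()):
            elem.attrib[key] = fix_paths(value)
    return root


# Hypothetical sample document modelled on the two nodes in the question.
sample = (
    "<Config>"
    r"<URL>C:\Program Files\Map Data\Road_Centerlines.shp</URL>"
    r"<RoutingIndexPathName>C:\Program Files\Templates\RoadNetwork.rtx"
    "</RoutingIndexPathName>"
    "</Config>"
)

root = fix_tree(ET.fromstring(sample))
result = ET.tostring(root, encoding="unicode")
print(result)
```

For a real file, ET.parse(path) followed by fix_tree(tree.getroot()) and tree.write(path, encoding="utf-8", xml_declaration=True) should do the same job, with the caveat that ElementTree drops comments by default and may change whitespace or the XML declaration, so the output is worth checking against a copy of the original.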
------------------------------

Message: 3
Date: Sat, 28 Jul 2012 18:45:47 +0800
From: Dat Huynh <htdat...@gmail.com>
To: tutor@python.org
Subject: Re: [Tutor] Encoding error when reading text files in Python 3
Message-ID: <CAPw=odian5_mymudr+oiawstgog9i+zdosk-0rry1arxif1...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

I changed my code and it runs on Python 3 now.

    f = open(rootdir+file, 'rb')
    data = f.read().decode('utf8', 'ignore')

Thank you very much.

Sincerely,
Dat.

On Sat, Jul 28, 2012 at 6:09 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> [...]
> Or you can read the file in binary mode, and then decode it into text:
>
>     f = open(rootdir+file, 'rb')
>     data = f.read()
>     f.close()
>     text = data.decode('cp866', 'replace')
>     print(text)
>
> [...]

------------------------------

Message: 4
Date: Sat, 28 Jul 2012 17:12:57 +0200
From: Francesco Loffredo <f...@libero.it>
To: tutor@python.org
Subject: Re: [Tutor] Flatten a list in tuples and remove doubles
Message-ID: <50140179.2080...@libero.it>
Content-Type: text/plain; charset=windows-1251; format=flowed

On 19/07/2012 19:33, PyProg PyProg wrote:
> Hi all,
>
> I would like to get a new list like this:
>
> [(0, '3eA', 'Dupont', 'Juliette', '11.0/10.0', '4.0/5.0', '17.5/30.0',
> '3.0/5.0', '4.5/10.0', '35.5/60.0'), (1, '3eA', 'Pop', 'Iggy',
> '12.0/10.0', '3.5/5.0', '11.5/30.0', '4.0/5.0', '5.5/10.0',
> '7.5/10.0', '40.5/60.0')]
>
> ... from this one:
>
> [(0, '3eA', 'Dupont', 'Juliette', 0, 11.0, 10.0), (0, '3eA', 'Dupont',
> 'Juliette', 1, 4.0, 5.0), (0, '3eA', 'Dupont', 'Juliette', 2, 17.5,
> 30.0), (0, '3eA', 'Dupont', 'Juliette', 3, 3.0, 5.0), (0, '3eA',
> 'Dupont', 'Juliette', 4, 4.5, 10.0), (0, '3eA', 'Dupont', 'Juliette',
> 5, 35.5, 60.0), (1, '3eA', 'Pop', 'Iggy', 0, 12.0, 10.0), (1, '3eA',
> 'Pop', 'Iggy', 1, 3.5, 5.0), (1, '3eA', 'Pop', 'Iggy', 2, 11.5, 30.0),
> (1, '3eA', 'Pop', 'Iggy', 3, 4.0, 5.0), (1, '3eA', 'Pop', 'Iggy', 4,
> 5.5, 10.0), (1, '3eA', 'Pop', 'Iggy', 5, 40.5, 60.0)]
>
> How can I do that? I've been trying, but so far I can't manage it.
>
> Thanks in advance.
>
> a+
>

I had to study your present and desired lists carefully, and here is what I
understood (please explain next time!):

- each 7-tuple in your present list is a record for some measure relative to
  a person.
  Its fields are as follows:
    - field 0: code (I think you want that in increasing order)
    - field 1: group code (could be a class or a group to which both of
      your example persons belong)
    - fields 2, 3: surname and name of the person
    - field 4: sequence number of the measure (these are in order already,
      but I think you want to enforce this), which you want to exclude from
      the output list while keeping the order
    - fields 5, 6: numerator and denominator of a ratio that is the measure.
      You want the ratio to be written as a single string:
      "%s/%s" % (field5, field6)

Taking this structure for granted, along with my educated guesses about what
you didn't tell us, here's my solution:

def flatten(inlist)
    """
    takes PyProg PyProg's current list and returns his/her desired one,
    given my guesses about the structure of inlist and the desired result.
    """
    tempdict = {}
    for item in inlist:
        if len(item) != 7:
            print "Item errato: \n", item
        id = tuple(item[:4])
        progr = item[4]
        payload = "%s/%s" % item[5:]
        if id in tempdict:
            tempdict[id].extend([(progr, payload)])
        else:
            tempdict[id] = [(progr, payload)]
    for item in tempdict:
        tempdict[item].sort()  # so we set payloads in progressive order, if they aren't already
    # print "Temporary Dict: ", tempdict
    tmplist2 = []
    for item in tempdict:
        templist = []
        templist.extend(item)
        templist.extend(tempdict[item])
        tmplist2.append(tuple(templist))
    tmplist2.sort()  # so we set IDs in order
    # print "Temporary List: ", tmplist2
    outlist = []
    for item in tmplist2:
        templist = []
        if isinstance(item, tuple):
            for subitem in item:
                if isinstance(subitem, tuple):
                    templist.append(subitem[1])
                else:
                    templist.append(subitem)
            outlist.append(tuple(templist))
        else:
            outlist.append(item)
    # print "\nOutput List: ", outlist
    return outlist

------------------------------

Message: 5
Date: Sat, 28 Jul 2012 18:29:20 +0200
From: Francesco Loffredo <f...@libero.it>
To: tutor@python.org
Subject: Re: [Tutor] Flatten a list in tuples and remove doubles
Message-ID: <50141360.6030...@libero.it>
Content-Type: text/plain; charset=windows-1251; format=flowed

On 28/07/2012 17:12, Francesco Loffredo wrote:
> On 19/07/2012 19:33, PyProg PyProg wrote:
>> [...]
> [...]
> def flatten(inlist)
> [...]
>         if len(item) != 7:
>             print "Item errato: \n", item
> [...]

OK, as usual when I look again at something I wrote, I found some small
mistakes. Here are my errata:

1- of course, a function definition must end with a colon...
line 1:
def flatten(inlist):

2- sorry, English is not my first language...
line 9:
            print "Item length wrong!\n", item

3- I didn't insert a break statement after line 9, but if inlist contained a
wrong item it would be nice to do something more than simply tell the user;
for example we could skip that item, or trim / pad it, or stop the
execution, or raise an exception... I just reported it to the unsuspecting
user, and this will very probably lead to some exception at a later point,
or (much worse) to wrong results. So:
line 8-9:
        if len(item) != 7:
            print "Item length wrong!\n", item
            raise ValueError("item length != 7")

... now I feel better ... but I must avoid reading my function again, or
I'll find some more bugs!

Francesco

------------------------------

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

End of Tutor Digest, Vol 101, Issue 99
**************************************
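
For anyone reading the digest on Python 3, Francesco's flatten() from Messages 4 and 5 can be condensed with his three errata applied. This is a sketch under the same guesses he made about the record layout, not a replacement posted in the thread. The grouping through dict.setdefault and the names grouped, records and result are choices made here for brevity. Note that the extra '7.5/10.0' in PyProg's requested output has no matching input record, so it cannot be reproduced.

```python
def flatten(inlist):
    """Group 7-field records by their first four fields.

    Each output tuple is (code, group, surname, name) followed by one
    "numerator/denominator" string per measure, ordered by field 4.
    """
    grouped = {}
    for item in inlist:
        if len(item) != 7:
            # Erratum 3: fail loudly instead of only printing a warning.
            raise ValueError("item length != 7: %r" % (item,))
        key = tuple(item[:4])
        progr = item[4]
        payload = "%s/%s" % tuple(item[5:])
        grouped.setdefault(key, []).append((progr, payload))

    outlist = []
    for key in sorted(grouped):            # IDs in increasing order
        measures = sorted(grouped[key])    # measures ordered by field 4
        outlist.append(key + tuple(payload for _, payload in measures))
    return outlist


# The input list from the original question.
records = [
    (0, '3eA', 'Dupont', 'Juliette', 0, 11.0, 10.0),
    (0, '3eA', 'Dupont', 'Juliette', 1, 4.0, 5.0),
    (0, '3eA', 'Dupont', 'Juliette', 2, 17.5, 30.0),
    (0, '3eA', 'Dupont', 'Juliette', 3, 3.0, 5.0),
    (0, '3eA', 'Dupont', 'Juliette', 4, 4.5, 10.0),
    (0, '3eA', 'Dupont', 'Juliette', 5, 35.5, 60.0),
    (1, '3eA', 'Pop', 'Iggy', 0, 12.0, 10.0),
    (1, '3eA', 'Pop', 'Iggy', 1, 3.5, 5.0),
    (1, '3eA', 'Pop', 'Iggy', 2, 11.5, 30.0),
    (1, '3eA', 'Pop', 'Iggy', 3, 4.0, 5.0),
    (1, '3eA', 'Pop', 'Iggy', 4, 5.5, 10.0),
    (1, '3eA', 'Pop', 'Iggy', 5, 40.5, 60.0),
]

result = flatten(records)
for row in result:
    print(row)
```

Raising ValueError on a malformed record is the strictest of the options listed in erratum 3; skipping or padding the record would be equally easy to substitute if that suits the data better.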