Hi Bob,

Thanks very much for your quick and detailed reply. This is just a utility script that reads some sentiment analysis data and combines the positive and negative sentiments of multiple people into a single sentiment per line. The data came from a public-domain source that I have no control over. What worked was Steve's suggestion to ignore the decoding errors (I made sure that my results are not affected when the errors are ignored). Thanks for the other suggestions. I haven't done much file I/O in Python, hence the crude method that I used.
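
For anyone finding this thread later, a minimal sketch of the ignore-the-errors approach might look like the following. It is an illustration rather than the exact script: the function name `classify_rows` and the `threshold` parameter are made up here, and it assumes the file is mostly UTF-8 with a few stray bytes that can safely be dropped.

```python
def classify_rows(path, threshold=4):
    """Return (line number, total, label) for every data row of a CSV file.

    Hypothetical helper, not the original script. Bytes that are not valid
    UTF-8 are silently dropped because the file is opened with
    errors="ignore".
    """
    results = []
    with open(path, "r", encoding="utf-8", errors="ignore") as ifd:
        ifd.readline()  # discard the header line
        for linenum, line in enumerate(ifd, 1):
            fields = line.split(",")
            # Sum fields 1 to 7, skipping the ones that are empty.
            total = sum(int(f) for f in fields[1:8] if f.strip())
            label = "POSITIVE" if total >= threshold else "Negative"
            results.append((linenum, total, label))
    return results
```

Calling `classify_rows("infile.csv")` would then return one tuple per data row. Since `errors="ignore"` throws the offending bytes away without any warning, it is worth checking that they never fall inside the numeric fields. If the real encoding of the file is known, passing it as `encoding=` is the safer choice, and `errors="replace"` at least leaves a visible marker where a byte was lost.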
On Mon, Oct 28, 2013 at 7:31 PM, bob gailer <bgai...@gmail.com> wrote:

> On 10/28/2013 6:13 PM, SM wrote:
>
> > Hello,
>
> Hi, welcome to the Tutor list
>
> > I have an extremely simple piece of code
>
> which could be even simpler - see my comments below
>
> > which reads a .csv file, which has 1000 lines of fixed fields, one line
> > at a time, and tries to print some values.
> >
> >  1 #!/usr/bin/python3
> >  2 #
> >  3 import sys, time, re, os
> >  4
> >  5 if __name__=="__main__":
> >  6
> >  7     ifd = open("infile.csv", 'r')
>
> The simplest way to discard the first line is to follow the open with
>
>  8     ifd.readline()
>
> The simplest way to track line number is
>
> 10     for linenum, line in enumerate(ifd, 1):
>
> > 11         line1 = line.split(",")
>
> FWIW you don't need re to do this split
>
> > 12         total = 0
>
> > 19         print("LINE: ", linenum, line1[1])
> > 20         for i in range(1,8):
> > 21             if line1[i].strip():
> > 22                 print("line[i] ", int(line1[i]))
> > 23                 total = total + int(line1[i])
> > 24         print("Total: ", total)
> > 25
> > 26         if total >= 4:
> > 27             print("POSITIVE")
> > 28         else:
> > 29             print("Negative")
> > 31     ifd.close()
>
> That should have () after it, since it is a method call.
>
> > It works fine till it parses the 1st 126 lines in the input file. For
> > the 127th line (irrespective of the contents of the actual line), it prints
> > the following error:
> >
> > Traceback (most recent call last):
> >   File "p1.py", line 10, in <module>
> >     for line in ifd:
> >   File "/usr/lib/python3.2/codecs.py", line 300, in decode
> >     (result, consumed) = self._buffer_decode(data, self.errors, final)
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position
> > 1173: invalid continuation byte
>
> Do you get exactly the same message irrespective of the contents of the
> actual line?
>
> "Code points larger than 127 are represented by multi-byte sequences,
> composed of a leading byte and one or more continuation bytes. The leading
> byte has two or more high-order 1s followed by a 0, while continuation
> bytes all have '10' in the high-order position."
>
> This suggests that a byte close to the end of the previous line is a
> "leading byte" and therefore a continuation byte was expected where the
> 0xe9 was found.
>
> BTW when I divide 1173 by 126 I get something close to 9 characters per line.
> That is not possible, as there would have to be at least 16 characters in
> each line.
>
> Best you send us at least the first 130 lines so we can play with the file.
>
> --
> Bob Gailer
> 919-636-4239
> Chapel Hill NC
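
One way to check the quoted rule against the traceback: 0xe9 is 11101001 in binary, so by that rule it is itself a leading byte, and the decoder complains when the byte after it is not a continuation byte. The sketch below classifies a byte by its high-order bits and locates the first undecodable byte in a byte string. The helper names `utf8_role` and `first_bad_byte` are hypothetical, and the sample bytes are made up. That 0xe9 is "é" in Latin-1 is a fact, but whether the file is actually Latin-1 encoded is only a guess.

```python
def utf8_role(byte):
    """Classify one byte value (0-255) by the role it can play in UTF-8."""
    if byte < 0x80:
        return "ascii"         # 0xxxxxxx: a complete one-byte character
    if byte < 0xC0:
        return "continuation"  # 10xxxxxx
    if byte in (0xC0, 0xC1) or byte > 0xF4:
        return "invalid"       # never appears in well-formed UTF-8
    return "leading"           # 110xxxxx, 1110xxxx or 11110xxx


def first_bad_byte(data):
    """Return (offset, byte value) of the first undecodable byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None
```

For example, `first_bad_byte(b"caf\xe9,1")` reports offset 3 and byte 0xe9, because the comma that follows is plain ASCII and not a continuation byte. Running the same check on the raw bytes of the input file (opened with mode `'rb'`) would show where the offending byte sits.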
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor