I changed my code and it runs on Python 3 now.

    f = open(rootdir+file, 'rb')
    data = f.read().decode('utf8', 'ignore')

Thank you very much.

Sincerely,
Dat.

On Sat, Jul 28, 2012 at 6:09 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> Dat Huynh wrote:
>>
>> Dear all,
>>
>> I have written a simple application in Python to read data from text
>> files.
>>
>> Currently I have both Python version 2.7.2 and Python 3.2.3 on my laptop.
>> I don't know why it does not run on Python version 3 while it runs
>> well on Python 2.
>
>
> Python 2 is more forgiving of beginner errors when dealing with text and
> bytes, but makes it harder to deal with text correctly.
>
> Python 3 makes it easier to deal with text correctly, but is less forgiving.
>
> When you read from a file in Python 2, it will give you *something*, even if
> it is the wrong thing. It will not give a decoding error, even if the text
> you are reading is not valid text. It will just give you junk bytes,
> sometimes known as mojibake.
>
> Python 3 no longer does that. It tells you when there is a problem, so you
> can fix it.
>
>
>
>> Could you please tell me how I can run it on python 3?
>> Following is my Python code.
>>
>> ------------------------------
>> for subdir, dirs, files in os.walk(rootdir):
>>     for file in files:
>>         print("Processing [" +file +"]...\n" )
>>         f = open(rootdir+file, 'r')
>>         data = f.read()
>>         f.close()
>>         print(data)
>> ------------------------------
>>
>> This is the error message:
>
> [...]
>
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position
>> 4980: ordinal not in range(128)
>
>
>
> This tells you that you are reading a non-ASCII file but haven't told Python
> what encoding to use, so Python falls back to the default encoding for your
> system, which here is ASCII.
>
> Do you know what encoding the file is?
>
> Do you understand about Unicode text and bytes? If not, I suggest you read
> this article:
>
> http://www.joelonsoftware.com/articles/Unicode.html
>
>
> In Python 3, you can either tell Python what encoding to use:
>
> f = open(rootdir+file, 'r', encoding='utf8')  # for example
>
> or you can set an error handler:
>
> f = open(rootdir+file, 'r', errors='ignore')  # for example
>
> or both
>
> f = open(rootdir+file, 'r', encoding='ascii', errors='replace')
>
>
> You can see the list of encodings and error handlers here:
>
> http://docs.python.org/py3k/library/codecs.html
>
>
> Unfortunately, Python 2 does not support this using the built-in open
> function. Instead, you have to use codecs.open, like this:
>
> import codecs
> f = codecs.open(rootdir+file, 'r', encoding='utf8')  # for example
>
> which fortunately works in both Python 2 and 3.
>
>
> Or you can read the file in binary mode, and then decode it into text:
>
> f = open(rootdir+file, 'rb')
> data = f.read()
> f.close()
> text = data.decode('cp866', 'replace')
> print(text)
>
>
> If you don't know the encoding, you can try opening the file in Firefox or
> Internet Explorer and see if they can guess it, or you can use the chardet
> library in Python.
>
> http://pypi.python.org/pypi/chardet
>
> Or if you don't care about getting mojibake, you can pretend that the file
> is encoded using Latin-1. That will pretty much read anything, although what
> it gives you may be junk.
>
>
>
> --
> Steven
>
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
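
For anyone finding this thread later, the advice above can be pulled together in one place. What follows is only a minimal sketch, not the code either poster actually ran. The function name read_text_files is made up for illustration, and the defaults (UTF-8 with the 'replace' error handler) are assumptions; use whatever encoding your files really have. It also builds each path with os.path.join(subdir, name) rather than rootdir+file, a deliberate change so that files in nested directories are found too.

```python
import os


def read_text_files(rootdir, encoding="utf-8", errors="replace"):
    """Yield (path, text) for every file found under rootdir.

    Hypothetical helper: the default encoding and error handler are
    assumptions, not something the thread prescribes.
    """
    for subdir, dirs, files in os.walk(rootdir):
        for name in files:
            # Join with subdir (not rootdir) so that files inside
            # nested directories resolve to the right path.
            path = os.path.join(subdir, name)
            # Read raw bytes, then decode explicitly. This behaves the
            # same way on Python 2 and Python 3.
            with open(path, "rb") as f:
                data = f.read()
            yield path, data.decode(encoding, errors)
```

To use it, loop over the generator, for example "for path, text in read_text_files(rootdir): print(path)". With errors="replace", undecodable bytes show up as U+FFFD in the result instead of raising UnicodeDecodeError, so bad input stays visible rather than being silently dropped as it is with 'ignore'.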