Thanks all! Kent, this syntax worked. I was able to figure out the encoding by trial and error: it is UTF-16. The only remaining problem is that the conversion is double-spacing the lines of data. I'm thinking this must be something I need to fix in my syntax. I will keep trying to figure it out, but any pointing out of the obvious or other ideas would be much appreciated. Again, newbie here.
Thanks
Matt

Matthew Pirritano, Ph.D.
Research Analyst IV
Medical Services Initiative (MSI)
Orange County Health Care Agency
(714) 568-5648

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Kent Johnson
Sent: Monday, April 06, 2009 5:51 PM
To: Pirritano, Matthew
Cc: Python Tutor
Subject: Re: [Tutor] unicode to plain text conversion

On Mon, Apr 6, 2009 at 6:48 PM, Pirritano, Matthew <[email protected]> wrote:
> Hello python people,
>
> I am a total newbie. I have a very large file > 4GB that I need to
> convert from Unicode to plain text. I used to just use dos when the file
> was < 4GB but it no longer seems to work. Can anyone point me to some
> python code that might perform this function?

What is the encoding of the Unicode file?

Assuming that the file has lines that will each fit in memory, you can use the codecs module to decode the unicode. Something like this:

    import codecs
    # Decode the input while reading it line by line.
    inp = codecs.open('Unicode_file.txt', 'r', 'utf-16le')
    # The output file must be opened for writing ('w').
    outp = open('new_text_file.txt', 'w')
    outp.writelines(inp)
    inp.close()
    outp.close()

The above code assumes UTF-16LE encoding; change it to the correct one if that is not right. A list of supported encodings is here:
http://docs.python.org/library/codecs.html#id3

Kent
_______________________________________________
Tutor maillist - [email protected]
http://mail.python.org/mailman/listinfo/tutor
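
On the double-spacing mentioned at the top of the thread: one plausible cause, assuming the source file uses Windows-style \r\n line endings, is that codecs.open always opens the file in binary mode, so each decoded line still ends in \r\n. Writing those lines to a text-mode file on Windows then turns every \n into \r\n, which gives \r\r\n, and many editors show that as a blank line between rows. Below is a minimal sketch of one way around this in Python 3, where the built-in open handles both decoding and newline translation. The function name convert_utf16_to_text, its parameters, and the UTF-8 output encoding are assumptions made for illustration, not something from the thread.

```python
def convert_utf16_to_text(src_path, dst_path,
                          src_encoding="utf-16",
                          dst_encoding="utf-8",
                          dst_newline="\n"):
    """Stream a UTF-16 text file to another encoding, one line at a time.

    Hypothetical helper for illustration. Reading in text mode with the
    default newline handling turns \r\n into \n, so each line carries a
    single line ending. On output, every \n is written as dst_newline.
    """
    with open(src_path, "r", encoding=src_encoding) as inp, \
         open(dst_path, "w", encoding=dst_encoding, newline=dst_newline) as outp:
        # Iterating over the file keeps only one line in memory at a time,
        # which matters for multi-gigabyte inputs.
        for line in inp:
            outp.write(line)
```

Calling convert_utf16_to_text('Unicode_file.txt', 'new_text_file.txt', dst_newline='\r\n') would keep Windows line endings in the output without doubling them. The "utf-16" codec expects a byte order mark at the start of the file; if there is none, "utf-16le" or "utf-16be" may be the right choice instead.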
