On Thu, Jun 28, 2012 at 2:55 PM, James Chapman <ja...@uplinkzero.com> wrote:
> Why can I not convert my existing byte string into a unicode string?

That would work fine.

> In the mean time I'll create my original string as unicode and see if that
> solves my problem.
>
> >>> fileName = unicode(filename)
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c in position 35:
> invalid start byte

Here are a few questions that you'll need to answer 'Yes' to before you're
going to get this to work reliably:

Are you familiar with the differences between byte strings and unicode
strings?

Do you understand how to convert from one to the other, using a particular
encoding?

Do you know what encoding your source file is saved in?

If your string is not coming from a source file, but from some other source
of bytes, do you know what encoding those bytes are using?

Try the following. Before trying to convert filename to unicode, do a
"print repr(filename)". That will show you the byte string, along with the
numeric codes for the non-ASCII parts. Then convert those bytes to a unicode
object using the appropriate encoding. If the bytes are utf-8, then you'd do
something like this:

    unicode_filename = unicode(filename, 'utf-8')

If your byte string is actually shift-jis encoded, you'd do this instead:

    unicode_filename = unicode(filename, 'shift-jis')

If you don't know what encoding your byte string is in, you either have to
give up, guess, or try a bunch of likely possibilities until something
works. If you really, really have to guess and there's no way for you to
know for sure what encoding a particular byte string is in, the third-party
chardet module may be able to help.

--
Jerry
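
The "try a bunch of likely possibilities" approach could look roughly like the sketch below. The function name decode_with_candidates and the default list of encodings are made up for illustration; put the encodings you actually consider likely in that list. It uses the .decode() method of byte strings, which does the same job as unicode(filename, encoding) and also exists on Python 3.

```python
def decode_with_candidates(raw, encodings=("utf-8", "shift-jis", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes raw.

    Hypothetical helper: the candidate list is an assumption, not a
    recommendation. Order matters, because the first success wins.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one.
            continue
    raise ValueError("no candidate encoding could decode %r" % (raw,))


# A lone 0x9c byte is not valid utf-8, so the next candidate gets a chance.
text, used = decode_with_candidates(b"\x9c", encodings=("utf-8", "cp1252"))
print(repr(text))
print(used)
```

Keep in mind that a decode that succeeds is not proof that the guess was right. Some encodings, latin-1 for example, accept every possible byte, so they will never raise an error and should go last in the list, if they go in at all.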