Thanks to everyone who responded on this thread; your time is greatly appreciated.
It appears, however, that my problem is related to the environment. I sent my original email right before leaving work and have since been working on a physical machine without any problems. I've copied some of that code to my remote virtual machine, where I'm doing the dev work, and the same example that works on my physical Win7 machine fails on my virtual Win2008 machine. The Win2008 VM's host platform is Linux with VirtualBox.

The only remaining question is whether this is a one-off issue, whether it's related to the virtual machine, or whether it's related to Windows 2008. I guess I'll find out tomorrow.

Oh, and Tim, you'll be happy to know that regex does not affect the string in this case. Well, at least not the way I'm using it to extract data.

--
James

At Thursday, 28/06/2012 on 21:17 Tim Golden wrote:

On 28/06/2012 20:48, James Chapman wrote:
> The name of the file I'm trying to open comes from a UTF-16 encoded
> text file, I'm then using regex to extract the string (filename) I
> need to open.

OK. Let's focus on that. For the moment -- although it might well be very relevant -- I'm going to ignore the regex side of things.

It's always trying to portray things like this because there's such confusion between what characters I write to represent the data and the data represented by those characters themselves!

OK, let's adopt a convention whereby I represent the data as the kind of thing you'd see in a hex editor. This obviously isn't how it appears in a text file, but hopefully it'll be clear what's going on.

I have a filename £10.txt -- that is, the characters:

    POUND SIGN
    DIGIT ONE
    DIGIT ZERO
    FULL STOP
    LATIN SMALL LETTER T
    LATIN SMALL LETTER X
    LATIN SMALL LETTER T

I have -- prior to your getting there -- placed this in a text file which I guarantee is UTF16-encoded.
For the purposes of illustration I shall do that in Python code here:

    with open("filedata.dat", "wb") as f:
        f.write(u"£10.txt".encode("utf16"))

The file is named "filedata.dat" and looks like this (per our convention):

    ff fe a3 00 31 00 30 00 2e 00 74 00 78 00 74 00

I now want to read the contents of that file as a filename and open the file in question. Here goes:

    #
    # Open the file and extract the data as a set of
    # bytes into a Python (byte) string.
    #
    with open("filedata.dat", "rb") as f:
        data = f.read()

    #
    # Convert the data into a unicode object by decoding
    # the UTF16 bytes
    #
    filename = data.decode("utf16")

    # filename is now a unicode object which, depending on
    # what your console offers, will either display as
    # £10.txt or as \xa310.txt or as something else.

    #
    # Open that file by passing the unicode object directly
    # to Python's file-opening mechanism
    #
    ten_pound_txt = open(filename, "rb")
    print ten_pound_txt.read()  # whatever
    ten_pound_txt.close()

I don't know if that makes anything clearer for you, but at least it gives you something to try out. The business with the regex clouds the issue: regex can play a little awkwardly with Unicode, so you'd have to show some code if you need help there.

TJG
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
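The regex step mentioned in the thread was never posted, so what follows is only a rough sketch of how it might look. It is written for Python 3, where every str is already Unicode, whereas the code quoted in the thread is Python 2. The "file=<name>" line layout, the pattern ENTRY and the helper extract_filename are all assumptions made up for illustration; they are not James's actual code. The idea is to decode the UTF-16 data first, run the regex on the decoded text, and then print a few values that can be compared between the machine that works and the one that fails.

```python
import os
import re
import sys
import tempfile

# Hypothetical layout: one "file=<name>" entry per line of a UTF-16 text file.
ENTRY = re.compile(r"^file=(.+)$", re.MULTILINE)


def extract_filename(listing_path):
    """Decode a UTF-16 text file and return the first captured name, or None."""
    # open() decodes while reading, so the regex only ever sees text,
    # never raw UTF-16 bytes.
    with open(listing_path, "r", encoding="utf-16") as f:
        text = f.read()
    match = ENTRY.search(text)
    return match.group(1) if match else None


# Build a small UTF-16 listing to run against.
workdir = tempfile.mkdtemp()
listing = os.path.join(workdir, "filedata.dat")
with open(listing, "w", encoding="utf-16") as f:
    f.write("file=\u00a310.txt\n")

filename = extract_filename(listing)

# Values worth comparing between the working and the failing machine.
print(repr(filename))
print([hex(ord(ch)) for ch in filename])
print(sys.getfilesystemencoding())
print(os.path.exists(os.path.join(workdir, filename)))
```

If the code points printed on both machines are identical, the regex has left the string alone, and any remaining difference is more likely to lie in the environment (filesystem encoding, or whether the file is really present under that name) than in the extraction.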