Hi Alan and other Gurus,

If you look carefully at the string below, you'll see that in amongst the "\x" stuff is the text I want: z tfile:/home/alpha, which I know to be an address on my system, plus a bit of preceding text.

Alan Gauld wrote:
>> The file's encoding is binary or something
>>
>> Here is the first section of the file:
>> '\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l'
>>
>> Does that tell you anything?
>
> But that is almost certainly the wrong approach, you'll never
> figure out where the word boundaries are without them!

So I believe this is the right approach. In fact, if I print the string without any modifications, I get the following sort of stuff:

¸z¨ôôtfile:/home/alpha/care/my_details.aspx.htmlÿÿÿÿÿÿÿÿ%oô¯0%oô¯0l
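The quoted bytes are consistent with UTF-16 big-endian text: from offset 24 onward every ASCII character of the path is preceded by a \x00 byte, which would explain the gaps a hex editor shows between characters. Decoding with that codec, rather than stripping bytes by hand, recovers the text cleanly. A minimal sketch, assuming the text of interest starts with "file:" (the marker choice and the helper name `extract_utf16be` are illustrative, not part of any known file format):

```python
# First 50 bytes of the file, as quoted earlier in the thread.
sample = (
    b"\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4"
    b"\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/"
    b"\x00h\x00o\x00m\x00e\x00/\x00a\x00l"
)


def extract_utf16be(data, start_text="file:"):
    """Decode data from the first occurrence of start_text onward as
    UTF-16 big-endian. Returns None if start_text is not found."""
    # "file:" in UTF-16-BE is b"\x00f\x00i\x00l\x00e\x00:"
    marker = start_text.encode("utf-16-be")
    pos = data.find(marker)
    if pos == -1:
        return None
    chunk = data[pos:]
    if len(chunk) % 2:  # drop a dangling half-character at a cut-off end
        chunk = chunk[:-1]
    return chunk.decode("utf-16-be", errors="replace")


print(extract_utf16be(sample))  # file:/home/al
```

On the full file, the decoded string will run on into whatever follows the URL (the "crud" mentioned above), so this only locates where the text starts. Working out where it ends needs more knowledge of the file's layout.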
So this is one approach that will work. I have no idea what sort of encoding it is, but I'd be grateful if someone could tell me how to get rid of what I assume are hex digits. In a hex editor the data turns out to contain readable, sensible URLs with a space between each character, plus a bit of crud at the end of each URL, just as above.

Any suggestions with that additional info? I've used struct before; it's a very nice module. Could this be some sort of UTF encoding? I think I was a bit light on info in that first post.

Thanks for your time,
Matt

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
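Since struct came up, here is one guess worth testing, and it is only a guess. The four bytes just before the path, \x00\x00\x00t, could be a big-endian 32-bit byte count (0x74 = 116) for the UTF-16-BE string that follows. That would also explain the stray "t" in front of "file:" when the raw string is printed. The sketch below reads strings in that assumed layout. The names `read_prefixed_string`, `make_record` and `NULL_LENGTH`, and the idea that \xff\xff\xff\xff (the runs of ÿ in the printed output, presumably) means "no string", are all hypothetical. In the quoted sample the candidate prefix sits at offset 20, so that is the place to try on the full file. If the decoded text doesn't match what the hex editor shows, the guess is wrong.

```python
import struct

# Hypothetical marker for "no string here"; an assumption, not a known spec.
NULL_LENGTH = 0xFFFFFFFF


def read_prefixed_string(data, offset):
    """Read one string stored as a 4-byte big-endian byte count followed by
    that many bytes of UTF-16-BE text (assumed layout).

    Returns (text, new_offset); text is None for the assumed null marker.
    """
    (nbytes,) = struct.unpack_from(">I", data, offset)
    offset += 4
    if nbytes == NULL_LENGTH:
        return None, offset
    raw = data[offset:offset + nbytes]
    if len(raw) != nbytes:
        raise ValueError("length prefix runs past the end of the data")
    return raw.decode("utf-16-be"), offset + nbytes


def make_record(text):
    """Build test bytes in the assumed layout (illustration only)."""
    encoded = text.encode("utf-16-be")
    return struct.pack(">I", len(encoded)) + encoded


# Synthetic data: one string followed by the assumed null marker.
record = make_record("file:/home/alpha") + struct.pack(">I", NULL_LENGTH)
text, pos = read_prefixed_string(record, 0)
print(text)    # file:/home/alpha
second, pos = read_prefixed_string(record, pos)
print(second)  # None
```

Run against only the 50 quoted bytes at offset 20, this raises ValueError, because a 116-byte string doesn't fit in what was quoted. It needs the full file to confirm or rule out the guess.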