Thanks to everyone who replied!
I'll look further into the file's encoding, since I'm interested in that
for other reasons. In the output I saw, u"\xe1" (and a few other characters
I found after sending my note) was prevalent around the splits.
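
For anyone who runs into the same thing, here is a minimal sketch of one way to inspect a line and split it on ASCII whitespace only, so that no non-ASCII character can act as a separator. It is not necessarily the fix used for this file. The helper name split_ascii_ws and the sample line are hypothetical, made up for illustration, and I am not claiming which character caused the splits in the original data.

```python
import re

# Explicit separator set: runs of ASCII whitespace only.
ASCII_WS = re.compile(r"[ \t\r\n\f\v]+")

def split_ascii_ws(line):
    """Split on ASCII whitespace only, dropping empty fields."""
    return [field for field in ASCII_WS.split(line) if field]

# Hypothetical sample line containing u"\xe1" and a no-break space (u"\xa0").
sample = u"Osaka\xe1Font\xa0Bold  12345\n"

# repr() shows non-ASCII characters as escapes instead of rendered glyphs.
print(repr(sample))

fields = split_ascii_ws(sample)
print(len(fields))
```

Naming the separators explicitly means the result does not depend on what the interpreter or locale considers whitespace, at the cost of treating characters such as the no-break space as ordinary text.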
For the moment, though, I've solved my immediate
Jeremy Reichman wrote:
I have some characters in line strings in a file I'm processing that appear
to be Unicode. (When I print them to the shell from my script, they are
Asian characters for files like fonts in the Mac OS X filesystem.)
When I run a.split() on the affected line strings, they split on what I'm
guessing