Where did the string come from?  At first glance it looks like you have two 
bytes for each character instead of the one you expect.  Is this perhaps 
Unicode text stored in a two-byte encoding such as UTF-16, rather than ASCII?
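
If that guess is right, decoding the data before splitting should make the 
\x00 bytes disappear.  Here is a minimal sketch in Python 3, assuming the 
data really is UTF-16 with a byte order mark; the sample string and the 
names raw, broken, html and text are made up for illustration, and your 
file may well use a different encoding:

```python
import re

# Hypothetical sample: a tiny HTML snippet stored as UTF-16 with a BOM.
# The real file's encoding is an assumption; check where it came from.
raw = "<p>Tria</p>".encode("utf-16")

# Splitting the undecoded bytes leaves the zero padding bytes in place,
# which is what shows up as \x00 between the letters.
broken = re.split(b"<.*?>", raw)

# Decoding first turns each two-byte unit into one character,
# so the same pattern now splits cleanly.
html = raw.decode("utf-16")
text = re.split("<.*?>", html)

print(broken)
print(text)
```

Once the text is decoded, text.index("Tria") finds the entry as you would 
expect.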


On 2011/11/20, at 10:28, dave selby <dave6...@gmail.com> wrote:

> Hi All,
> 
> I have a long string which is an HTML file, I strip the HTML tags away
> and make a list with
> 
> text = re.split('<.*?>', HTML)
> 
> I then tried to search for a string with text.index(...) but it was
> not found, printing HTML to a terminal I get what I expect, a block of
> tags and text, I split the HTML and print text and I get loads of
> 
> \x00T\x00r\x00i\x00a\x00  ie I get \x00 breaking up every character.
> 
> Any idea what is happening and how to get back to a list of ascii strings ?
> 
> Cheers
> 
> Dave
> 
> -- 
> 
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
> _______________________________________________
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor