[Tutor] Message seemed to bounce, so I will try again

nimrodx Sat, 12 Aug 2006 02:11:26 -0700

Hi All,

I am trying to fish through the history file for the Konqueror web
browser and pull out the web sites visited.


The file appears to be in some kind of binary encoding.

Here is the first section of the file:
'\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l'
 


Does that tell you anything?
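
One possible reading of those bytes, offered as a sketch rather than a
confirmed description of the file format: the alternating \x00 bytes look
like UTF-16 big-endian text, and the four bytes just before "file"
(\x00\x00\x00t, i.e. 116) might be a length prefix for that string. The
offsets 20 and 24 below are assumptions taken from this one sample, and the
names sample, length and decoded are only illustrative. The example uses
Python 3 bytes literals.

```python
import struct

# First bytes of the history file, copied from the message above.
sample = (
    b'\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4'
    b'\x00\x00\x00t'
    b'\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l'
)

# Assumption: bytes 20..23 hold a big-endian unsigned 32-bit byte count.
(length,) = struct.unpack(">I", sample[20:24])

# Assumption: the text starts at offset 24 and is UTF-16 big-endian.
# The sample is truncated, so only the part that is present gets decoded.
decoded = sample[24:].decode("utf-16-be")

print(length)   # 116
print(decoded)  # file:/home/al
```

If that reading holds, decoding the string keeps the characters together,
so there would be no need to strip the \x00 bytes one by one.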

I have been trying to replace the pesky \x00's with something less 
annoying, but
with no success:
  import re
  pattern = r"\x00"
  re.sub(pattern, '', dat2)

That seems to work at the command line, but this:

  web = re.compile(
                   r"(?P<addr>[/a-zA-Z0-9\.]+)"
                   )
  res = re.findall(web,dat2)
tends to give me back individual alphanumeric characters, "."'s, and "/"'s,
as if they had each been separated by an unmatched character:
e.g. ['z', 't', 'f', 'i', 'l', 'e', 'h', 'o', 'm', 'e', 'a', 'l', 'p', 
'h', 'a',...]

I was hoping for one web address per element of the list...
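
A likely explanation for the single characters, sketched here in Python 3
with bytes patterns: re.sub returns a new object and leaves dat2 untouched,
so the interactive prompt displays a cleaned-up result while the later
findall still runs on the original data. The names unchanged, cleaned and
joined are only illustrative, and the character class is a simplified copy
of the one in the message.

```python
import re

# First bytes of the history file, copied from the message above.
dat2 = (
    b'\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4'
    b'\x00\x00\x00t'
    b'\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l'
)

web = re.compile(rb"[/a-zA-Z0-9.]+")

# The return value is discarded here, so dat2 still contains the \x00 bytes.
re.sub(rb"\x00", b"", dat2)
unchanged = web.findall(dat2)

# Keeping the return value gives the pattern contiguous text to match.
cleaned = re.sub(rb"\x00", b"", dat2)
joined = web.findall(cleaned)

print(unchanged[:6])  # [b'z', b't', b'f', b'i', b'l', b'e']
print(joined)         # [b'z', b'tfile', b'/home/al']
```

Two things stand out in the second result: stray bytes from the binary
fields (the "z" and the "t") still leak into the matches, and ":" is not in
the character class, so "file:/home/al" is split at the colon.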

Suggestions greatly appreciated!!

Thanks,

Matt

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
