On Wed, Dec 5, 2012 at 7:13 PM, Ed Owens <[email protected]> wrote:
> >>> str(string)
> '[<div class="wx-timestamp">\n<div class="wx-subtitle wx-timestamp">Updated:
> Dec 5, 2012, 5:08pm EST</div>\n</div>]'
> >>> m = re.search('":\b(\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<', str(string))
> >>> print m
> None
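You need a raw string for the boundary marker \b (i.e., the boundary
between \w and \W); in a regular string literal, \b is a backspace
control character. Also, I don't see why you have ": at the start of
the expression. That sequence doesn't occur anywhere in the string, so
the pattern can't match even with a raw string. This works: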
>>> s = 'Updated: Dec 5, 2012, 5:08pm EST</div>'
>>> m = re.search(r'\b(\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<', s)
>>> m.group(1)
'Dec 5, 2012, 5:08pm EST'
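To see what the r prefix changes, here is a small check you can run as
a script. It's only a sketch: it assumes Python 3 syntax (print as a
function), and the names plain and raw are just for illustration.

```python
import re

# In a regular string literal, '\b' is one character: backspace (ASCII 8).
assert '\b' == '\x08'
assert len('\b') == 1

# In a raw string, r'\b' is two characters (backslash, 'b'),
# which the re module interprets as a word boundary.
assert len(r'\b') == 2

s = 'Updated: Dec 5, 2012, 5:08pm EST</div>'

# Without the r prefix, re looks for a literal backspace before 'Dec'.
plain = re.search('\bDec', s)

# With the r prefix, re looks for a word boundary before 'Dec'.
raw = re.search(r'\bDec', s)

print(plain)          # None
print(raw.group(0))   # Dec
```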
But wouldn't it be simpler and more reliable to use an HTML parser?
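For example, here is a rough sketch using the parser in the standard
library. It assumes Python 3 (the module is named HTMLParser in Python
2), and it assumes the timestamp always sits in a div whose class
includes wx-subtitle, as in your sample. TimestampParser and the html
variable are names I made up for the example.

```python
from html.parser import HTMLParser


class TimestampParser(HTMLParser):
    """Collect the text of each <div> whose class includes 'wx-subtitle'."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a matching div
        self.texts = []   # one entry per matching div

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        classes = (dict(attrs).get('class') or '').split()
        if self.depth or 'wx-subtitle' in classes:
            self.depth += 1
            if self.depth == 1:
                self.texts.append('')

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


# Sample markup modeled on the string in the question.
html = ('<div class="wx-timestamp">\n'
        '<div class="wx-subtitle wx-timestamp">'
        'Updated: Dec 5, 2012, 5:08pm EST</div>\n</div>')

parser = TimestampParser()
parser.feed(html)
parser.close()

# Drop the "Updated:" label and surrounding whitespace.
timestamp = parser.texts[0].split(':', 1)[1].strip()
print(timestamp)
```

The parser finds the element by its class, so the regex no longer has
to cope with the surrounding tags. A third-party parser would make this
shorter still.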