[EMAIL PROTECTED] wrote:
> text=re.sub(r'(?s)\<.+?\>', '', html_text)
> (this will keep html entities, though)
here's a variation that handles that too:
http://effbot.org/zone/re-sub.htm#strip-html
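
for reference, a minimal sketch of one way to drop the tags and decode the entities in a single helper. this is not the code behind the link above, just the quoted regex combined with the standard library's html.unescape (assumes Python 3.4 or later); the name strip_html is made up for illustration.

```python
import re
from html import unescape

# same idea as the quoted regex: anything between < and > counts as a tag
TAG_RE = re.compile(r'(?s)<.+?>')

def strip_html(html_text):
    """Remove tag-like spans, then turn entities such as &amp; into characters."""
    text = TAG_RE.sub('', html_text)
    return unescape(text)

if __name__ == '__main__':
    print(strip_html('<p>fish &amp; chips</p>'))  # prints: fish & chips
```

a regex like this is easily fooled by comments, scripts, or a stray > inside an attribute value, so it is probably only suitable for simple, well-behaved markup.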
--
http://mail.python.org/mailman/listinfo/python-list
robin <[EMAIL PROTECTED]> wrote:
> hi,
> i remember seeing this simple python function which would take raw html
> and output the content (body?) of the page as plain text (no <..> tags
> etc)
> i have been looking at htmllib and htmlparser but this all seems too
> complicated for what i'm looking for
http://www.aaronsw.com/2002/html2text/
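
if a stdlib-only route is preferred, here is a rough sketch using html.parser from Python 3. it is not how html2text works (that tool produces Markdown-style output); it only collects the text nodes and skips script and style blocks. the names TextExtractor and html_to_text are made up for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive already decoded
        self.parts = []
        self._skip = 0  # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def html_to_text(html_text):
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    # join the chunks with spaces and collapse runs of whitespace;
    # this also puts a space where an inline tag splits a word
    return ' '.join(' '.join(parser.parts).split())
```

joining with spaces keeps neighbouring blocks from running together, at the cost of splitting words that contain inline tags.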
looks yummy. thanks a lot.
robin
robin wrote:
> i remember seeing this simple python function which would take raw html
> and output the content (body?) of the page as plain text (no <..> tags
> etc)
> i have been looking at htmllib and htmlparser but this all seems too
> complicated for what i'm looking for. i just need the main
hi,
i remember seeing this simple python function which would take raw html
and output the content (body?) of the page as plain text (no <..> tags
etc)
i have been looking at htmllib and htmlparser but this all seems too
complicated for what i'm looking for. i just need the main text in the
body of