Re: [Tutor] Extract main text from HTML document

boB Stepp Sat, 05 May 2018 14:46:18 -0700

On Sat, May 5, 2018 at 12:59 PM, Simon Connah <scopensou...@gmail.com> wrote:


> I was wondering if there was a way in which I could download a web
> page and then just extract the main body of text without all of the
> HTML.

I do not have any experience with this, but I like to collect books.
One of them [1] says on page 245:

"Beautiful Soup is a module for extracting information from an HTML
page (and is much better for this purpose than regular expressions)."

I believe this topic has come up before on this list as well as the
main Python list.  You may want to check it out.  It can be installed
with pip.

[1] "Automate the Boring Stuff with Python -- Practical Programming
for Total Beginners" by Al Sweigart.

HTH!
-- 
boB
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] Extract main text from HTML document

Reply via email to