On Sat, May 5, 2018 at 12:59 PM, Simon Connah <scopensou...@gmail.com> wrote:
> I was wondering if there was a way in which I could download a web
> page and then just extract the main body of text without all of the
> HTML.

I do not have any experience with this, but I like to collect books. One of them [1] says on page 245:

"Beautiful Soup is a module for extracting information from an HTML page (and is much better for this purpose than regular expressions)."

I believe this topic has come up before on this list as well as on the main Python list. You may want to check Beautiful Soup out; it can be installed with pip.

[1] "Automate the Boring Stuff with Python -- Practical Programming for Total Beginners" by Al Sweigart.

HTH!

-- 
boB
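
For illustration, here is a minimal sketch of what downloading a page and pulling out its text might look like with Beautiful Soup. It is not taken from the book cited above. The function names (extract_text, fetch_text), the fallback class and the example URL are all made up for this sketch. It assumes Beautiful Soup was installed with "pip install beautifulsoup4"; if it is missing, the sketch falls back to the standard library's html.parser, which is a cruder substitute rather than the same thing.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

try:
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
except ImportError:
    BeautifulSoup = None  # fall back to the standard library below


class _TextOnly(HTMLParser):
    """Stdlib fallback: collect text, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML string, one fragment per line."""
    if BeautifulSoup is not None:
        soup = BeautifulSoup(html, "html.parser")
        # Drop elements whose contents are code or styling, not prose.
        for tag in soup(["script", "style"]):
            tag.decompose()
        return soup.get_text(separator="\n", strip=True)
    parser = _TextOnly()
    parser.feed(html)
    return "\n".join(parser.parts)


def fetch_text(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_text(html)
```

Calling print(fetch_text("https://example.com")) (a placeholder URL) would print the page's text. On a real site the result will probably still include menus, footers and similar clutter, so getting just the "main body" usually means first finding the right element (for example with soup.find) for that particular page.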