
Re: [Tutor] Extract main text from HTML document

2018-05-07 Thread Simon Connah

That looks like a useful combination. Thanks.

On 6 May 2018 at 17:32, Mark Lawrence wrote:
> On 05/05/18 18:59, Simon Connah wrote:
>>
>> Hi,
>>
>> I'm writing a very simple web scraper. It'll download a page from a
>> website and then store the result in a database of some sort. The
>> problem i

Re: [Tutor] Extract main text from HTML document

2018-05-06 Thread Brian Lockwood

Two things. The first is that you can download the page as a string and delete everything between tags. Second, it might be worth looking at Udacity cs101, as this course is all about a search engine.

On Sat, 5 May 2018 at 22:27, Simon Connah wrote:
> Hi,
>
> I'm writing a very simple web
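The "download the page as a string and delete the tags" idea above can be sketched with only the standard library. This is a minimal illustration, not code from the thread: the class name `TextExtractor`, the helper `strip_tags`, and the sample markup are all made up for the example, and skipping `<script>` and `<style>` bodies is an added assumption based on the original question.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, dropping tags and <script>/<style> bodies."""

    SKIPPED = {"script", "style"}  # assumption: these never hold readable text

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def strip_tags(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Hello <b>world</b></p><script>var x = 1;</script></body></html>"
)
print(strip_tags(sample))
```

This keeps every piece of visible text, including menus and footers, so it does not by itself isolate the "main" body of a page.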

Re: [Tutor] Extract main text from HTML document

2018-05-06 Thread Simon Connah

Thanks for the replies, everyone. Beautiful Soup looks like a good option. My primary goal is to extract the main body text, the title and the meta description from a web page and run it through one of the cloud Natural Language processing services to find out some information that I'd like to kno
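For the title and meta description mentioned above, here is a rough sketch using the standard library's `html.parser` rather than Beautiful Soup, which is what the thread settled on; it is a swapped-in technique chosen to avoid a third-party dependency. The names `PageSummary` and `summarize`, and the sample page, are hypothetical.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Record the <title> text and the content of <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            # Attribute values can be None, so fall back to an empty string.
            if (attr.get("name") or "").lower() == "description":
                self.description = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html):
    """Return the title and meta description of an HTML string."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "description": parser.description}


page = (
    "<html><head><title> Example page </title>"
    '<meta name="description" content="A short summary.">'
    "</head><body><p>Body text</p></body></html>"
)
print(summarize(page))
```

Both fields come back as empty strings when the page lacks them, so a caller can decide whether to skip the page before sending anything to a language-processing service.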

Re: [Tutor] Extract main text from HTML document

2018-05-06 Thread Mark Lawrence

On 05/05/18 18:59, Simon Connah wrote: Hi, I'm writing a very simple web scraper. It'll download a page from a website and then store the result in a database of some sort. The problem is that this will obviously include a whole heap of HTML, JavaScript and maybe even some CSS. None of which is

Re: [Tutor] Extract main text from HTML document

2018-05-05 Thread boB Stepp

On Sat, May 5, 2018 at 12:59 PM, Simon Connah wrote:
> I was wondering if there was a way in which I could download a web
> page and then just extract the main body of text without all of the
> HTML.

I do not have any experience with this, but I like to collect books. One of them [1] says on pag

Re: [Tutor] Extract main text from HTML document

2018-05-05 Thread Mats Wichmann

On 05/05/2018 11:59 AM, Simon Connah wrote:
> Hi,
>
> I'm writing a very simple web scraper. It'll download a page from a
> website and then store the result in a database of some sort. The
> problem is that this will obviously include a whole heap of HTML,
> JavaScript and maybe even some CSS. No

[Tutor] Extract main text from HTML document

2018-05-05 Thread Simon Connah

Hi, I'm writing a very simple web scraper. It'll download a page from a website and then store the result in a database of some sort. The problem is that this will obviously include a whole heap of HTML, JavaScript and maybe even some CSS. None of which is useful to me. I was wondering if there w


