That looks like a useful combination. Thanks.
On 6 May 2018 at 17:32, Mark Lawrence wrote:
> On 05/05/18 18:59, Simon Connah wrote:
>>
>> Hi,
>>
>> I'm writing a very simple web scraper. It'll download a page from a
>> website and then store the result in a database of
Two things. First, you can download the page as a string and delete
everything between the tags. Second, it might be worth looking at
Udacity CS101, as that course is all about building a search engine.
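
A rough sketch of the "download it as a string and drop the markup" idea, using only the standard library. The names TextExtractor, html_to_text and fetch_text are made up for illustration, and skipping the contents of script and style elements is my own assumption about what counts as "not useful" here.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring tags and <script>/<style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # non-empty text fragments, in page order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside script/style and not pure whitespace.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the text of an HTML string with the markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def fetch_text(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)


sample = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><p>Hello <b>world</b></p><script>var x = 1;</script></body></html>"
)
print(html_to_text(sample))
```

Note that this keeps the title text along with the body, and it makes no attempt to separate the main article from menus or footers. Real-world pages are often messy enough that a dedicated parsing library is probably the more comfortable route.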
On Sat, 5 May 2018 at 22:27, Simon Connah wrote:
> Hi,
>
> I'm
Thanks for the replies, everyone. Beautiful Soup looks like a good option.
My primary goal is to extract the main body text, the title and the
meta description from a web page and run them through one of the cloud
natural language processing services to find out some information that
I'd like to
On 05/05/18 18:59, Simon Connah wrote:
Hi,
I'm writing a very simple web scraper. It'll download a page from a
website and then store the result in a database of some sort. The
problem is that this will obviously include a whole heap of HTML,
JavaScript and maybe even some CSS. None of which is
On Sat, May 5, 2018 at 12:59 PM, Simon Connah wrote:
> I was wondering if there was a way in which I could download a web
> page and then just extract the main body of text without all of the
> HTML.
I do not have any experience with this, but I like to collect books.
On 05/05/2018 11:59 AM, Simon Connah wrote:
> Hi,
>
> I'm writing a very simple web scraper. It'll download a page from a
> website and then store the result in a database of some sort. The
> problem is that this will obviously include a whole heap of HTML,
> JavaScript and maybe even some CSS.
Hi,
I'm writing a very simple web scraper. It'll download a page from a
website and then store the result in a database of some sort. The
problem is that this will obviously include a whole heap of HTML,
JavaScript and maybe even some CSS. None of which is useful to me.
I was wondering if there