Re: [Tutor] Parsing and collecting keywords from a webpage

2018-06-21 Thread Peter Otten
Daniel Bosah wrote:

> new_list = [x.encode('latin-1') for x in sorted(paul)]

I don't see why you would need bytes

> search = "(" + b"|".join(new_list).decode() + ")" + "" #re.compile needs

when your next step is to decode it. I'm not sure why it even works as the default encoding is usually
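
A minimal sketch of what the point about bytes seems to suggest: the alternation pattern can be built from the str values directly, with no encode/decode round trip. The name `paul` comes from the original post, but its contents here are made up for illustration, and the use of `re.escape` is an addition that does not appear in the thread.

```python
import re

# Hypothetical keyword list; the real contents of `paul` are not shown in the thread.
paul = ["street", "road", "blvd"]

# Join the str values directly. re.escape (an addition, not in the original post)
# guards against keywords that contain regex metacharacters.
search = "(" + "|".join(re.escape(x) for x in sorted(paul)) + ")"
pattern = re.compile(search)

# Made-up sample text standing in for str(soup).
matches = pattern.findall("Turn left at Main street, then onto Oak road.")
print(search)
print(matches)
```

Because everything stays a str, the result does not depend on which encoding `.decode()` would pick by default.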

Re: [Tutor] Parsing and collecting keywords from a webpage

2018-06-21 Thread Alan Gauld via Tutor
On 20/06/18 20:32, Daniel Bosah wrote:

> reg = pattern.findall(str(soup))
>
> for i in reg:
>     if i in reg and paul: # this loop checks to see if elements are in both the regexed parsed list and the list.

No it doesn't. It checks if i is in reg and if paul is non empty - which it
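
A small sketch of the difference described in this reply, using made-up values for `reg` and `paul` (their real contents are not shown in the thread). It compares the condition as written with one that tests membership in both collections.

```python
# Hypothetical data; the real `reg` and `paul` are not shown in full in the thread.
reg = ["street", "avenue", "road"]
paul = ["road", "street", "blvd"]

# `i in reg and paul` parses as `(i in reg) and paul`, so it is true for
# every i taken from reg as long as paul is non-empty.
as_written = [i for i in reg if i in reg and paul]

# Checking that i is in both collections needs two separate `in` tests.
in_both = [i for i in reg if i in reg and i in paul]

print(as_written)
print(in_both)
```

With these sample values the first list keeps every element of `reg`, while the second keeps only the elements that also appear in `paul`.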

Re: [Tutor] Parsing and collecting keywords from a webpage

2018-06-20 Thread Alan Gauld via Tutor
On 20/06/18 20:32, Daniel Bosah wrote:

> # coding: latin-1
> from bs4 import BeautifulSoup
> from urllib.request import urlopen
> import re
>
> #new point to add... make rest of function then compare a list of monuments
> notaries ( such as blvd, road, street, etc.) to a list of words containing

[Tutor] Parsing and collecting keywords from a webpage

2018-06-20 Thread Daniel Bosah
# coding: latin-1
from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

#new point to add... make rest of function then compare a list of monuments notaries ( such as blvd, road, street, etc.) to a list of words containing them. if contained, pass into new set ( ref notes in