Crusier wrote:

> Dear All,
>
> I am currently trying to download the stock code. I am using Python
> 3.4 and the code is as follows:
>
> from bs4 import BeautifulSoup
> import requests
> import re
>
> url = 'https://www.hkex.com.hk/eng/market/sec_tradinfo/stockcode/eisdeqty.htm'
>
> def web_scraper(url):
>     response = requests.get(url)
>     html = response.content
>     soup = BeautifulSoup(html,"html.parser")
>     for link in soup.find_all("a"):
>         stock_code = re.search('/d/d/d/d/d', "00001" )
>         print(stock_code, '', link.text)
>         print(link.text)
>
> web_scraper(url)
>
> I am trying to retrieve the stock code from here:
> <td class="verd_black12" width="18%">00001</td>
>
> or from a href.
>
> Please kindly inform which library I should use.

The good news is that you don't need regular expressions here; Beautiful Soup
alone is sufficient. Have a look at the HTML source of eisdeqty.htm in a text
editor, and then use the interactive interpreter to get closer to the desired
result:

Python 3.4.3 (default, Oct 14 2015, 20:28:29)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> from bs4 import BeautifulSoup
>>> url = 'https://www.hkex.com.hk/eng/market/sec_tradinfo/stockcode/eisdeqty.htm'
>>> soup = BeautifulSoup(requests.get(url).content)
[snip some intermediate attempts]
>>> soup.html.body.table.table.table.table.tr.td
<td class="verd_black12" width="18%"><b>STOCK CODE</b></td>
>>> stock_codes = [tr.td.text for tr in
...     soup.html.body.table.table.table.table.find_all("tr")]
>>> stock_codes[:10]
['STOCK CODE', '00001', '00002', '00003', '00004', '00005', '00006', '00007', '00008', '00009']
>>> stock_codes[-10:]
['06882', '06886', '06888', '06889', '06893', '06896', '06898', '06899', '80737', '84602']

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
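
A possible follow-up to the session above: the stock_codes list still starts with the header cell 'STOCK CODE'. Below is a minimal sketch of one way to drop it, assuming every real code is exactly five digits (as in the output shown). The helper name only_stock_codes is made up for illustration. It also shows the pattern the original attempt was probably aiming for: the digit class is written with a backslash (\d), whereas '/d' matches a literal slash followed by the letter d.

```python
import re

# Assumption: a stock code is exactly five digits, as in the sample output.
# Note the backslash in \d; '/d' would match a literal "/d" instead.
STOCK_CODE = re.compile(r"\d{5}")


def only_stock_codes(cells):
    """Return the entries of `cells` that look like five-digit stock codes.

    `only_stock_codes` is a hypothetical helper, not part of Beautiful Soup.
    Surrounding whitespace is stripped before matching.
    """
    return [text.strip() for text in cells if STOCK_CODE.fullmatch(text.strip())]


# Sample values taken from the interpreter session above.
sample = ['STOCK CODE', '00001', '00002', '80737', '84602']
print(only_stock_codes(sample))
```

With the list from the session you would call only_stock_codes(stock_codes). Slicing with stock_codes[1:] would also remove the header, but the pattern check keeps working if other non-code rows turn up in the table.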