Anthony Papillion wrote on 23.08.2015 at 01:16:
> from lxml import html
> import requests
>
> page = requests.get("http://joplin.craigslist.org/search/w4m")
> tree = html.fromstring(page.text)

While requests has its merits, this can be simplified to

    tree = html.parse("http://joplin.craigslist.org/search/w4m")

> titles = tree.xpath('//a[@class="hdrlnk"]/text()')
> try:
>     for title in titles:
>         print title

This only works as long as the link tags only contain plain text, no other
tags, because "text()" selects individual text nodes in XPath. Also, using
@class="hdrlnk" will not match link tags that use class=" hdrlnk " or
class="abc hdrlnk other".

If you want to be on the safe side, I'd use cssselect instead and then
serialise the complete text content of the link tag to a string, i.e.

    from lxml.etree import tostring

    for link_element in tree.cssselect("a.hdrlnk"):
        title = tostring(
            link_element, method="text", encoding="unicode", with_tail=False)
        print(title.strip())

Note that the "cssselect()" feature requires the external "cssselect" package
to be installed. "pip install cssselect" should handle that.

> except:
>     pass

Oh, and bare "except:" clauses are generally frowned upon because they can
easily hide bugs by also catching unexpected exceptions. Better be explicit
about the exception type(s) you want to catch.

Stefan

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
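
To make the two pitfalls above concrete (exact @class comparison, and "text()" returning separate text nodes), here is a small sketch that runs against a made-up HTML snippet rather than the real page. The markup, the SAMPLE and TOKEN_MATCH names, and the link titles are all invented for illustration. So that it runs without the "cssselect" package, it spells out the class test as an XPath expression, which is roughly what the selector "a.hdrlnk" gets translated to.

```python
from lxml import html
from lxml.etree import tostring

# Made-up markup for illustration only; not taken from the real page.
SAMPLE = """
<html><body>
  <a class="hdrlnk" href="/1">Plain title</a>
  <a class="result hdrlnk" href="/2">Title with <b>bold</b> part</a>
  <a class="other" href="/3">Not a match</a>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Exact attribute comparison plus text(): the second link is missed,
# because its class attribute is "result hdrlnk", not "hdrlnk".
exact = tree.xpath('//a[@class="hdrlnk"]/text()')

# Token-wise class match, roughly what the CSS selector "a.hdrlnk" means.
TOKEN_MATCH = (
    '//a[contains(concat(" ", normalize-space(@class), " "), " hdrlnk ")]'
)
links = tree.xpath(TOKEN_MATCH)

# Serialise the complete text content of each link, nested tags included.
titles = [
    tostring(link, method="text", encoding="unicode", with_tail=False).strip()
    for link in links
]

# text() on the same links yields the individual text nodes instead,
# so the text inside <b> is not part of the result.
fragments = tree.xpath(TOKEN_MATCH + "/text()")

print(exact)
print(titles)
print(fragments)
```

One detail worth knowing when combining this with html.parse(): that function returns an ElementTree rather than an element, and "cssselect()" is a method of elements, so you would call it on tree.getroot() in that case.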