Anthony Papillion wrote on 23.08.2015 at 01:16:
> from lxml import html
> import requests
>
> page = requests.get("http://joplin.craigslist.org/search/w4m")
> tree = html.fromstring(page.text)
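While requests has its merits, this can be simplified to

    tree = html.parse("http://joplin.craigslist.org/search/w4m").getroot()

(parse() returns an ElementTree rather than an element, so getroot() is
needed to get the same kind of object that fromstring() gives you.)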
> titles = tree.xpath('//a[@class="hdrlnk"]/text()')
> try:
>     for title in titles:
>         print title
This only works as long as the link tags only contain plain text, no other
tags, because "text()" selects individual text nodes in XPath. Also, using
@class="hdrlnk" will not match link tags that use class=" hdrlnk " or
class="abc hdrlnk other".
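
To make both problems visible, here is a small sketch. The HTML snippet is
made up for illustration and is not what the real page looks like; the
whitespace-tolerant class test is just one common plain-XPath workaround.

```python
from lxml import html

# Hypothetical markup, invented for this example only.
SAMPLE = """
<div>
  <a class="hdrlnk" href="/1">plain title</a>
  <a class="hdrlnk" href="/2">title with <b>bold</b> part</a>
  <a class="abc hdrlnk other" href="/3">extra classes</a>
</div>
"""

tree = html.fromstring(SAMPLE)

# text() returns one result per direct text node: the second link comes back
# in two pieces (without "bold"), and the third link is not matched at all.
text_nodes = tree.xpath('//a[@class="hdrlnk"]/text()')
print(text_nodes)

# A class test in plain XPath 1.0 that tolerates extra whitespace and
# additional class names.
links = tree.xpath(
    '//a[contains(concat(" ", normalize-space(@class), " "), " hdrlnk ")]')

# text_content() joins all text inside each link, including nested tags.
titles = [link.text_content().strip() for link in links]
print(titles)
```

The first print shows three fragments for only two links, the second shows
one complete title per link.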
If you want to be on the safe side, I'd use cssselect instead and then
serialise the complete text content of the link tag to a string, i.e.

    from lxml.etree import tostring

    for link_element in tree.cssselect("a.hdrlnk"):
        title = tostring(
            link_element,
            method="text", encoding="unicode", with_tail=False)
        print(title.strip())

Note that the "cssselect()" feature requires the external "cssselect"
package to be installed. "pip install cssselect" should handle that.
> except:
>     pass
Oh, and bare "except:" clauses are generally frowned upon because they can
easily hide bugs by also catching unexpected exceptions. Better be explicit
about the exception type(s) you want to catch.
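
A minimal sketch of what can go wrong. The function names and the
deliberate typo are made up for illustration, and IndexError is only an
example of an exception you might actually expect here.

```python
def first_title_bare(titles):
    try:
        return titels[0]  # deliberate typo: raises NameError
    except:  # swallows the NameError along with everything else
        return None


def first_title_explicit(titles):
    try:
        return titels[0]  # same deliberate typo
    except IndexError:  # only the case of an empty list is handled
        return None


# The bare version silently returns None, so the typo goes unnoticed.
print(first_title_bare(["some title"]))

# The explicit version lets the NameError through, so the bug shows up.
try:
    first_title_explicit(["some title"])
except NameError as exc:
    print("bug surfaced:", exc)
```

With the bare "except:", the code looks as if there simply were no titles.
With the explicit exception type, the traceback points straight at the typo.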
Stefan
_______________________________________________
Tutor maillist - [email protected]