Re: [Tutor] Problem using lxml

Martin A. Brown Sat, 22 Aug 2015 14:28:50 -0700


Hi there Anthony,

I'm pretty new to lxml but I pretty much thought I'd understood the basics. However, for some reason, my first attempt at using it is failing miserably.


Here's the deal:

I'm parsing a specific page on Craigslist (
http://joplin.craigslist.org/search/rea) and trying to retrieve the text of
each link on that page. When I do an "inspect element" in Firefox, a sample
anchor link looks like this:

<a href="/reb/5185592209.html" data-id="5185592209" class="hdrlnk">FIRST
OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)</a>

The code I'm using to try to get the link text is this:

from lxml import html
import requests

page = requests.get("http://joplin.craigslist.org/search/rea")

You are missing something here that takes the page.content, parses it and creates a variable called tree.
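A minimal sketch of that missing step, assuming lxml's html.fromstring() is the parser you want. To keep it runnable without hitting the network, the bytes below are a stand-in for page.content, built from the sample anchor you quoted; in your script you would pass page.content instead.

```python
from lxml import html

# Stand-in for page.content (assumption: the real page contains anchors
# shaped like the sample quoted in the original message).
sample = (
    b'<html><body>'
    b'<a href="/reb/5185592209.html" data-id="5185592209" class="hdrlnk">'
    b'FIRST OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)</a>'
    b'</body></html>'
)

# Parse the raw bytes into an element tree; this is the "tree" that the
# later tree.xpath(...) call needs.
tree = html.fromstring(sample)

print(tree.tag)
```

In the real script the equivalent line would be tree = html.fromstring(page.content), placed right after the requests.get() call.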

titles = tree.xpath('//a[@title="hdrlnk"]/text()')

And, your xpath is incorrect. Play with this in the interactive browser and you will be able to correct your xpath. I think you will notice from the example anchor link above that the attribute of the <a/> HTML elements you want to grab is "class", not "title". Therefore:


  titles = tree.xpath('//a[@class="hdrlnk"]/text()')

Is probably closer.
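To see the difference between the two expressions, here is a small sketch run against a stand-in document built from the sample anchor quoted earlier (an assumption; the live page may be structured differently). The names titles_wrong and titles_right are only for illustration.

```python
from lxml import html

# Stand-in for the fetched page, based on the quoted sample anchor.
sample = (
    b'<html><body>'
    b'<a href="/reb/5185592209.html" data-id="5185592209" class="hdrlnk">'
    b'FIRST OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)</a>'
    b'</body></html>'
)
tree = html.fromstring(sample)

# The anchor has no "title" attribute, so this predicate matches nothing.
titles_wrong = tree.xpath('//a[@title="hdrlnk"]/text()')

# Matching on the "class" attribute selects the anchor's text node.
titles_right = tree.xpath('//a[@class="hdrlnk"]/text()')

print(titles_wrong)  # prints []
print(titles_right)
```

Note that @class="hdrlnk" is an exact string match, so it would miss an anchor whose class attribute lists more than one class.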

print titles

The last line, where it supposedly will print the text of each anchor
returns [].

I can't seem to figure out what I'm doing wrong. lxml seems pretty
straightforward but I can't seem to get this down.

Again, I'd recommend playing with the data in an interactive console session. You will be able to figure out exactly which xpath gets you the data you would like, and then you can drop it into your script.
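One way such a console session might go, sketched against the same stand-in document as an assumption: select every anchor first, then look at which attributes each one actually carries before writing a narrower xpath. The variable attribute_names is hypothetical.

```python
from lxml import html

# Stand-in for the fetched page, based on the quoted sample anchor.
sample = (
    b'<html><body>'
    b'<a href="/reb/5185592209.html" data-id="5185592209" class="hdrlnk">'
    b'FIRST OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)</a>'
    b'</body></html>'
)
tree = html.fromstring(sample)

# Start broad: grab every anchor, then inspect what is really there.
anchors = tree.xpath('//a')
for a in anchors:
    # a.attrib behaves like a dict of the element's attributes.
    print(sorted(a.attrib.keys()), a.text_content())

# Collect the attribute names so they can be compared at a glance.
attribute_names = [sorted(a.attrib.keys()) for a in anchors]
```

Seeing "class" in that list, and no "title", points straight at the predicate to use.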


Good luck,

-Martin

--
Martin A. Brown
http://linux-ip.net/
