Re: [Tutor] Using an XML file for web crawling

Mats Wichmann Fri, 31 Mar 2017 11:22:04 -0700

On 03/31/2017 05:23 AM, Igor Alexandre wrote:
> Hi!
> I'm a newbie in the Python/Web crawling world. I've been trying to find an 
> answer to the following question since last week, but I couldn't so far, so I 
> decided to ask it myself here: I have a sitemap in XML and I want to use it 
> to save as text the various pages of the site. Do you guys know how I can do
> it? I'm looking for some code on the web where I can just type the XML
> address and wait for the crawler to do its job, saving all the pages
> indicated in the sitemap as a text file on my computer.
> Thank you!
> Best,
> Igor Alexandre


There's a surprisingly active community doing web crawling and scraping.
I've gotten the impression that the Scrapy project is a "big dog" in this
space, but I'm not involved in it, so I can't say for sure.  A couple of
links for you to play with:

http://docs.python-guide.org/en/latest/scenarios/scrape/

The first part of that page might be enough for you: it covers lxml +
Requests.  I just had occasion to look over it a few days ago, but I'm sure
a web search would turn it up easily.
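
To give a feel for the shape of such a script, here is a rough sketch, not
a tested crawler.  It assumes a standard sitemaps.org file (a <urlset> of
<url><loc> entries, not a sitemap index) and UTF-8 pages.  To keep it
dependency-free it swaps in the standard library (urllib, ElementTree,
html.parser) for Requests and lxml, which would fill the same roles.  The
function names, the "pages" directory and the file-naming scheme are all
made up for illustration.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from pathlib import Path

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_bytes):
    """Return the page URLs found in the <loc> elements of a sitemap."""
    root = ET.fromstring(xml_bytes)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


class _TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Reduce an HTML document to its text, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def filename_for(url):
    """Turn a URL into a safe file name (hypothetical naming scheme)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_") + ".txt"


def fetch(url):
    """Download a URL and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def crawl(sitemap_url, out_dir="pages"):
    """Save every page listed in the sitemap as a text file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in sitemap_urls(fetch(sitemap_url)):
        # Assumes UTF-8; undecodable bytes are replaced, not fatal.
        html = fetch(url).decode("utf-8", errors="replace")
        text = html_to_text(html)
        (out / filename_for(url)).write_text(text, encoding="utf-8")
```

You would call it as crawl("https://example.com/sitemap.xml"), with your
own sitemap address in place of that placeholder.  A real crawler should
also pause between requests and handle pages that fail to download.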

https://scrapy.org/

There are plenty of other resources; someone is bound to have what
you're looking for.


