On 03/31/2017 05:23 AM, Igor Alexandre wrote:
> Hi!
> I'm a newbie in the Python/Web crawling world. I've been trying to find an
> answer to the following question since last week, but I couldn't so far, so I
> decided to ask it myself here: I have a sitemap in XML and I want to use it
> to save as text the various pages of the site. Do you guys know how can I do
> it? I'm looking for some code on the web where I can just type the xml
> address and wait for the crawler to do it's job, saving all the pages
> indicated in the sitemap as a text file in my computer.
> Thank you!
> Best,
> Igor Alexandre

There's a surprisingly active community doing web crawling and scraping. My impression is that the Scrapy project is a "big dog" in this space, but I'm not involved in it, so I can't say for sure.

A couple of links for you to play with:

http://docs.python-guide.org/en/latest/scenarios/scrape/
    The first part of this page (lxml + Requests) might be enough for you. I had occasion to look over it just a few days ago, and a web search would turn it up easily.

https://scrapy.org/

There are plenty of other resources; someone is bound to have what you're looking for.
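
To make the idea concrete, here is a rough sketch of the "read the sitemap, save each page as text" task. It is not taken from either link above: it uses only the standard library (urllib, xml.etree, html.parser) in place of lxml + Requests, and it assumes a plain sitemap that lists pages as <url><loc> entries in the standard sitemaps.org namespace. The function names (sitemap_urls, html_to_text, filename_for, save_site) and the "pages" output directory are made up for illustration.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from pathlib import Path

# Namespace used by standard sitemap files (assumed here).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the page URLs found in <url><loc> elements of a sitemap."""
    root = ET.fromstring(xml_text)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


class _TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Very rough HTML-to-text conversion: one line per text fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def filename_for(url):
    """Turn a URL into a safe file name (hypothetical naming scheme)."""
    name = re.sub(r"[^A-Za-z0-9]+", "_", url.split("://", 1)[-1]).strip("_")
    return (name or "index") + ".txt"


def fetch(url):
    """Download a URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def save_site(sitemap_url, out_dir="pages", fetcher=fetch):
    """Save every page listed in the sitemap as a .txt file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in sitemap_urls(fetcher(sitemap_url)):
        path = out / filename_for(url)
        path.write_text(html_to_text(fetcher(url)), encoding="utf-8")
        saved.append(path)
    return saved
```

You would call it as save_site("https://example.com/sitemap.xml"), with your own sitemap address in place of the example.com placeholder. The sketch does not follow sitemap index files (a sitemap that points to other sitemaps), does not pause between requests, and does no error handling, so treat it as a starting point rather than a finished crawler.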