On 09/01/16 02:01, Whom Isac wrote:
> Hi, I want to create a web crawler but don't have any lead on which
> module to choose. I have come across Jsoup, but I am not familiar with
> how to use it in 3.5, as I tried looking at similar web crawler code
> from the 3.4 dev version.

I don't know Jsoup and have no idea how it works with 3.5. However,
there are some modules in the standard library you can use, including
html.parser, urllib and so on. BeautifulSoup is good at parsing badly
constructed HTML, and xml.etree.ElementTree is good for XML/XHTML.
Requests is also a good bet for working with HTTP requests. A rough
sketch combining Requests and BeautifulSoup follows.
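For example, something like this fetches one page and prints the links
on it. It is only a sketch under my own assumptions, not working crawler
code: the URL is a placeholder, and requests and beautifulsoup4 are
third-party packages you would need to install first
(pip install requests beautifulsoup4).

# A rough sketch, not a full crawler: fetch one page and print the links
# on it. The URL is a placeholder; requests and beautifulsoup4 are
# third-party packages.
import requests
from bs4 import BeautifulSoup

url = "http://example.com/videos"    # placeholder start page

response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

# Print every href on the page. A real crawler would filter these,
# follow them, and keep track of the pages it has already visited.
for anchor in soup.find_all("a", href=True):
    print(anchor["href"])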
> I just want to build that crawler to crawl through a javascript
> enabled site and automatically detect a download link (for a video
> file).

Depending on what exactly the Javascript does, it might not be possible
(at least not directly). Many modern sites simply load up the document
structure before calling a Javascript function to fetch all the data
(including links and images) from a server via JSON. If that's what your
site does, you'll need to find the call to the server and emulate it
from Python. There is a rough sketch of that idea below my sig.

> And should I be using pickle to write the data in the text file/save
> file?

You could. You could also use a database such as SQLite. It really
depends on what you plan on doing with it after you save it. The second
sketch below my sig shows both options.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
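Here is the kind of thing I mean about emulating the server call.
Everything in it is hypothetical: the endpoint, the query parameter and
the JSON field names are invented, and you have to discover the real
ones by watching the network tab in your browser's developer tools while
the page loads.

# A purely hypothetical illustration: the endpoint, the parameters and
# the JSON layout below are invented. The real ones have to be found by
# watching the site's traffic in the browser's developer tools.
import requests

api_url = "http://example.com/api/videos"    # hypothetical JSON endpoint
params = {"page": 1}                         # hypothetical query parameter

response = requests.get(api_url, params=params)
response.raise_for_status()
data = response.json()

# Hypothetical again: assume each item in the reply carries a
# "download_url" field pointing at the video file.
for item in data.get("items", []):
    print(item.get("download_url"))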
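And a minimal sketch of the saving side, first with the standard
library's sqlite3 module and then with pickle. The table layout and the
file names are just examples, not anything you have to follow.

# A minimal sketch of saving what the crawler finds, first with sqlite3
# and then with pickle. Table layout and file names are examples only.
import pickle
import sqlite3

links = ["http://example.com/video1.mp4"]    # whatever the crawler found

conn = sqlite3.connect("crawler.db")
with conn:   # commits on success, rolls back on error
    conn.execute("CREATE TABLE IF NOT EXISTS links (url TEXT UNIQUE)")
    conn.executemany("INSERT OR IGNORE INTO links (url) VALUES (?)",
                     [(u,) for u in links])

for (url,) in conn.execute("SELECT url FROM links"):
    print(url)
conn.close()

# The pickle route is shorter, but the file is binary and only really
# useful if a Python program is going to read it back later.
with open("links.pkl", "wb") as f:
    pickle.dump(links, f)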