Thanks, guys, for your replies. I tried experimenting with my browser, but getting a web crawler to select a video and fetch its link turned out to be very hard for me. I am only a beginner-level programmer, and Python was the first language I learnt. I have also recently learnt JavaScript, Ruby, HTML, Bootstrap and C#. I may try this same project again in the future, once I know more.

On Sun, Jan 10, 2016 at 2:33 AM, bruce <badoug...@gmail.com> wrote:
> Hi Isac.
>
> I'm not going to get into the pythonic stuff. People on the list are
> way better than I. I've been doing a chunk of crawling; it's not too
> bad, depending on what you're trying to accomplish and the site you're
> targeting.
>
> So, no offense, but I'm going to treat you like a 6 year old (google
> it - from a movie!)
>
> You need to back up, and analyze the site/pages/structure you're going
> after. Use the tools - firefox - livehttpheaders/nettraffic/etc.
> - You want to be able to see what the exchange is between the
>   client/browser and the server.
> - Often, this gives you the clues/insight for crafting the request from
>   your client back to the server for the item/data you're going for.
>
> Once you've gotten that together, set up the basic process with
> wget/curl etc. to get a feel for any weird issues - cert issues?
> security issues? are cookies required? etc. A good deal of this
> stuff can be resolved/checked out at this level, without jumping into
> coding.
>
> Once you're comfortable at this point, you can crank out some simple
> code to go after the site you're targeting.
>
> In the event you really have a javascript/dynamic site that you can't
> handle in any other manner, you're going to need to use a 'headless
> browser' process.
>
> There are a number of headless browser projects - I think most run on
> the webkit codebase (don't quote me). Casper/phantomjs; there are also
> pythonic implementations as well...
>
> So, there you go, hopefully this will get you on your way!
>
> On Fri, Jan 8, 2016 at 9:01 PM, Whom Isac <wombing...@gmail.com> wrote:
> > Hi I want to create a web-crawler but don't have any lead to choose any
> > module. I have come across Jsoup but I am not familiar with how to use
> > it in 3.5, as I tried looking at similar web crawler code from the 3.4 dev
> > version.
> > I just want to build that crawler to crawl through a javascript-enabled
> > site and automatically detect a download link (for a video file).
> > And should I be using pickle to write the data to the text file/save file?
> > Thanks
> > _______________________________________________
> > Tutor maillist - Tutor@python.org
> > To unsubscribe or change subscription options:
> > https://mail.python.org/mailman/listinfo/tutor
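
For anyone following the thread: the "simple code" step bruce describes above might look roughly like the sketch below, using only the Python 3 standard library. It is an illustration under assumptions, not anyone's actual crawler. The names VideoLinkParser, find_video_links and fetch_html, the extension list, the User-Agent string and the example.com URLs are all made up for this example. It only sees links that are present in the HTML the server sends; links that a page builds with JavaScript will not show up, which is the case where bruce suggests a headless browser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

# Assumption: a few common video extensions, not a complete list.
VIDEO_EXTENSIONS = (".mp4", ".webm", ".mkv", ".avi", ".mov")


class VideoLinkParser(HTMLParser):
    """Collect href/src attribute values whose path looks like a video file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                path = urlparse(absolute).path.lower()
                if path.endswith(VIDEO_EXTENSIONS) and absolute not in self.links:
                    self.links.append(absolute)


def find_video_links(html, base_url):
    """Return the video-like links found in an HTML string, in page order."""
    parser = VideoLinkParser(base_url)
    parser.feed(html)
    return parser.links


def fetch_html(url, timeout=10):
    """Download a page as text; some sites reject requests with no User-Agent."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (tutor-example)"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Offline demonstration on a hand-written snippet (no network needed).
    sample = (
        '<a href="/files/clip.mp4">clip</a>'
        '<video src="intro.webm"></video>'
        '<a href="/about.html">about</a>'
    )
    print(find_video_links(sample, "http://example.com/videos/"))
```

Against a real site, the call would be something like find_video_links(fetch_html(url), url), after first checking with the browser's network tools or wget/curl that the link really is in the static HTML and that no cookies or login are required.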