[nodejs] Dynamic content scrape with Node.js

2012-10-06 Thread Narek Musakhanyan
Hey guys. I tried to scrape data from a website using the PHP cURL library, but I failed because cURL only lets you scrape static content. The content I want to scrape changes via JavaScript (AJAX), and since cURL can't handle that, I couldn't scrape it with cURL. So I heard that this type of t

Re: [nodejs] Dynamic content scrape with Node.js

2012-10-06 Thread Mark Hahn
1) You should consider using the node `request` module to scrape instead of cURL.
2) Any scraping is only going to return what you request. This is only going to be the initially provided static content. You are getting this from the server, not the client. There is no way to get anything from the clien
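To make the static-content point concrete, here is a minimal sketch using only plain Node.js, no third-party modules. The page markup, the `prices` id, and the `innerHtmlOf` helper are all made up for illustration, and the regex lookup is a toy stand-in for a real HTML parser. It shows what an HTTP client sees when the page fills a container from JavaScript: the container comes back empty, because the script is delivered as text and never executed.

```javascript
// Hypothetical response body, as an HTTP client (cURL, `request`, ...) would
// receive it. The list is filled in by the inline script, which only a
// browser would actually run.
const staticHtml = [
  '<html>',
  '  <body>',
  '    <ul id="prices"></ul>',
  '    <script>',
  '      document.getElementById("prices").innerHTML = "<li>42</li>";',
  '    </script>',
  '  </body>',
  '</html>'
].join('\n');

// Toy helper (an assumption for this sketch, not a library API): returns the
// raw markup inside the first element carrying the given id, or null if no
// such element exists. It assumes simple ids and double-quoted attributes.
function innerHtmlOf(html, id) {
  const pattern = new RegExp(
    '<(\\w+)[^>]*\\bid="' + id + '"[^>]*>([\\s\\S]*?)</\\1>'
  );
  const match = pattern.exec(html);
  return match === null ? null : match[2];
}

// The element is present in the static markup, but it has no content yet.
const scraped = innerHtmlOf(staticHtml, 'prices');
console.log(JSON.stringify(scraped));
```

Getting the filled-in list would take something that actually executes the page's JavaScript, or a direct request to whatever URL the script itself loads its data from.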

Re: [nodejs] Dynamic content scrape with Node.js

2012-10-06 Thread rektide
Only just picked it up last week, but it worked well enough: node.io. It exposes a jQuery-esque interface for querying scraped pages. Extremely high level, "just works" scraping module, in my book! It also has a fairly sizable task-processing system built in, which I have not used. Good luck

Re: [nodejs] Dynamic content scrape with Node.js

2012-10-06 Thread Dave Kuhn
Good suggestions so far, though I highly recommend you check out phantomjs.org. Phantom is a headless version of WebKit, which is the rendering engine behind Chrome & Safari. It's the most comprehensive solution to handling AJAX content when scraping in my book since it's technically the same as

Re: [nodejs] Dynamic content scrape with Node.js

2012-10-07 Thread Stephan Bardubitzki
Another option would be https://github.com/MatthewMueller/cheerio
Tutorial: http://vimeo.com/31950192

RE: [nodejs] Dynamic content scrape with Node.js

2012-10-08 Thread Chad Engler

Re: [nodejs] Dynamic content scrape with Node.js

2012-10-09 Thread greelgorke

Re: [nodejs] Dynamic content scrape with Node.js

2012-10-09 Thread Dave Kuhn