Dear all,
I am using Nutch to crawl all the user reviews on an IMDB page. The URLs are:
http://www.imdb.com/title/tt1375666/usercomments
http://www.imdb.com/title/tt1375666/usercomments?start=50
I want to crawl all of these pages and keep only the user review text.
Each of these pages links to the profile of every reviewer. Clicking a reviewer leads to a URL like
http://www.imdb.com/user/ur10583368/comments
which lists all the movie reviews written by that user (ur10583368 in this case). A user may have written many reviews, so this list is paged, with URLs of the form
http://www.imdb.com/user/ur10583368/comments?order=date&start=10
where the start value changes for each page. I need all of these reviews as well. I want to crawl only these URLs and skip every other link on the pages. Please help.
--
Nitin Kumar Hardeniya
M.Tech Computational Linguistics
IIIT Hyderabad
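A minimal sketch of one way to restrict the crawl, assuming the crawl uses Nutch's regex URL filter (conf/regex-urlfilter.txt). In that file each line is a + (accept) or - (reject) followed by a regular expression, and the first rule that matches a URL decides whether it is kept. The Java class below only mimics that first-match logic with java.util.regex, so the patterns can be checked against the URLs above before touching the Nutch config. The class name, method name and the exact patterns are hypothetical, not part of Nutch, and the title is fixed to tt1375666 on the assumption that only this movie's review list is wanted (needs Java 16+ for the record).

```java
import java.util.List;
import java.util.regex.Pattern;

public class ImdbUrlFilterSketch {

    // One filter rule: accept (+) or reject (-) plus a regex, modelled on the
    // "+regex" / "-regex" lines of Nutch's conf/regex-urlfilter.txt.
    record Rule(boolean accept, Pattern pattern) {}

    // Hypothetical rule set for this crawl; the first rule whose regex is
    // found in the URL decides whether the URL is kept.
    static final List<Rule> RULES = List.of(
        // Review list of tt1375666, optionally paged with ?start=N
        new Rule(true, Pattern.compile(
            "^https?://www\\.imdb\\.com/title/tt1375666/usercomments(\\?start=\\d+)?$")),
        // Any reviewer's comment list, optionally paged with ?order=date&start=N
        new Rule(true, Pattern.compile(
            "^https?://www\\.imdb\\.com/user/ur\\d+/comments(\\?order=date&start=\\d+)?$")),
        // Everything else is rejected
        new Rule(false, Pattern.compile("."))
    );

    // Returns true if the first matching rule is an accept rule.
    static boolean accepts(String url) {
        for (Rule rule : RULES) {
            if (rule.pattern().matcher(url).find()) {
                return rule.accept();
            }
        }
        return false; // no rule matched: drop the URL
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.imdb.com/title/tt1375666/usercomments",
            "http://www.imdb.com/title/tt1375666/usercomments?start=50",
            "http://www.imdb.com/user/ur10583368/comments",
            "http://www.imdb.com/user/ur10583368/comments?order=date&start=10",
            "http://www.imdb.com/title/tt1375666/"
        };
        for (String url : urls) {
            // Prints "+ url" for kept URLs and "- url" for rejected ones
            System.out.println((accepts(url) ? "+ " : "- ") + url);
        }
    }
}
```

Running it should mark the four review-list URLs with + and the plain title page with -. In regex-urlfilter.txt the same rules would be written as three lines, each a + or - followed by the regex without the Java string escaping, for example +^https?://www\.imdb\.com/user/ur\d+/comments(\?order=date&start=\d+)?$ and a final -. line. The stock regex-urlfilter.txt, at least in the Nutch versions I have seen, also contains a -[?*!@=] rule that drops any URL with a query string; since the first match wins, the paged ?start= URLs would never be fetched unless the + lines come before that rule. Note this only controls which pages are fetched; keeping just the review text is a separate parsing step.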

