Nutch 2.x and solr

2015-03-01 Thread uday bhaskar
Hi, I am trying to integrate Nutch 2.x with Solr and HBase. I am able to crawl the URLs and store them in HBase, but when I try to index them I get a 405 error from Solr. It says the admin page does not support POST. I am able to load the admin page from the browser. Kindly advise me. Uday

getting Not implemented by the DistributedFileSystem FileSystem implementation

2015-03-01 Thread yeshwanth kumar
Hi, I am using HDP 2.2, Hadoop 2.6.0.2.2.0.0-2041, HBase 0.98.4.2.2.0.0-2041-hadoop2. I pulled the code from the Nutch 2.x branch, changed gora-hbase to 0.5 in ivy.settings, and made the build. When I tried to run the crawl I got this exception: *./bin/crawl urls/1_crawl http://localhost:8983/s

Re: Nutch with Selenium pops up Firefox window

2015-03-01 Thread Mattmann, Chris A (3980)
Thank you Jay!! Cheers ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm..

Re: [MASSMAIL]Re: [MASSMAIL]How to make Nutch 1.7 request mimic a browser?

2015-03-01 Thread Jorge Luis Betancourt González
The general answer is: it depends. It is usually "polite" to present your robot to the website so the webmaster knows what is accessing the site; this is why Google and a lot of other search engines (big and small) use a distinctive name for their crawlers/bots. That being said, the first site tha

Re: [MASSMAIL]Re: Can anyone fetch this page?

2015-03-01 Thread Jorge Luis Betancourt González
Same for me: ➜ local bin/nutch parsechecker http://www.nature.com fetching: http://www.nature.com Fetch failed with protocol status: exception(16), lastModified=0: Http code=500, url=http://www.nature.com ➜ local curl --head http://www.nature.com HTTP/1.0 200 OK ... This is odd! - Origi