Re: Subscription to nutch list

2015-10-02 Thread Girish Rao
Send an email to user-subscr...@nutch.apache.org. If you want to join the dev mailing list, send an email to dev-subscr...@nutch.apache.org.

Instructions are at: http://nutch.apache.org/mailing_lists.html

Regards,
Girish

On Fri, Oct 2, 2015 at 12:09 PM, Disha Punjabi wrote:
>

Regarding whitelist for robots.txt

2015-09-26 Thread Girish Rao
Hi,

I am trying to set the whitelist property in nutch-site.xml as below:

  <property>
    <name>robot.rules.whitelist</name>
    <value>test.org</value>
    <description>Comma separated list of hostnames or IP addresses to ignore robot rules parsing for.</description>
  </property>

However, when I look at the crawl data, I still see that the files have not been crawled and they

Re: Regarding whitelist for robots.txt

2015-09-26 Thread Girish Rao
> to ignore
> robot rules parsing for. Use with care and only if you are explicitly
> allowed by the site owner to ignore the site's robots.txt!
>
> On 09/26/2015 08:59 AM, Girish Rao wrote:
>> Hi,
>>
>> I am trying to set the whitelist property