Send an email to
user-subscr...@nutch.apache.org
If you want to join the dev mailing list as well,
send an email to:
dev-subscr...@nutch.apache.org
Instructions are at:
http://nutch.apache.org/mailing_lists.html
Regards
Girish
On Fri, Oct 2, 2015 at 12:09 PM, Disha Punjabi wrote:
>
Hi,
I am trying to set the whitelist property in nutch-site.xml
as below:
<property>
  <name>robot.rules.whitelist</name>
  <value>test.org</value>
  <description>Comma separated list of hostnames or IP addresses to ignore
  robot rules parsing for.</description>
</property>
However, when I look at the crawl data, I still see that the files have not been
crawled and they [...]
> [...] to ignore
> robot rules parsing for. Use with care and only if you are explicitly
> allowed by the site owner to ignore the site's robots.txt!
>
>
>
>
> On 09/26/2015 08:59 AM, Girish Rao wrote:
>> Hi,
>>
>> I am trying to set the whitelist property