I have "index-(basic|anchor|more|metadata)" and
"parse-(html|tika|metatags)" included in plugin.includes, but despite:
# bin/nutch parsechecker https:/. |grep -i date
Date : Tue, 18 Oct 2016 14:37:40 GMT
The 'date' field in Solr for the document is wrong:
|"date":
That's only in nutch-default.xml, and is set to the default which is true.
Good idea though!
Tom
On 17/10/16 17:27, Julien Nioche wrote:
Hi Tom
You haven't modified the value for the config below by any chance?
http.robots.403.allow
true
Some servers return HTTP
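
One way to confirm whether a property such as http.robots.403.allow has been overridden is to read it from both nutch-site.xml and nutch-default.xml. Below is a minimal Python sketch, assuming the usual Hadoop-style configuration/property/name/value layout. The sample XML strings and the helper names read_property and effective_value are hypothetical and are not part of Nutch.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the <value> text of the named property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def effective_value(site_xml, default_xml, name):
    """Prefer the site-level override, falling back to the default file."""
    override = read_property(site_xml, name)
    if override is not None:
        return override
    return read_property(default_xml, name)


# Hypothetical file contents, for illustration only.
SAMPLE_DEFAULT_XML = """<configuration>
  <property>
    <name>http.robots.403.allow</name>
    <value>true</value>
  </property>
</configuration>"""

SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>parse-(html|tika|metatags)</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    name = "http.robots.403.allow"
    print(name, "=", effective_value(SAMPLE_SITE_XML, SAMPLE_DEFAULT_XML, name))
```

In practice the two strings would be read from the files under the Nutch conf directory; the fallback order here mirrors the situation described in the reply above, where the property appears only in nutch-default.xml.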
Hi Sachin,
Answering both of your questions here as I am catching up with some mail.
On Fri, Sep 30, 2016 at 5:04 AM, wrote:
>
> From: Sachin Shaju
> To: user@nutch.apache.org
> Cc:
> Date: Fri, 30 Sep 2016 10:00:04 +0530
> Subject: Re:
Hi Sachin,
Very late response, I know, but hopefully better late than never. Response
below.
On Fri, Sep 30, 2016 at 5:04 AM, wrote:
>
> From: Sachin Shaju
> To: user@nutch.apache.org
> Cc:
> Date: Thu, 29 Sep 2016 14:01:13 +0530
> Subject: