Date missing from Solr, even though in HTTP last-modified

2016-10-18 Thread Tom Chiverton
I have "index-(basic|anchor|more|metadata)" and "parse-(html|tika|metatags)" included in plugin.includes, but despite:

# bin/nutch parsechecker https:/. |grep -i date
Date : Tue, 18 Oct 2016 14:37:40 GMT

the 'date' field in Solr for the document is wrong:
|"date":

Re: Trouble fetch PDFs to pass to Tika (I think)

2016-10-18 Thread Tom Chiverton
That's only in nutch-default.xml, and is set to the default, which is true. Good idea though!
Tom

On 17/10/16 17:27, Julien Nioche wrote:
Hi Tom
You haven't modified the value for the config below by any chance?
http.robots.403.allow true
Some servers return HTTP

Re: Nutch in production

2016-10-18 Thread lewis john mcgibbney
Hi Sachin,
Answering both of your questions here as I am catching up with some mail.

On Fri, Sep 30, 2016 at 5:04 AM, wrote:
>
> From: Sachin Shaju
> To: user@nutch.apache.org
> Cc:
> Date: Fri, 30 Sep 2016 10:00:04 +0530
> Subject: Re:

Re: How to run nutch server on distributed environment

2016-10-18 Thread lewis john mcgibbney
Hi Sachin,
Very late response, I know, but hopefully better late than never. Response below.

On Fri, Sep 30, 2016 at 5:04 AM, wrote:
>
> From: Sachin Shaju
> To: user@nutch.apache.org
> Cc:
> Date: Thu, 29 Sep 2016 14:01:13 +0530
> Subject: