[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2018-08-01 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565805#comment-16565805 ] Hudson commented on NUTCH-: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1613

[jira] [Resolved] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2018-08-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-. Resolution: Fixed. Thank you [~alaffet] and everyone else for attempting to fix.

[jira] [Created] (NUTCH-2631) KafkaIndexWriter

2018-08-01 Thread Ayal Ciobotaru (JIRA)
Ayal Ciobotaru created NUTCH-2631: - Summary: KafkaIndexWriter Key: NUTCH-2631 URL: https://issues.apache.org/jira/browse/NUTCH-2631 Project: Nutch Issue Type: Improvement Components

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2018-08-01 Thread Anas Laffet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565366#comment-16565366 ] Anas Laffet commented on NUTCH-: [~lewismc] sure! [^NUTCH-.patch] > re-fetch

[jira] [Updated] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2018-08-01 Thread Anas Laffet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anas Laffet updated NUTCH-: --- Attachment: NUTCH-.patch > re-fetch deletes all metadata except _csh_ and _rs_ > ---

[jira] [Created] (NUTCH-2630) Fetcher to log skipped records by robots.txt

2018-08-01 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2630: Summary: Fetcher to log skipped records by robots.txt Key: NUTCH-2630 URL: https://issues.apache.org/jira/browse/NUTCH-2630 Project: Nutch Issue Type: Improv

RE: [VOTE] Release Apache Nutch 1.15 RC#1

2018-08-01 Thread Markus Jelsma
However, the test crawl ran and is still running fine in the background, with no errors. But just now, watching the fetcher, I noticed the crawl delay is not always respected. The only configuration change I have made is setting the http.agent.* directives needed to run. 2018-08-01 11:47:41,256 INFO  fetcher.FetcherThread - FetcherThr

RE: [VOTE] Release Apache Nutch 1.15 RC#1

2018-08-01 Thread Markus Jelsma
All tests pass, the crawler runs fine so far, +1 for 1.15! Regards, Markus -Original message- > From: Sebastian Nagel > Sent: Thursday 26th July 2018 17:05 > To: u...@nutch.apache.org > Cc: dev@nutch.apache.org > Subject: [VOTE] Release Apache Nutch 1.15 RC#1 > > Hi Folks, > > A first ca