[jira] [Reopened] (NUTCH-2058) Indexer plugin that allows RegEx replacements on the NutchDocument field values

2015-11-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reopened NUTCH-2058: Reopening due to failing unit tests:

[jira] [Commented] (NUTCH-2064) URLNormalizer basic to encode reserved chars and decode non-reserved chars

2015-11-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991421#comment-14991421 ] Markus Jelsma commented on NUTCH-2064: It looks good to me, there are no immediate issues that come

[jira] [Created] (NUTCH-2161) Interrupted failed and/or killed tasks fail to clean up temp directories in HDFS

2015-11-05 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2161:
Summary: Interrupted failed and/or killed tasks fail to clean up temp directories in HDFS
Key: NUTCH-2161
URL: https://issues.apache.org/jira/browse/NUTCH-2161

[jira] [Updated] (NUTCH-2162) Nutch Webapp Crawl fails as it tries to index

2015-11-05 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2162:
Attachment: nutch_webapp.log
Example log output from initiating a Crawl from the

[jira] [Created] (NUTCH-2162) Nutch Webapp Crawl fails as it tries to index

2015-11-05 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2162:
Summary: Nutch Webapp Crawl fails as it tries to index
Key: NUTCH-2162
URL: https://issues.apache.org/jira/browse/NUTCH-2162
Project: Nutch