[jira] [Commented] (NUTCH-2478) // is not a valid base URL

2017-12-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288403#comment-16288403 ] Sebastian Nagel commented on NUTCH-2478: ??resolve the missing protocol using the current page's

[jira] [Commented] (NUTCH-2478) // is not a valid base URL

2017-12-12 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288300#comment-16288300 ] Markus Jelsma commented on NUTCH-2478: To clarify a bad sentence, I resolve the missing protocol

[jira] [Commented] (NUTCH-2478) // is not a valid base URL

2017-12-12 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288289#comment-16288289 ] Markus Jelsma commented on NUTCH-2478: Yes, this needs a change in the parser plugins. I sought to

[jira] [Commented] (NUTCH-2478) // is not a valid base URL

2017-12-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288280#comment-16288280 ] Sebastian Nagel commented on NUTCH-2478: Confirmed: {noformat} $ cat

[jira] [Commented] (NUTCH-2477) Refactor *Checker classes to use base class for common code

2017-12-12 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287823#comment-16287823 ] Jurian Broertjes commented on NUTCH-2477: Feedback is welcome > Refactor *Checker classes to use

[jira] [Updated] (NUTCH-2477) Refactor *Checker classes to use base class for common code

2017-12-12 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2477: External issue URL: https://github.com/apache/nutch/pull/256 > Refactor *Checker classes to

[jira] [Updated] (NUTCH-2478) // is not a valid base URL

2017-12-12 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2478: Description: This test fails: {code} @Test public void testBadResolver() throws Exception {

[jira] [Created] (NUTCH-2477) Refactor *Checker classes to use base class for common code

2017-12-12 Thread Jurian Broertjes (JIRA)
Jurian Broertjes created NUTCH-2477: Summary: Refactor *Checker classes to use base class for common code Key: NUTCH-2477 URL: https://issues.apache.org/jira/browse/NUTCH-2477 Project: Nutch

[jira] [Created] (NUTCH-2478) // is not a valid base URL

2017-12-12 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2478: Summary: // is not a valid base URL Key: NUTCH-2478 URL: https://issues.apache.org/jira/browse/NUTCH-2478 Project: Nutch Issue Type: Bug Affects

RE: [DISCUSS] Release 1.14?

2017-12-12 Thread Markus Jelsma
Happy to hear. Tika 1.17 brings major improvements; it deals much better with some of the more extravagant pages you find on the web. -Original message- > From:Sebastian Nagel > Sent: Tuesday 12th December 2017 13:36 > To: dev@nutch.apache.org >

Re: [DISCUSS] Release 1.14?

2017-12-12 Thread Sebastian Nagel
Hi Julien, yes, I know there's an open issue by Markus which depends on Tika 1.17. If the Tika release happens this week, I'll make sure that it's included. Thanks, Sebastian On 12/11/2017 10:22 AM, Julien Nioche wrote: > Tika 1.17 will be released shortly, maybe it would be worth waiting a bit

[jira] [Commented] (NUTCH-2473) Elasticsearch REST Indexer broken due to wrong depenency

2017-12-12 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287338#comment-16287338 ] ASF GitHub Bot commented on NUTCH-2473: mfeltscher commented on issue #253: fix for NUTCH-2473