[jira] [Commented] (NUTCH-2439) Upgrade to Apache Tika 1.16

2017-10-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208332#comment-16208332 ] Markus Jelsma commented on NUTCH-2439: No idea, but probably someone on Tika's user list will so i

[jira] [Commented] (NUTCH-2439) Upgrade to Apache Tika 1.16

2017-10-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208309#comment-16208309 ] Sebastian Nagel commented on NUTCH-2439: +1 Tika-core 1.16 already slipped in as a dependency of

[jira] [Commented] (NUTCH-2443) Extract links from the video tag with the parse-html plugin

2017-10-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208251#comment-16208251 ] Sebastian Nagel commented on NUTCH-2443: +1 Good catch. There are actually a few more links

[jira] [Updated] (NUTCH-2411) Index-metadata to support indexing multiple values for a field

2017-10-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2411: Attachment: NUTCH-2411.patch Don't add empty fields. > Index-metadata to support indexing

[jira] [Commented] (NUTCH-2443) Extract links from the video tag with the parse-html plugin

2017-10-17 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207778#comment-16207778 ] ASF GitHub Bot commented on NUTCH-2443: jorgelbg opened a new pull request #230: NUTCH-2443 add

[jira] [Created] (NUTCH-2443) Extract links from the video tag with the parse-html plugin

2017-10-17 Thread Jorge Luis Betancourt Gonzalez (JIRA)
Jorge Luis Betancourt Gonzalez created NUTCH-2443: Summary: Extract links from the video tag with the parse-html plugin Key: NUTCH-2443 URL: https://issues.apache.org/jira/browse/NUTCH-2443

[jira] [Commented] (NUTCH-2442) Injector to stop if job fails to avoid loss of CrawlDb

2017-10-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207219#comment-16207219 ] Sebastian Nagel commented on NUTCH-2442: Actually, it's a couple of jobs based on the new

[jira] [Created] (NUTCH-2442) Injector to stop if job fails to avoid loss of CrawlDb

2017-10-17 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2442: Summary: Injector to stop if job fails to avoid loss of CrawlDb Key: NUTCH-2442 URL: https://issues.apache.org/jira/browse/NUTCH-2442 Project: Nutch