[jira] [Updated] (NUTCH-2251) Make CommonCrawlFormatJackson instance reusable by properly handling object state

2016-04-14 Thread Thamme Gowda N (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thamme Gowda N updated NUTCH-2251: Summary: Make CommonCrawlFormatJackson instance reusable by properly handling object state

[jira] [Created] (NUTCH-2251) Make CommonCrawlFormatJackson instance reusable for by properly handling object state when it used to format many documents

2016-04-14 Thread Thamme Gowda N (JIRA)
Thamme Gowda N created NUTCH-2251: Summary: Make CommonCrawlFormatJackson instance reusable for by properly handling object state when it used to format many documents Key: NUTCH-2251 URL:

[jira] [Created] (NUTCH-2250) CommonCrawlDumper : Invalid format + skipped parts

2016-04-14 Thread Thamme Gowda N (JIRA)
Thamme Gowda N created NUTCH-2250: Summary: CommonCrawlDumper : Invalid format + skipped parts Key: NUTCH-2250 URL: https://issues.apache.org/jira/browse/NUTCH-2250 Project: Nutch Issue

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-24 Thread Thamme Gowda N (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166447#comment-15166447 ] Thamme Gowda N commented on NUTCH-2144: Hi [~wastl-nagel], Were you able to test this plugin? I

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-10 Thread Thamme Gowda N (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141181#comment-15141181 ] Thamme Gowda N commented on NUTCH-2144: +1 sounds great > Plugin to override db.ignore.external to

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-10 Thread Thamme Gowda N (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141285#comment-15141285 ] Thamme Gowda N commented on NUTCH-2144: Hi [~lewismc] * I think relying on URL suffix based

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-10 Thread Thamme Gowda N (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141359#comment-15141359 ] Thamme Gowda N commented on NUTCH-2144: Thanks. Yes, I will submit a new patch. > Plugin to

[jira] [Created] (NUTCH-2164) Inconsistent 'Modified Time' in crawl db

2015-11-09 Thread Thamme Gowda N (JIRA)
Thamme Gowda N created NUTCH-2164: Summary: Inconsistent 'Modified Time' in crawl db Key: NUTCH-2164 URL: https://issues.apache.org/jira/browse/NUTCH-2164 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2015-10-19 Thread Thamme Gowda N (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thamme Gowda N updated NUTCH-2144: Attachment: ignore-exempt.patch Patch supplied. Summary of changes: * A new plugin extension

[jira] [Created] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2015-10-19 Thread Thamme Gowda N (JIRA)
Thamme Gowda N created NUTCH-2144: Summary: Plugin to override db.ignore.external to exempt interesting external domain URLs Key: NUTCH-2144 URL: https://issues.apache.org/jira/browse/NUTCH-2144

[jira] [Comment Edited] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2015-10-19 Thread Thamme Gowda N (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963413#comment-14963413 ] Thamme Gowda N edited comment on NUTCH-2144 at 10/19/15 2:54 PM: Hi,

[jira] [Updated] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2015-10-19 Thread Thamme Gowda N (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thamme Gowda N updated NUTCH-2144: Attachment: ignore-exempt.patch The patch is made minimal. > Plugin to override

[jira] [Comment Edited] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2015-10-19 Thread Thamme Gowda N (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964175#comment-14964175 ] Thamme Gowda N edited comment on NUTCH-2144 at 10/19/15 10:26 PM: Thanks

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2015-10-19 Thread Thamme Gowda N (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964175#comment-14964175 ] Thamme Gowda N commented on NUTCH-2144: Thanks for your feedback. I agree that the content-type