[
https://issues.apache.org/jira/browse/NUTCH-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thamme Gowda N updated NUTCH-2251:
--
Summary: Make CommonCrawlFormatJackson instance reusable by properly
handling object state
Thamme Gowda N created NUTCH-2251:
-
Summary: Make CommonCrawlFormatJackson instance reusable by
properly handling object state when it is used to format many documents
Key: NUTCH-2251
URL:
Thamme Gowda N created NUTCH-2250:
-
Summary: CommonCrawlDumper : Invalid format + skipped parts
Key: NUTCH-2250
URL: https://issues.apache.org/jira/browse/NUTCH-2250
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166447#comment-15166447
]
Thamme Gowda N commented on NUTCH-2144:
---
Hi [~wastl-nagel],
Were you able to test this plugin?
I
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141181#comment-15141181
]
Thamme Gowda N commented on NUTCH-2144:
---
+1 sounds great
> Plugin to override db.ignore.external to
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141285#comment-15141285
]
Thamme Gowda N commented on NUTCH-2144:
---
Hi [~lewismc]
* I think relying on URL suffix based
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141359#comment-15141359
]
Thamme Gowda N commented on NUTCH-2144:
---
Thanks.
Yes, I will submit a new patch.
> Plugin to
Thamme Gowda N created NUTCH-2164:
-
Summary: Inconsistent 'Modified Time' in crawl db
Key: NUTCH-2164
URL: https://issues.apache.org/jira/browse/NUTCH-2164
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thamme Gowda N updated NUTCH-2144:
--
Attachment: ignore-exempt.patch
Patch supplied.
Summary of changes:
* A new plugin extension
Thamme Gowda N created NUTCH-2144:
-
Summary: Plugin to override db.ignore.external to exempt
interesting external domain URLs
Key: NUTCH-2144
URL: https://issues.apache.org/jira/browse/NUTCH-2144
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963413#comment-14963413
]
Thamme Gowda N edited comment on NUTCH-2144 at 10/19/15 2:54 PM:
-
Hi,
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thamme Gowda N updated NUTCH-2144:
--
Attachment: ignore-exempt.patch
The patch is made minimal.
> Plugin to override
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964175#comment-14964175
]
Thamme Gowda N edited comment on NUTCH-2144 at 10/19/15 10:26 PM:
--
Thanks
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964175#comment-14964175
]
Thamme Gowda N commented on NUTCH-2144:
---
Thanks for your feedback. I agree that the content-type
14 matches