[jira] [Commented] (NUTCH-2391) Spurious Duplications for MD5

2017-06-09 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045421#comment-16045421 ] ASF GitHub Bot commented on NUTCH-2391: --- sebastian-nagel opened a new pull request #

[jira] [Commented] (NUTCH-2393) 2.x patch for MD5 duplication issue addressed in NUTCH-2391

2017-06-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045417#comment-16045417 ] Sebastian Nagel commented on NUTCH-2393: Just to confirm: 2.x is affected. With a

[jira] [Updated] (NUTCH-2391) Spurious Duplications for MD5

2017-06-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2391: --- Description: We're seeing some incidence of a large number of documents being marked as dupli

[jira] [Commented] (NUTCH-2393) 2.x patch for MD5 duplication issue addressed in NUTCH-2391

2017-06-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045415#comment-16045415 ] Sebastian Nagel commented on NUTCH-2393: Thanks [~kaidul], for taking care of 2.x!

[jira] [Commented] (NUTCH-2393) 2.x patch for MD5 duplication issue addressed in NUTCH-2391

2017-06-09 Thread Kaidul Islam (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045397#comment-16045397 ] Kaidul Islam commented on NUTCH-2393: - Hi [~wastl-nagel], As you approved the change f

[jira] [Commented] (NUTCH-2393) 2.x patch for MD5 duplication issue addressed in NUTCH-2391

2017-06-09 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045394#comment-16045394 ] ASF GitHub Bot commented on NUTCH-2393: --- kaidul opened a new pull request #193: NUTC

[jira] [Updated] (NUTCH-2393) 2.x patch for MD5 duplication issue addressed in NUTCH-2391

2017-06-09 Thread Kaidul Islam (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaidul Islam updated NUTCH-2393: Description: Equivalent patch for 2.x for issue addressed in NUTCH-2391 (was: Equivalent patch for

[jira] [Created] (NUTCH-2393) 2.x patch for MD5 duplication issue addressed in NUTCH-2391

2017-06-09 Thread Kaidul Islam (JIRA)
Kaidul Islam created NUTCH-2393: --- Summary: 2.x patch for MD5 duplication issue addressed in NUTCH-2391 Key: NUTCH-2393 URL: https://issues.apache.org/jira/browse/NUTCH-2393 Project: Nutch Issu

[jira] [Commented] (NUTCH-2382) indexer-hbase Nutch 1.x branch

2017-06-09 Thread Kaidul Islam (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045381#comment-16045381 ] Kaidul Islam commented on NUTCH-2382: - [~jurian], [~lewismc] I've recently moved to Nu

Re: Crawler-Commons 0.8 released

2017-06-09 Thread Chris Mattmann
Great job! From: Julien Nioche Reply-To: "dev@nutch.apache.org" Date: Friday, June 9, 2017 at 2:28 AM To: "crawler-comm...@googlegroups.com", "bixo-...@yahoogroups.com", "dev@nutch.apache.org", "digitalpeb...@googlegroups.com" Subject: Crawler-Commons 0.8 released Apologies f

Crawler-Commons 0.8 released

2017-06-09 Thread Julien Nioche
Apologies for cross-posting. The Crawler-Commons project is pleased to announce its 0.8 release. https://github.com/crawler-commons/crawler-commons/releases/tag/crawler-commons-0.8 If you are wondering what Crawl