[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110373#comment-15110373 ] Markus Jelsma commented on NUTCH-961: - Hello - that doesn't seem related to this issue

[jira] [Commented] (NUTCH-1233) Rely on Tika for outlink extraction

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110375#comment-15110375 ] Markus Jelsma commented on NUTCH-1233: -- Yes, we'll get this support with Tika 1.12. T

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1325: - Attachment: NUTCH-1325.patch Updated patch to use TDigest for streaming percentiles. But because n

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1325: - Attachment: NUTCH-1325.patch TDigest is awesome! Here's with support for user configurable list of

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1325: - Patch Info: Patch Available Description: h1. HostDB for Apache Nutch 1.x * automatically gen

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1325: - Attachment: NUTCH-1325.patch Updated patch for trunk contains more thorough config descriptions an

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1325: - Fix Version/s: 1.12 > HostDB for Nutch > > > Key: NUTCH-1325 >

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1325: - Component/s: hostdb > HostDB for Nutch > > > Key: NUTCH-1325 >

[jira] [Resolved] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1325. -- Resolution: Fixed Committed to trunk in revision 1725952. Many thanks to all contributors! > H

[jira] [Updated] (NUTCH-2201) Remove loops program from webgraph package

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2201: - Attachment: NUTCH-2201.patch Patch for trunk which removed the loops program and all references. C

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110702#comment-15110702 ] Lewis John McGibbney commented on NUTCH-1325: - What a patch. Real nice. I real

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110708#comment-15110708 ] Hudson commented on NUTCH-1325: --- SUCCESS: Integrated in Nutch-trunk #3339 (See [https://bui

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110729#comment-15110729 ] Markus Jelsma commented on NUTCH-1325: -- Yes, they are very useful for finding website

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110733#comment-15110733 ] Lewis John McGibbney commented on NUTCH-1325: - Nice Markus, the conversation i

[jira] [Resolved] (NUTCH-2201) Remove loops program from webgraph package

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2201. -- Resolution: Fixed Committed to trunk revision 1725981. Thanks Dennis! > Remove loops program f

[jira] [Updated] (NUTCH-2201) Remove loops program from webgraph package

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2201: - Patch Info: Patch Available > Remove loops program from webgraph package > ---

[jira] [Commented] (NUTCH-2197) Add solr5 solrcloud indexer support

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110797#comment-15110797 ] Markus Jelsma commented on NUTCH-2197: -- This Solr 5 plugin is capable of indexing to

[jira] [Commented] (NUTCH-2201) Remove loops program from webgraph package

2016-01-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110818#comment-15110818 ] Hudson commented on NUTCH-2201: --- SUCCESS: Integrated in Nutch-trunk #3340 (See [https://bui

[RESULT] WAS Re: [VOTE] Release Apache Nutch 2.3.1rc2

2016-01-21 Thread Lewis John Mcgibbney
Hi Folks, I am bringing this VOTE to a close with the following results [3] +1 Release this package as Apache Nutch 2.3.1. Lewis John McGibbney* Sebastian Nagel* Chris Mattmann* [0] -1 Do not release this package because… *Nutch PMC Member I am really happy to therefore announce that the VOTE p

[jira] [Commented] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch

2016-01-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110867#comment-15110867 ] Lewis John McGibbney commented on NUTCH-2202: - I agree [~robertmeusel], this w

[jira] [Commented] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110947#comment-15110947 ] Markus Jelsma commented on NUTCH-2202: -- Yes, a patch would be a good place to start.

[ANNOUNCE] Apache Nutch 2.3.1 Release

2016-01-21 Thread lewis john mcgibbney
Hi Folks, !!Apologies for cross posting!! The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v2.3.1, we advise all current users and developers of the 2.X series to upgrade to this release. Nutch is a well matured, production ready Web crawler. Nutch 2.X branch is

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111292#comment-15111292 ] Markus Jelsma commented on NUTCH-961: - Some news, the upstream Tika issue has been comm