[jira] [Created] (NUTCH-1932) Automatically remove orphaned pages

2015-02-04 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-1932: Summary: Automatically remove orphaned pages Key: NUTCH-1932 URL: https://issues.apache.org/jira/browse/NUTCH-1932 Project: Nutch Issue Type: New Feature

[jira] [Updated] (NUTCH-1932) Automatically remove orphaned pages

2015-02-04 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1932: Attachment: NUTCH-1932.patch Dirty patch! Automatically remove orphaned pages

[jira] [Updated] (NUTCH-1930) Fetcher erases Markers for certain URLs / documents

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1930: Fix Version/s: 2.3.1 Fetcher erases Markers for certain URLs / documents

[jira] [Updated] (NUTCH-1933) nutch-selenium plugin

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1933: Attachment: NUTCH-selenium-trunk.patch Patch for trunk nutch-selenium plugin

[jira] [Created] (NUTCH-1934) Refactor Fetcher in trunk

2015-02-04 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1934: Summary: Refactor Fetcher in trunk Key: NUTCH-1934 URL: https://issues.apache.org/jira/browse/NUTCH-1934 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-1935) too many open files

2015-02-04 Thread stack (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306165#comment-14306165 ] stack commented on NUTCH-1935: What did you have ulimit set to? See 'Limits on Number of

[jira] [Commented] (NUTCH-1935) too many open files

2015-02-04 Thread yuanyun.cn (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306189#comment-14306189 ] yuanyun.cn commented on NUTCH-1935: Thanks, stack. The limit is 4096. cat

Re: GSoC 2015

2015-02-04 Thread Julien Nioche
Moving to Hadoop 2.x ? On 4 February 2015 at 14:42, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi Folks, Does anyone have any good ideas for GSoC? Seb mentioned moving Nutch towards Spark so potentially a pluggable runtime execution engine abstraction? I am currently working on

[Nutch Wiki] Trivial Update of FrontPage by LewisJohnMcgibbney

2015-02-04 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The FrontPage page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=293&rev2=294 * NutchMeetUps - Records of previous Nutch community

[jira] [Updated] (NUTCH-1934) Refactor Fetcher in trunk

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1934: Attachment: (was: NUTCH-1934.patch) Refactor Fetcher in trunk

[jira] [Updated] (NUTCH-1934) Refactor Fetcher in trunk

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1934: Attachment: NUTCH-1934.patch Patch for trunk. Some early observations: * Existing

[jira] [Updated] (NUTCH-827) HTTP POST Authentication

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-827: Fix Version/s: (was: 1.11) 1.10 HTTP POST Authentication

[jira] [Updated] (NUTCH-1934) Refactor Fetcher in trunk

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1934: Patch Info: Patch Available Refactor Fetcher in trunk

[jira] [Updated] (NUTCH-1934) Refactor Fetcher in trunk

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1934: Attachment: NUTCH-1934.patch Refactor Fetcher in trunk

[Nutch Wiki] Trivial Update of ContributorsGroup by LewisJohnMcgibbney

2015-02-04 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The ContributorsGroup page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/ContributorsGroup?action=diff&rev1=18&rev2=19 * ArthurCinader * MaziyarBoustani *

GSoC 2015

2015-02-04 Thread Lewis John Mcgibbney
Hi Folks, Does anyone have any good ideas for GSoC? Seb mentioned moving Nutch towards Spark so potentially a pluggable runtime execution engine abstraction? I am currently working on a lot of security and authentication related work so I would possibly be tempted to overhaul and improve that

[Nutch Wiki] Trivial Update of FrontPage by LewisJohnMcgibbney

2015-02-04 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The FrontPage page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=292&rev2=293 * NutchMeetUps - Records of previous Nutch community

[Nutch Wiki] Trivial Update of AdvancedAjaxInteraction by LewisJohnMcgibbney

2015-02-04 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The AdvancedAjaxInteraction page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/AdvancedAjaxInteraction New page: = AdvancedAjaxInteraction = This page provides