[jira] [Created] (NUTCH-1932) Automatically remove orphaned pages

2015-02-04 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-1932: Summary: Automatically remove orphaned pages Key: NUTCH-1932 URL: https://issues.apache.org/jira/browse/NUTCH-1932 Project: Nutch Issue Type: New Feature

[jira] [Updated] (NUTCH-1932) Automatically remove orphaned pages

2015-02-04 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1932: Attachment: NUTCH-1932.patch Dirty patch! > Automatically remove orphaned pages > ---

[jira] [Updated] (NUTCH-1930) Fetcher erases Markers for certain URLs / documents

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1930: Fix Version/s: 2.3.1 > Fetcher erases Markers for certain URLs / documents > ---

[jira] [Updated] (NUTCH-1930) Fetcher erases Markers for certain URLs / documents

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1930: Fix Version/s: (was: 2.3.1) 2.4 > Fetcher erases Markers for

[jira] [Created] (NUTCH-1933) nutch-selenium plugin

2015-02-04 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1933: Summary: nutch-selenium plugin Key: NUTCH-1933 URL: https://issues.apache.org/jira/browse/NUTCH-1933 Project: Nutch Issue Type: Bug C

[jira] [Updated] (NUTCH-1933) nutch-selenium plugin

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1933: Attachment: NUTCH-selenium-trunk.patch Patch for trunk > nutch-selenium plugin > --

[jira] [Created] (NUTCH-1934) Refactor Fetcher in trunk

2015-02-04 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1934: Summary: Refactor Fetcher in trunk Key: NUTCH-1934 URL: https://issues.apache.org/jira/browse/NUTCH-1934 Project: Nutch Issue Type: Improvement

[jira] [Created] (NUTCH-1935) too many open files

2015-02-04 Thread yuanyun.cn (JIRA)
yuanyun.cn created NUTCH-1935: Summary: too many open files Key: NUTCH-1935 URL: https://issues.apache.org/jira/browse/NUTCH-1935 Project: Nutch Issue Type: Bug Affects Versions: 2.2

[Nutch Wiki] Trivial Update of "ContributorsGroup" by LewisJohnMcgibbney

2015-02-04 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "ContributorsGroup" page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/ContributorsGroup?action=diff&rev1=18&rev2=19 * ArthurCinader * MaziyarBoustani

[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2015-02-04 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "FrontPage" page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=292&rev2=293 * NutchMeetUps - Records of previous Nutch community

GSoC 2015

2015-02-04 Thread Lewis John Mcgibbney
Hi Folks, Does anyone have any good ideas for GSoC? Seb mentioned moving Nutch towards Spark so potentially a pluggable runtime execution engine abstraction? I am currently working on a lot of security and authentication related work so I would possibly be tempted to overhaul and improve that aspec

[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2015-02-04 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "FrontPage" page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=293&rev2=294 * NutchMeetUps - Records of previous Nutch community

[jira] [Commented] (NUTCH-1935) too many open files

2015-02-04 Thread stack (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306165#comment-14306165 ] stack commented on NUTCH-1935: What did you have ulimit set to? See 'Limits on Number of Fi

[jira] [Commented] (NUTCH-1935) too many open files

2015-02-04 Thread yuanyun.cn (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306189#comment-14306189 ] yuanyun.cn commented on NUTCH-1935: Thanks, stack. The limit is 4096. cat /proc/17849/l

[jira] [Commented] (NUTCH-1935) too many open files

2015-02-04 Thread stack (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306194#comment-14306194 ] stack commented on NUTCH-1935: The hbase refguide says "It is recommended to raise the ulimi
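
For anyone following the NUTCH-1935 exchange above, here is a minimal sketch of one way to read a process's own open-file limit, the value the thread refers to as ulimit. It is not taken from the issue: it uses Python's standard `resource` module (Unix only), and the helper names `open_file_limits` and `describe` are hypothetical, chosen only for illustration.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, mapping the 'no limit' sentinel to a readable word."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = open_file_limits()
print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

The soft value is the limit actually enforced on the process; the hard value is the ceiling up to which an unprivileged process may raise its soft limit.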

[Nutch Wiki] Trivial Update of "AdvancedAjaxInteraction" by LewisJohnMcgibbney

2015-02-04 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "AdvancedAjaxInteraction" page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/AdvancedAjaxInteraction New page: = AdvancedAjaxInteraction = This page provides

Re: GSoC 2015

2015-02-04 Thread Julien Nioche
Moving to Hadoop 2.x ? On 4 February 2015 at 14:42, Lewis John Mcgibbney wrote: > Hi Folks, > Does anyone have any good ideas for GSoC? > Seb mentioned moving Nutch towards Spark so potentially a pluggable > runtime execution engine abstraction? > I am currently working on a lot of security and

[jira] [Updated] (NUTCH-1934) Refactor Fetcher in trunk

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1934: Attachment: NUTCH-1934.patch Patch for trunk. Some early observations: * Existing N

[jira] [Updated] (NUTCH-1934) Refactor Fetcher in trunk

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1934: Attachment: (was: NUTCH-1934.patch) > Refactor Fetcher in trunk > --

[jira] [Updated] (NUTCH-1934) Refactor Fetcher in trunk

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1934: Patch Info: Patch Available > Refactor Fetcher in trunk > -

[jira] [Updated] (NUTCH-1934) Refactor Fetcher in trunk

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1934: Attachment: NUTCH-1934.patch > Refactor Fetcher in trunk > -

[jira] [Updated] (NUTCH-827) HTTP POST Authentication

2015-02-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-827: Fix Version/s: (was: 1.11) 1.10 > HTTP POST Authentication > ---