Nutch - Apache Mentor Project Proposal

2017-10-23 Thread kenneth mcfarland
Greetings Nutch Community! I would like to participate in the mentoring program that Apache offers and do it with Nutch. The first thing needed is to pick out an issue. This is *my* responsibility, but I think it would be smart to ask the community first if there are any good issues people think would

[jira] [Commented] (NUTCH-2448) Allow Sending an empty http.agent.version

2017-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215804#comment-16215804 ] Markus Jelsma commented on NUTCH-2448: -- Fine, go ahead! > Allow Sending an empty htt

[jira] [Commented] (NUTCH-2448) Allow Sending an empty http.agent.version

2017-10-23 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215754#comment-16215754 ] ASF GitHub Bot commented on NUTCH-2448: --- sebastian-nagel commented on issue #232: NU

[jira] [Commented] (NUTCH-2448) Allow Sending an empty http.agent.version

2017-10-23 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215395#comment-16215395 ] ASF GitHub Bot commented on NUTCH-2448: --- YossiTamari opened a new pull request #232:

[jira] [Commented] (NUTCH-2445) Fetcher following outlinks to keep track of already fetched items

2017-10-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215248#comment-16215248 ] Hudson commented on NUTCH-2445: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3461 (See

[jira] [Created] (NUTCH-2448) Allow Sending an empty http.agent.version

2017-10-23 Thread Yossi Tamari (JIRA)
Yossi Tamari created NUTCH-2448: --- Summary: Allow Sending an empty http.agent.version Key: NUTCH-2448 URL: https://issues.apache.org/jira/browse/NUTCH-2448 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-1932) Automatically remove orphaned pages

2017-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215190#comment-16215190 ] Markus Jelsma commented on NUTCH-1932: -- I agree! I will try to make some time for it

[jira] [Commented] (NUTCH-1932) Automatically remove orphaned pages

2017-10-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215174#comment-16215174 ] Sebastian Nagel commented on NUTCH-1932: {quote}i don't disagree{quote} Hi [~mark

[jira] [Closed] (NUTCH-2445) Fetcher following outlinks to keep track of already fetched items

2017-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-2445. Thanks Sebastian! > Fetcher following outlinks to keep track of already fetched items > ---

[jira] [Resolved] (NUTCH-2445) Fetcher following outlinks to keep track of already fetched items

2017-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2445. -- Resolution: Fixed remote:3c21a6b..0cdd095 0cdd095c881eed52dc461e559ce6ae278e99157f -> maste

[jira] [Commented] (NUTCH-2445) Fetcher following outlinks to keep track of already fetched items

2017-10-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215168#comment-16215168 ] Sebastian Nagel commented on NUTCH-2445: +1 > Fetcher following outlinks to keep

[jira] [Commented] (NUTCH-2444) HostDB CSV dumper to emit field header by default

2017-10-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215161#comment-16215161 ] Hudson commented on NUTCH-2444: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3460 (See

[jira] [Resolved] (NUTCH-2444) HostDB CSV dumper to emit field header by default

2017-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2444. -- Resolution: Fixed remote:602c663..3c21a6b 3c21a6b2abaa17ecc66a1c76d1239c213c56ba4e -> maste

[jira] [Closed] (NUTCH-2444) HostDB CSV dumper to emit field header by default

2017-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-2444. Thanks! > HostDB CSV dumper to emit field header by default > -

[jira] [Updated] (NUTCH-2445) Fetcher following outlinks to keep track of already fetched items

2017-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2445: - Attachment: NUTCH-2445.patch Updated patch! > Fetcher following outlinks to keep track of already

[jira] [Updated] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name

2017-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2447: - Attachment: NUTCH-2447.patch Added comment indicating it's filthy code with reference to here! > W

[jira] [Commented] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name

2017-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215057#comment-16215057 ] Markus Jelsma commented on NUTCH-2447: -- As a side note, also pay attention to this in

[jira] [Updated] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name

2017-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2447: - Attachment: NUTCH-2447.patch Patch for master! Keep in mind, this only works for protocol-http! >

[jira] [Comment Edited] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name

2017-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215055#comment-16215055 ] Markus Jelsma edited comment on NUTCH-2447 at 10/23/17 12:36 PM: ---

[Nutch Wiki] Update of "NutchTutorial" by SebastianNagel

2017-10-23 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "NutchTutorial" page has been changed by SebastianNagel: https://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=88&rev2=89 Comment: Add core name (default "nutch") to solr serv

[jira] [Updated] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name

2017-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2447: - Description: Nutch is unable to crawl some websites, regardless of which protocol plugin you are using.

[jira] [Updated] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name

2017-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2447: - Description: {code} 2017-10-23 12:43:52,911 INFO api.HttpRobotRulesParser - Couldn't get robots.

[jira] [Created] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name

2017-10-23 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2447: Summary: Work-around SSLProtocolException: handshake alert: unrecognized_name Key: NUTCH-2447 URL: https://issues.apache.org/jira/browse/NUTCH-2447 Project: Nutch

[jira] [Commented] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214828#comment-16214828 ] Hudson commented on NUTCH-2446: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3459 (See

[jira] [Commented] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214825#comment-16214825 ] Hudson commented on NUTCH-2446: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1594

[jira] [Commented] (NUTCH-2444) HostDB CSV dumper to emit field header by default

2017-10-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214820#comment-16214820 ] Sebastian Nagel commented on NUTCH-2444: +1 > HostDB CSV dumper to emit field hea

[jira] [Commented] (NUTCH-2445) Fetcher following outlinks to keep track of already fetched items

2017-10-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214819#comment-16214819 ] Sebastian Nagel commented on NUTCH-2445: +1 Two trivial points: - there is a compi

[jira] [Resolved] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2446. Resolution: Fixed Committed to 1.x and 2.x ([72128eb|https://github.com/apache/nutch/commit

[jira] [Closed] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-2446. -- > URLFiltersCheck fix > --- > > Key: NUTCH-2446 > UR

[jira] [Reopened] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reopened NUTCH-2446: Reopen to add fix version... > URLFiltersCheck fix > --- > > Ke

[jira] [Updated] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2446: --- Fix Version/s: 1.14 2.4 > URLFiltersCheck fix > --- > >

[jira] [Closed] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread kenneth mcfarland (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kenneth mcfarland closed NUTCH-2446. Resolution: Fixed > URLFiltersCheck fix > --- > > Key: NUTCH

[jira] [Commented] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread kenneth mcfarland (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214792#comment-16214792 ] kenneth mcfarland commented on NUTCH-2446: -- Thank you Sebastian! > URLFiltersChe

[jira] [Commented] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214790#comment-16214790 ] ASF GitHub Bot commented on NUTCH-2446: --- sebastian-nagel closed pull request #231: F

[jira] [Commented] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214787#comment-16214787 ] Sebastian Nagel commented on NUTCH-2446: +1 Good catch, thanks! Traced back to N

[jira] [Commented] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214762#comment-16214762 ] ASF GitHub Bot commented on NUTCH-2446: --- kpm1985 opened a new pull request #231: Fix

[jira] [Created] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread kenneth mcfarland (JIRA)
kenneth mcfarland created NUTCH-2446: Summary: URLFiltersCheck fix Key: NUTCH-2446 URL: https://issues.apache.org/jira/browse/NUTCH-2446 Project: Nutch Issue Type: Bug Environmen