[Nutch Wiki] Update of WhiteListRobots by ChrisMattmann

2015-04-18 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The WhiteListRobots page has been changed by ChrisMattmann: https://wiki.apache.org/nutch/WhiteListRobots?action=diffrev1=3rev2=4 Comment: - documentation update From your

[jira] [Updated] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1989: - Affects Version/s: (was: 1.10) Handling invalid URLs in CommonCrawlDataDumper

[jira] [Resolved] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1989. -- Resolution: Fixed Committed thanks [~totaro]! {noformat}

[jira] [Updated] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1989: - Fix Version/s: 1.10 Handling invalid URLs in CommonCrawlDataDumper

[jira] [Updated] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1854: - Assignee: Sebastian Nagel (was: Lewis John McGibbney) ./bin/crawl fails with a parsing

[jira] [Commented] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501463#comment-14501463 ] Chris A. Mattmann commented on NUTCH-1854: -- awesome work [~asitang] -

[Nutch Wiki] Update of WhiteListRobots by ChrisMattmann

2015-04-18 Thread Apache Wiki
The WhiteListRobots page has been changed by ChrisMattmann: https://wiki.apache.org/nutch/WhiteListRobots?action=diffrev1=4rev2=5 == Build the Nutch runtime and execute RobotRulesParser

[jira] [Commented] (NUTCH-1987) Make bin/crawl indexer agnostic

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501455#comment-14501455 ] Chris A. Mattmann commented on NUTCH-1987: -- hey Mike can you update per Seb's

[jira] [Updated] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1989: - Labels: memex (was: ) Handling invalid URLs in CommonCrawlDataDumper

[jira] [Work started] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1989 started by Chris A. Mattmann. Handling invalid URLs in CommonCrawlDataDumper

[jira] [Assigned] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1989: Assignee: Chris A. Mattmann Handling invalid URLs in CommonCrawlDataDumper

[jira] [Resolved] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1927. -- Resolution: Fixed opened up NUTCH-1992 for 2.x, can close this out now. Thanks Seb!

[jira] [Created] (NUTCH-1992) Port whitelist from NUTCH-1927 to 2.x

2015-04-18 Thread Chris A. Mattmann (JIRA)
Chris A. Mattmann created NUTCH-1992: Summary: Port whitelist from NUTCH-1927 to 2.x Key: NUTCH-1992 URL: https://issues.apache.org/jira/browse/NUTCH-1992 Project: Nutch Issue Type: New

[jira] [Commented] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper

2015-04-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501465#comment-14501465 ] Hudson commented on NUTCH-1989: --- SUCCESS: Integrated in Nutch-trunk #3069 (See

[jira] [Updated] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1854: - Labels: memex (was: ) ./bin/crawl fails with a parsing fetcher

[jira] [Updated] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1854: - Fix Version/s: (was: 1.11) 1.10 ./bin/crawl fails with a parsing

[jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501482#comment-14501482 ] Chris A. Mattmann commented on NUTCH-1927: -- Updated the documentation page for

[jira] [Commented] (NUTCH-1992) Port whitelist from NUTCH-1927 to 2.x

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501484#comment-14501484 ] Chris A. Mattmann commented on NUTCH-1992: -- If there are any differences in

[jira] [Resolved] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1854. Resolution: Fixed Committed to trunk, r1674581. Thanks! ./bin/crawl fails with a parsing

[jira] [Commented] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501580#comment-14501580 ] Hudson commented on NUTCH-1854: --- SUCCESS: Integrated in Nutch-trunk #3070 (See

[jira] [Commented] (NUTCH-1987) Make bin/crawl indexer agnostic

2015-04-18 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501674#comment-14501674 ] Michael Joyce commented on NUTCH-1987: -- Hey Chris, Will do. I'll try to take a poke