[jira] [Created] (NUTCH-2216) ignore.internal.links to optionally follow internal redirects

2016-02-12 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2216: Summary: ignore.internal.links to optionally follow internal redirects Key: NUTCH-2216 URL: https://issues.apache.org/jira/browse/NUTCH-2216 Project: Nutch

[jira] [Commented] (NUTCH-2216) ignore.internal.links to optionally follow internal redirects

2016-02-12 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144463#comment-15144463 ] Markus Jelsma commented on NUTCH-2216: Apparently db.ignore.internal.links is not implemented in

[jira] [Commented] (NUTCH-2216) ignore.internal.links to optionally follow internal redirects

2016-02-12 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144497#comment-15144497 ] Markus Jelsma commented on NUTCH-2216: Additionally, it probably should not be implemented because

[jira] [Commented] (NUTCH-2216) ignore.internal.links to optionally follow internal redirects

2016-02-12 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144518#comment-15144518 ] Markus Jelsma commented on NUTCH-2216: An option is to change the default for

[jira] [Commented] (NUTCH-2217) Crawl pages with specified language

2016-02-12 Thread Dawid Wolski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144677#comment-15144677 ] Dawid Wolski commented on NUTCH-2217: The same plugin, but for the 2.x version. > Crawl pages with

[jira] [Updated] (NUTCH-2217) Crawl pages with specified language

2016-02-12 Thread Dawid Wolski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Wolski updated NUTCH-2217: External issue URL: (was: https://issues.apache.org/jira/browse/NUTCH-1663) External issue

[jira] [Updated] (NUTCH-2217) Crawl pages with specified language

2016-02-12 Thread Dawid Wolski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Wolski updated NUTCH-2217: Description: Plugin to filter out pages in languages other than the one specified. It is based on language

[jira] [Commented] (NUTCH-2217) Crawl pages with specified language

2016-02-12 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145037#comment-15145037 ] ASF GitHub Bot commented on NUTCH-2217: GitHub user merito opened a pull request:

[GitHub] nutch pull request: fix for NUTCH-2217 contributed by dawid.wolski

2016-02-12 Thread merito
GitHub user merito opened a pull request: https://github.com/apache/nutch/pull/90 fix for NUTCH-2217 contributed by dawid.wolski You can merge this pull request into a Git repository by running: $ git pull https://github.com/merito/nutch NUTCH-2217 Alternatively you can

[jira] [Updated] (NUTCH-2218) Switch CrawlCompletion arg parsing to Commons CLI

2016-02-12 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-2218: Issue Type: Improvement (was: Bug) > Switch CrawlCompletion arg parsing to Commons CLI >

[jira] [Commented] (NUTCH-2218) Switch CrawlCompletion arg parsing to Commons CLI

2016-02-12 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145533#comment-15145533 ] ASF GitHub Bot commented on NUTCH-2218: GitHub user MJJoyce opened a pull request:

[GitHub] nutch pull request: NUTCH-2218 - Update CrawlComplete util with Co...

2016-02-12 Thread MJJoyce
GitHub user MJJoyce opened a pull request: https://github.com/apache/nutch/pull/91 NUTCH-2218 - Update CrawlComplete util with Commons CLI arg parsing - Switch all argument parsing and checking to Commons CLI. - Update input directory processing such that the 'crawldb' folder

[GitHub] nutch pull request: NUTCH-2218 - Update CrawlComplete util with Co...

2016-02-12 Thread lewismc
GitHub user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/91#discussion_r52822613 --- Diff: src/java/org/apache/nutch/util/CrawlCompletionStats.java --- @@ -61,27 +69,60 @@ private int mode = 0; public int run(String[]

[jira] [Commented] (NUTCH-2218) Switch CrawlCompletion arg parsing to Commons CLI

2016-02-12 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145777#comment-15145777 ] ASF GitHub Bot commented on NUTCH-2218: GitHub user lewismc commented on a diff in the pull