[jira] [Updated] (NUTCH-2091) Increase robustness and crawling versatility of Nutch for the Deep Web

2015-09-28 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2091: -- Priority: Major (was: Minor) > Increase robustness and crawling versatility of Nutch for the De

[jira] [Commented] (NUTCH-2086) Nutch 1.X Webui

2015-09-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933969#comment-14933969 ] Hudson commented on NUTCH-2086: --- SUCCESS: Integrated in Nutch-trunk #3285 (See [https://bui

[jira] [Commented] (NUTCH-2110) Create the capability to provide seeds in the form of "url+xpath(including option to enter seach terms).selenium"

2015-09-28 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933845#comment-14933845 ] Asitang Mishra commented on NUTCH-2110: --- To keep everything under one single url in

[jira] [Commented] (NUTCH-2086) Nutch 1.X Webui

2015-09-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933808#comment-14933808 ] Lewis John McGibbney commented on NUTCH-2086: - Folks this is committed @revisi

[jira] [Created] (NUTCH-2127) Provide the selenium protocol with basic authentication capabilities.

2015-09-28 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2127: - Summary: Provide the selenium protocol with basic authentication capabilities. Key: NUTCH-2127 URL: https://issues.apache.org/jira/browse/NUTCH-2127 Project: Nutch

[jira] [Updated] (NUTCH-2110) Create the capability to provide seeds in the form of "url+xpath(including option to enter seach terms).selenium"

2015-09-28 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2110: -- Description: Create the capability to provide seeds in the form of "url+xpath(including option t

[jira] [Updated] (NUTCH-2126) Use selenium protocol for specific sites

2015-09-28 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2126: -- Summary: Use selenium protocol for specific sites (was: Use selenium protocol for specific site

[jira] [Created] (NUTCH-2126) Use selenium protocol for specific sites when switched on

2015-09-28 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2126: - Summary: Use selenium protocol for specific sites when switched on Key: NUTCH-2126 URL: https://issues.apache.org/jira/browse/NUTCH-2126 Project: Nutch Is

[jira] [Updated] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-09-28 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2108: -- Priority: Major (was: Minor) > Add a function to the selenium interactive plugin interface to d

[jira] [Updated] (NUTCH-2124) redirect following same link again and again , max redirect exceed and went db_gone

2015-09-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2124: --- Patch Info: Patch Available > redirect following same link again and again , max redirect exce

[jira] [Updated] (NUTCH-2124) redirect following same link again and again , max redirect exceed and went db_gone

2015-09-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2124: --- Attachment: NUTCH-2124.patch Somehow the fix for NUTCH-1939 got lost when Fetcher was refacto

[jira] [Commented] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-09-28 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933626#comment-14933626 ] ASF GitHub Bot commented on NUTCH-2108: --- GitHub user asitang opened a pull request:

[GitHub] nutch pull request: NUTCH-2108

2015-09-28 Thread asitang
GitHub user asitang opened a pull request: https://github.com/apache/nutch/pull/67 NUTCH-2108 You can merge this pull request into a Git repository by running: $ git pull https://github.com/asitang/nutch NUTCH-2108 Alternatively you can review and apply these changes as the p

[jira] [Commented] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-09-28 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933605#comment-14933605 ] ASF GitHub Bot commented on NUTCH-2108: --- Github user asitang closed the pull request

[GitHub] nutch pull request: Added support for NUTCH-2108 and NUTCH-2109

2015-09-28 Thread asitang
Github user asitang closed the pull request at: https://github.com/apache/nutch/pull/66 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabl

[jira] [Commented] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-09-28 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933601#comment-14933601 ] ASF GitHub Bot commented on NUTCH-2108: --- GitHub user asitang opened a pull request:

[GitHub] nutch pull request: Added support for NUTCH-2108 and NUTCH-2109

2015-09-28 Thread asitang
GitHub user asitang opened a pull request: https://github.com/apache/nutch/pull/66 Added support for NUTCH-2108 and NUTCH-2109 You can merge this pull request into a Git repository by running: $ git pull https://github.com/asitang/nutch NUTCH-2091 Alternatively you can review

[jira] [Updated] (NUTCH-2125) Metrics tool for relevancy

2015-09-28 Thread Kim Whitehall (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kim Whitehall updated NUTCH-2125: - Summary: Metrics tool for relevancy (was: Metrics) > Metrics tool for relevancy > ---

[jira] [Updated] (NUTCH-2125) Metrics

2015-09-28 Thread Kim Whitehall (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kim Whitehall updated NUTCH-2125: - Description: Purpose: a metric for determining if the “relevancy” of a crawl after each round and

[jira] [Created] (NUTCH-2125) Metrics

2015-09-28 Thread Kim Whitehall (JIRA)
Kim Whitehall created NUTCH-2125: Summary: Metrics Key: NUTCH-2125 URL: https://issues.apache.org/jira/browse/NUTCH-2125 Project: Nutch Issue Type: Improvement Components: tool

Re: Fetch failed : java.lang.NullPointerException

2015-09-28 Thread Michael Joyce
I don't see any null pointer exceptions coming up in your log. Do you have any more info or perhaps I'm missing something? -- Jimmy On Sun, Sep 27, 2015 at 3:04 PM, mithun wrote: > Hi All > > While crawling my seed list, I bumped into this Null Pointer Exception for > few urls. What could be t

Re: CSCI - 572: Team 18 : Questions

2015-09-28 Thread Michael Joyce
shouldProcessURL simply takes a URL and returns true or false to indicate whether the handler should process that URL. You decide what logic your handler uses to make that call. You'll note that the simple example in the codebase [1] simply returns true, a.k.a. proc
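
A minimal sketch of the shouldProcessURL idea described in the message above. It assumes a stand-in interface rather than the actual Nutch plugin classes: `UrlHandler`, `HostFilterHandler`, and the host-matching rule are all hypothetical names chosen for illustration. Only the method name `shouldProcessURL` and its true/false contract come from the message.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HostFilterHandlerDemo {

    // Hypothetical stand-in for the handler contract described in the thread:
    // take a URL, return true if this handler wants to process it.
    interface UrlHandler {
        boolean shouldProcessURL(String url);
    }

    // Illustrative handler (an assumption, not Nutch code) that only
    // accepts URLs whose host matches the one it was configured with.
    static class HostFilterHandler implements UrlHandler {
        private final String host;

        HostFilterHandler(String host) {
            this.host = host;
        }

        @Override
        public boolean shouldProcessURL(String url) {
            try {
                String urlHost = new URI(url).getHost();
                // getHost() is null for URIs without an authority part
                return urlHost != null && urlHost.equalsIgnoreCase(host);
            } catch (URISyntaxException e) {
                // Malformed URLs are simply skipped by this handler
                return false;
            }
        }
    }

    public static void main(String[] args) {
        UrlHandler handler = new HostFilterHandler("example.com");
        System.out.println(handler.shouldProcessURL("http://example.com/search?q=nutch"));
        System.out.println(handler.shouldProcessURL("http://other.org/"));
    }
}
```

A handler that always returns true, like the simple example mentioned in the message, processes every URL; a filter like the one sketched here restricts the handler to a single site.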

[jira] [Updated] (NUTCH-2124) redirect following same link again and again , max redirect exceed and went db_gone

2015-09-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2124: --- Fix Version/s: 1.11 > redirect following same link again and again , max redirect exceed and w

[jira] [Commented] (NUTCH-2124) redirect following same link again and again , max redirect exceed and went db_gone

2015-09-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933369#comment-14933369 ] Sebastian Nagel commented on NUTCH-2124: Confirmed. Thanks! To reproduce with the

[jira] [Updated] (NUTCH-2124) redirect following same link again and again , max redirect exceed and went db_gone

2015-09-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2124: --- Priority: Blocker (was: Major) > redirect following same link again and again , max redirect

[jira] [Updated] (NUTCH-2124) redirect following same link again and again , max redirect exceed and went db_gone

2015-09-28 Thread Yogendra Kumar Soni (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yogendra Kumar Soni updated NUTCH-2124: --- Description: Hello, followredirect is not working in trunk. Please see the log below.

[jira] [Updated] (NUTCH-2124) redirect following same link again and again , max redirect exceed and went db_gone

2015-09-28 Thread Yogendra Kumar Soni (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yogendra Kumar Soni updated NUTCH-2124: --- Flags: Important Labels: db_gone fetcher redirect (was: ) Descripti

[jira] [Created] (NUTCH-2124) redirect following same link again and again , max redirect exceed and went db_gone

2015-09-28 Thread Yogendra Kumar Soni (JIRA)
Yogendra Kumar Soni created NUTCH-2124: -- Summary: redirect following same link again and again , max redirect exceed and went db_gone Key: NUTCH-2124 URL: https://issues.apache.org/jira/browse/NUTCH-2124