Re: Unable to fetch content after integrating selenium

2015-10-06 Thread Michael Joyce
What value do you have set for interactiveselenium.handlers? If you have multiple handlers listed there, each of them is going to be called on the URL. So if you need authentication, you're going to have to do it in each handler. -- Jimmy On Sat, Oct 3, 2015 at 5:29 PM, crawl party

Re: WELCOME to dev@nutch.apache.org

2015-10-06 Thread Michael Joyce
Check out the install guide for protocol-selenium here [1]. It walks you through the install steps nicely. Let us know if you run into problems with the guide so we can update it. [1] https://github.com/apache/nutch/tree/trunk/src/plugin/protocol-selenium#part-1 -- Jimmy On Sat, Oct 3, 2015 at

[jira] [Created] (NUTCH-2133) Transfer Selenium Documentation to Wiki

2015-10-06 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2133: Summary: Transfer Selenium Documentation to Wiki Key: NUTCH-2133 URL: https://issues.apache.org/jira/browse/NUTCH-2133 Project: Nutch Issue Type:

Re: Integrating Selenium with Nutch

2015-10-06 Thread Michael Joyce
Regarding your first question: A handler represents a single set of interactions with a page from which content should be extracted. Once the handler returns, the content of the page is read out of the body and returned under the original URL along with the content from all the other handlers that

[jira] [Commented] (NUTCH-2129) Track Protocol Status in Crawl Datum

2015-10-06 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945766#comment-14945766 ] Michael Joyce commented on NUTCH-2129: -- Hey folks, updated PR with the metadata approach for HTTP and

[jira] [Commented] (NUTCH-2124) redirect following same link again and again , max redirect exceed and went db_gone

2015-10-06 Thread Yogendra Kumar Soni (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944878#comment-14944878 ] Yogendra Kumar Soni commented on NUTCH-2124: Thanks, the patch is working fine. We can mark it as

[jira] [Comment Edited] (NUTCH-2124) redirect following same link again and again , max redirect exceed and went db_gone

2015-10-06 Thread Yogendra Kumar Soni (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944878#comment-14944878 ] Yogendra Kumar Soni edited comment on NUTCH-2124 at 10/6/15 10:52 AM: --