Re: Integrating Selenium with Nutch

2015-10-06 Thread Michael Joyce

Regarding your first question: A handler represents a single set of interactions with a page from which content should be extracted. Once the handler returns, the content of the page is read out of the body and returned under the original URL along with the content from all the other handlers that
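The handler behaviour described in this reply can be modelled with a small, self-contained Java sketch. Every name here (`FakePage`, `PageHandler`, `SubmitFormHandler`, `HandlerSketch`) is hypothetical and chosen for illustration; none of it is the plugin's real interface or Selenium's `WebDriver`. The sketch also assumes that all handlers share one page session and that the body is read after each handler returns, which the truncated snippet does not confirm.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a browser session; this is NOT Selenium's WebDriver. */
final class FakePage {
    private String url;
    private String body;

    FakePage(String url, String body) {
        this.url = url;
        this.body = body;
    }

    String url() { return url; }

    String body() { return body; }

    /** Simulates a navigation, e.g. the redirect that follows a form post. */
    void navigate(String newUrl, String newBody) {
        this.url = newUrl;
        this.body = newBody;
    }
}

/** Hypothetical handler contract: one set of interactions with a page. */
interface PageHandler {
    boolean shouldProcess(String url);

    void interact(FakePage page);
}

/** Example handler: pretends to submit a form, which lands on a different URL. */
final class SubmitFormHandler implements PageHandler {
    public boolean shouldProcess(String url) {
        return url.endsWith("/form");
    }

    public void interact(FakePage page) {
        page.navigate("http://example.com/results", "<p>results</p>");
    }
}

final class HandlerSketch {
    /**
     * Runs each applicable handler, reads the page body once the handler
     * returns, and files the combined content under the ORIGINAL URL.
     */
    static Map<String, String> fetch(String originalUrl, String initialBody,
                                     List<PageHandler> handlers) {
        FakePage page = new FakePage(originalUrl, initialBody); // assumption: shared session
        StringBuilder combined = new StringBuilder();
        for (PageHandler handler : handlers) {
            if (!handler.shouldProcess(originalUrl)) {
                continue; // this handler does not apply to the URL
            }
            handler.interact(page);
            combined.append(page.body()).append('\n'); // body read after the handler returns
        }
        return Collections.singletonMap(originalUrl, combined.toString());
    }

    public static void main(String[] args) {
        List<PageHandler> handlers = Arrays.<PageHandler>asList(new SubmitFormHandler());
        System.out.println(fetch("http://example.com/form", "<form></form>", handlers));
    }
}
```

In this model the page reached after the form post contributes only content. Its URL never appears as a key in the result, which is keyed solely by the URL originally requested. Whether the real plugin behaves the same way should be checked against its source.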

Integrating Selenium with Nutch

2015-10-02 Thread Taichi Ho

Hi, all. I have been experimenting with Selenium and Nutch, following https://github.com/apache/nutch/tree/trunk/src/plugin/protocol-interactiveselenium, and I have been able to post a form using my custom handler. But the URL that the browser is redirected to after posting the form doesn't seem to enter the crawld