Regarding your first question:
A handler represents a single set of interactions with a page from which
content should be extracted. Once the handler returns, the content of the
page is read out of the body and returned under the original URL along with
the content from all the other handlers that
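
To illustrate the flow described above, here is a minimal sketch in plain Java. It does not use the real plugin classes: FakePage, Handler, shouldProcess, process and fetch are all hypothetical stand-ins (the real plugin hands each handler a Selenium WebDriver), and sharing a single page object between handlers is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class HandlerSketch {

    /** Hypothetical stand-in for the browser session; not a Nutch or Selenium class. */
    static class FakePage {
        final String url;
        String body;

        FakePage(String url, String body) {
            this.url = url;
            this.body = body;
        }
    }

    /** Hypothetical handler contract: one set of interactions with a page. */
    interface Handler {
        // Assumption: a handler may opt out of URLs it does not care about.
        default boolean shouldProcess(String url) {
            return true;
        }

        void process(FakePage page);
    }

    /**
     * Runs each applicable handler in turn. After a handler returns, the page
     * body is read and appended, so all content ends up under the original URL.
     */
    static String fetch(FakePage page, List<Handler> handlers) {
        StringBuilder content = new StringBuilder();
        for (Handler handler : handlers) {
            if (!handler.shouldProcess(page.url)) {
                continue;
            }
            handler.process(page);
            content.append(page.body);
        }
        return content.toString();
    }

    public static void main(String[] args) {
        List<Handler> handlers = new ArrayList<>();
        handlers.add(p -> p.body = "<p>after form post</p>");
        handlers.add(p -> p.body = "<p>after clicking next</p>");

        FakePage page = new FakePage("http://example.com/form", "<p>initial</p>");
        System.out.println(page.url + " -> " + fetch(page, handlers));
    }
}
```

Note that in this sketch nothing a handler does ever creates a new URL entry: whatever the page shows after the interactions is simply appended to the content stored for the original URL.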
Hi, all.
I have been experimenting with Selenium and Nutch following the link:
https://github.com/apache/nutch/tree/trunk/src/plugin/protocol-interactiveselenium
I have been able to post a form using my custom handler, but the URL that
the form redirects to after posting doesn't seem to enter the crawld