Michael Joyce created NUTCH-2088:
------------------------------------

             Summary: Add Optional Execution to Interactive Selenium Handlers
                 Key: NUTCH-2088
                 URL: https://issues.apache.org/jira/browse/NUTCH-2088
             Project: Nutch
          Issue Type: Improvement
          Components: plugin
    Affects Versions: 1.10
            Reporter: Michael Joyce
             Fix For: 1.11

At the moment, all handlers run for every URL when using the interactive-selenium plugin. When doing a deep crawl of a site, you will often want to handle various subdomains and paths/files differently. You can effectively filter inside the handlers today, but only after the WebDriver has been loaded and its overhead incurred. It would be much nicer if the handler interface allowed this check to occur before the request to retrieve page content.
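
A minimal sketch of the kind of pre-check being requested, not the plugin's actual code. Every name here (`PageDriver`, `OptionalHandler`, `shouldProcessURL`, `HandlerDispatch`) is hypothetical, and `PageDriver` is only a stand-in for the Selenium WebDriver so that the example compiles on its own. The idea is that each handler answers a cheap URL question first, and the driver is created only if at least one handler opts in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the Selenium WebDriver (assumption, not the real type).
interface PageDriver {
    String currentUrl();
}

// Hypothetical handler contract: a cheap URL check plus the expensive step.
interface OptionalHandler {
    // Called before any driver exists; should only inspect the URL string.
    boolean shouldProcessURL(String url);

    // Called only for handlers that opted in for this URL.
    void processDriver(PageDriver driver);
}

final class HandlerDispatch {
    private HandlerDispatch() {}

    // Runs the handlers that opt in for the URL and returns them.
    // The driver factory is invoked at most once, and not at all
    // when no handler is interested in the URL.
    static List<OptionalHandler> run(String url,
                                     List<OptionalHandler> handlers,
                                     Supplier<PageDriver> driverFactory) {
        List<OptionalHandler> interested = new ArrayList<>();
        for (OptionalHandler handler : handlers) {
            if (handler.shouldProcessURL(url)) {
                interested.add(handler);
            }
        }
        if (interested.isEmpty()) {
            return interested; // skip the driver start-up cost entirely
        }
        PageDriver driver = driverFactory.get();
        for (OptionalHandler handler : interested) {
            handler.processDriver(driver);
        }
        return interested;
    }
}
```

With a contract along these lines, a handler written for one subdomain or path could return false for everything else, so URLs it does not care about would never pay for a driver launch.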