Michael Joyce created NUTCH-2088:
------------------------------------

             Summary: Add Optional Execution to Interactive Selenium Handlers
                 Key: NUTCH-2088
                 URL: https://issues.apache.org/jira/browse/NUTCH-2088
             Project: Nutch
          Issue Type: Improvement
          Components: plugin
    Affects Versions: 1.10
            Reporter: Michael Joyce
             Fix For: 1.11


At the moment, every Handler runs for every URL when using the 
interactive-selenium plugin. When doing a deep crawl of a site, you will 
often want to handle different subdomains and paths/files differently. You 
can filter inside the handlers today, but only after the WebDriver has been 
loaded and its overhead incurred. It would be much better if the handler 
interface allowed this check to occur before the request to retrieve page 
content.
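
As a rough illustration of the idea, here is a minimal sketch of what such a 
pre-check could look like. Everything in it is an assumption made for 
illustration: the Handler interface, the shouldProcessURL and processDriver 
method names, the DocsHandler class and the example.com hosts are 
hypothetical and are not taken from the plugin's code. A plain Object stands 
in for the Selenium WebDriver so that the sketch runs without dependencies.

```java
import java.net.URI;
import java.util.List;
import java.util.function.Supplier;

public class HandlerSketch {

    /** Hypothetical handler contract; not the plugin's actual interface. */
    interface Handler {
        // Cheap check on the URL alone, called before any browser is started.
        boolean shouldProcessURL(String url);

        // Expensive step; `driver` stands in for a Selenium WebDriver.
        void processDriver(Object driver);
    }

    /** Example handler that only cares about one (made-up) subdomain. */
    static class DocsHandler implements Handler {
        int runs = 0;

        public boolean shouldProcessURL(String url) {
            String host = URI.create(url).getHost();
            return host != null && host.equals("docs.example.com");
        }

        public void processDriver(Object driver) {
            runs++;
        }
    }

    /**
     * Runs only the handlers that accept the URL. The driver is created
     * lazily, so a URL that no handler wants never pays the startup cost.
     * Returns true if a driver was created.
     */
    static boolean fetch(String url, List<? extends Handler> handlers,
                         Supplier<Object> driverFactory) {
        Object driver = null;
        for (Handler h : handlers) {
            if (!h.shouldProcessURL(url)) {
                continue; // skipped without touching the driver
            }
            if (driver == null) {
                driver = driverFactory.get();
            }
            h.processDriver(driver);
        }
        return driver != null;
    }

    public static void main(String[] args) {
        DocsHandler handler = new DocsHandler();
        List<Handler> handlers = List.of(handler);
        int[] driversStarted = {0};
        Supplier<Object> factory = () -> {
            driversStarted[0]++;
            return new Object();
        };

        fetch("http://docs.example.com/guide.html", handlers, factory);
        fetch("http://www.example.com/index.html", handlers, factory);

        System.out.println("drivers started: " + driversStarted[0]
                + ", handler runs: " + handler.runs);
    }
}
```

In this sketch the second URL is rejected by the handler, so no driver is 
created for it. Whether the real plugin should create the driver lazily or 
skip the fetch entirely is a design decision left open here.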



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
