Semyon Semyonov created NUTCH-2522:
--------------------------------------

             Summary:  Bidirectional URL exemption filter
                 Key: NUTCH-2522
                 URL: https://issues.apache.org/jira/browse/NUTCH-2522
             Project: Nutch
          Issue Type: Improvement
          Components: plugin
            Reporter: Semyon Semyonov


The current Nutch URL exemption plugin decides based on toUrl only. The new 
plugin uses both fromUrl and toUrl: it applies a regex transformation to each 
and exempts the link when regex(fromUrl) == regex(toUrl).

This approach allows more complex URL exemption checks, such as allowing 
links between hosts that differ only in the www prefix:
http://www.website.com/home -> http://website.com/about
(with/without www).
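
A minimal sketch of the regex(fromUrl) == regex(toUrl) check, written in 
Python for brevity rather than as Nutch plugin code. The pattern and the 
names HOST_PATTERN, normalise and is_exempted are assumptions made for 
illustration; the issue does not specify the actual regex or its 
configuration.

```python
import re

# Hypothetical transformation rule (not taken from the issue): drop the scheme,
# an optional "www." prefix and everything after the host, keeping the bare host.
HOST_PATTERN = re.compile(r"^https?://(?:www\.)?([^/:?#]+).*$", re.IGNORECASE)


def normalise(url: str) -> str:
    """Apply the regex transformation; leave the URL unchanged if it does not match."""
    match = HOST_PATTERN.match(url)
    return match.group(1).lower() if match else url


def is_exempted(from_url: str, to_url: str) -> bool:
    """Exempt the link when both URLs reduce to the same value."""
    return normalise(from_url) == normalise(to_url)


if __name__ == "__main__":
    print(is_exempted("http://www.website.com/home", "http://website.com/about"))
    print(is_exempted("http://www.website.com/home", "http://other.org/about"))
```

With this rule, the www and non-www forms of the same host compare equal, 
while a link to a different host does not. A filter that looks at toUrl 
alone cannot express that relationship.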



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
