Soggard opened a new pull request, #149: URL: https://github.com/apache/manifoldcf/pull/149
The "Force the inclusion of redirection" option allows you to include hosts that the original seeds redirect to. You might want to use this option if the site you are crawling is subject to redirections. Note that it is not needed when the previous option ("Include only hosts") is not checked.

Here are the possible behaviors:

- If the user checks "Include only hosts" but not "Force the inclusion", redirected files are filtered out if their new URL doesn't match the seed.
- If the user checks both "Include only hosts" and "Force the inclusion", then a URL the job finds that is not in the same domain is dropped, EXCEPT if the URL originates from a 301 or 302 redirection in the document queue.
- If the user does NOT check "Include only hosts" but checks "Force the inclusion", the job crawls any URL it finds, even if it originates from a 301 or 302 redirection.
- If the user does not check anything, the behavior is the same as in the previous case.

If the admin checks the second option AND the first option is also checked, the job will check any host added to the Set. If a host is subject to redirection, its destination URL is added to the Set.
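
The four combinations described in this pull request can be summarized as a small decision function. The sketch below is illustrative Python, not the code from the pull request: the `HostFilter` class, its parameter names, and the choice to store host names (rather than full URLs) in the Set are all assumptions made for the example.

```python
from urllib.parse import urlsplit

# Redirect statuses named in the description above.
REDIRECT_STATUSES = {301, 302}


def host_of(url):
    """Return the lowercased host name of a URL ('' if there is none)."""
    return (urlsplit(url).hostname or "").lower()


class HostFilter:
    """Hypothetical model of the two checkboxes; not ManifoldCF code."""

    def __init__(self, seeds, include_only_hosts, force_redirect_inclusion):
        self.include_only_hosts = include_only_hosts
        self.force_redirect_inclusion = force_redirect_inclusion
        # The "Set" from the description, modelled here as a set of host names.
        self.allowed_hosts = {host_of(seed) for seed in seeds}

    def accept(self, url, redirect_status=None):
        """Decide whether a discovered URL should be crawled.

        redirect_status is the HTTP status that led to this URL
        (for example 301 or 302), or None for an ordinary link.
        """
        if not self.include_only_hosts:
            # Third and fourth cases: no host filtering at all.
            return True
        host = host_of(url)
        if host in self.allowed_hosts:
            return True
        if self.force_redirect_inclusion and redirect_status in REDIRECT_STATUSES:
            # Second case: remember the redirect destination for later URLs.
            self.allowed_hosts.add(host)
            return True
        # First case, or an off-seed link that was not reached by a redirect.
        return False
```

With both options enabled, `HostFilter(["https://example.com/"], True, True)` would reject an ordinary link to another host but accept a URL reached through a 301 or 302, and would then also accept later URLs on that destination host.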