On Wednesday 29 March 2006 12:05, you wrote:
> we also have to reach consensus on the filtering algorithm. for
> instance, should we simply require that a url passes all the filtering
> rules to allow its download (just like the current -A/R behaviour), or
> should we instead adopt a short circuit algorithm that applies all rules
> in the same order in which they were given in the command line and
> immediately allows the download of an url if it passes the first "allow"
> match? should we also support apache-like deny-from-all and
> allow-from-all policies? and what would be the best syntax to trigger
> the usage of these policies?

I would recommend evaluating the filters in the order given. That puts the 
onus on the user, rather than on you, to optimize the filters. Another option 
would be to evaluate all domain filters first, then path filters, and finally 
file filters.
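
To illustrate the second option, here is a minimal sketch in Python. The 
(sign, field, pattern) tuples and the name order_filters are made up for 
illustration and are not wget code. It relies on a stable sort, so filters of 
the same kind keep the order in which the user gave them.

```python
# Hypothetical filter representation: (sign, field, pattern).
FIELD_ORDER = {"domain": 0, "path": 1, "file": 2}


def order_filters(filters):
    """Group filters by field: domain first, then path, then file.

    sorted() is stable, so command-line order is kept within each group.
    """
    return sorted(filters, key=lambda f: FIELD_ORDER[f[1]])


# Filters in the order the user typed them (example values only).
given = [
    ("-", "file", r"\.gif$"),
    ("+", "domain", r"example\.com$"),
    ("-", "path", r"^tmp/"),
    ("+", "file", r"\.html$"),
]
```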

Regardless of how you ultimately decide to order the filters, would it be 
possible to allow for users to specify a short circuit? I'm thinking 
something similar to PF's (http://www.openbsd.org/faq/pf/filter.html#quick) 
quick keyword. Example usage of this would be something like:

Need to mirror a site that uses several domains:

--filter=+domain:example\.(net|org|com)

Within that domain there are several paths. For one of those paths, which is 
four levels deep, I know I want everything regardless of its file 
name/type/etc.

--filter=+path,quick:([^/]+/){3}thefiles

The "quick" keyword is used to skip all other filters, because I've told wget 
that I'm sure I want everything in that path if it matches.

Wget would first evaluate the domain filter. If the URL passes, wget evaluates 
the path filter, and if that matches too, it skips all other filters. Should 
the path filter not match, wget continues to evaluate the rest of the filters.
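
A rough sketch of that evaluation order in Python, to make the idea concrete. 
Everything here is an assumption for illustration: the Rule class, the decide 
function, and the semantics for ordinary (non-quick) rules, which I have taken 
to be "every rule must pass", as with the current -A/R behaviour. The patterns 
are adapted from the examples in this message.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Rule:
    allow: bool          # True for "+", False for "-"
    field: str           # "domain" or "path"
    pattern: str         # regular expression searched within the field
    quick: bool = False  # if the rule matches, its verdict is final


def decide(url, rules):
    """Return True if the URL should be downloaded."""
    parts = urlsplit(url)
    fields = {"domain": parts.hostname or "", "path": parts.path.lstrip("/")}
    for rule in rules:
        matched = re.search(rule.pattern, fields[rule.field]) is not None
        if rule.quick:
            if matched:
                return rule.allow  # final verdict, remaining rules skipped
            continue               # a quick rule that misses decides nothing
        if matched != rule.allow:
            return False           # an ordinary rule failed
    return True                    # every ordinary rule passed


RULES = [
    Rule(True, "domain", r"example\.(net|org|com)$"),
    Rule(True, "path", r"^([^/]+/){3}thefiles/", quick=True),
    Rule(False, "path", r"\.gif$"),
]
```

With these rules, a .gif under a/b/c/thefiles/ on example.org is downloaded 
because the quick rule matches before the .gif rule is reached, while a .gif 
elsewhere on the same site is rejected.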

Another example: I know I want nothing from any site other than example.com

--filter=-domain,quick:^(?!example\.com)

That should ignore any domain that doesn't begin with example.com and skip all 
other rules because of the "quick" keyword. This would make processing more 
efficient, since other filters don't have to be evaluated.
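
For what it's worth, here is a small Python check of how such a negative 
lookahead behaves. It assumes wget's filter regexes would support (?!...) the 
way Python's re module does, which is not a given, and the dot is escaped so 
that it only matches a literal ".". The function name rejected is made up for 
the example.

```python
import re

# Matches at position 0 only when the hostname does NOT start with "example.com".
NOT_EXAMPLE = re.compile(r"^(?!example\.com)")


def rejected(hostname):
    """True if the "-domain,quick" rule would match, i.e. the host is skipped."""
    return NOT_EXAMPLE.search(hostname) is not None
```

Note that "begins with" is taken literally: www.example.com would be rejected, 
and a host such as example.com.evil.test would not, so a real rule would 
probably want a tighter pattern.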

Curtis
