"Andrew M. Bishop" wrote:
>
> When I code I use the principle of Keeping It Simple. This doesn't
> always mean that the shortest and fastest piece of code is always the
> best. There are times that taking a black box approach will win even
> though the code is longer and slower. If you need to change every
> place where the wildcard pattern is used then perhaps it is not simple
> enough.
Yes, I appreciate your KISS coding style. I'm not quite sure what you mean by a
"black box approach", though.
Nevertheless, I think the wildcard pattern matching function can be improved.
I've also made some optimizations by taking the "SplitHostPort" and
"RejoinHostPort" operations outside of loops. Because I have a rather long
DontGet section, I felt it was worthwhile.
> I have had a look at my implementation and made a few optimisations.
> In particular I have removed the memory allocation which will increase
> the speed.
It sounds like you've made some worthwhile improvements. But I'll bet my pattern
matching function will still be more general, more compact and a bit faster.
> I personally can't think of a good reason or example URL that would
> require more than 2 '*' to define it except for very weird cases (for
> example "*.*.*.foo.com" to exclude all hostnames with 5 parts to them
> but not those with 4 parts which "*.*.foo.com" also matches).
>
> The 2 '*' are limited to each of the host part and path part of the
> URL-SPECIFICATION, you can have a total of more than 2 '*' in the URL
> in total.
I agree that two *s or fewer are probably sufficient for 98% of the cases.
Nevertheless, I have on several occasions in the past tried to use a
URL-specification where the path part contained more than two *s. I was annoyed
to discover the patterns never matched. They do now. The two '*' limit was
simply an unnecessary restriction.
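To illustrate what matching with an arbitrary number of *s involves, here is a minimal sketch. It is not the code from the patch: it works directly on the raw pattern text using a common backtracking technique, and the function name StarMatchSketch and the example paths are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of WWWOFFLE): returns 1 if str matches pat,
   where each '*' in pat matches any run of characters (including none). */
int StarMatchSketch(const char *str, const char *pat)
{
    const char *star = NULL;    /* position of the most recent '*' in pat */
    const char *resume = NULL;  /* where in str that '*' started matching */

    assert(str != NULL && pat != NULL);

    while (*str) {
        if (*pat == '*') {            /* remember the '*', let it match nothing yet */
            star = pat++;
            resume = str;
        } else if (*pat == *str) {    /* literal character matches */
            ++pat;
            ++str;
        } else if (star) {            /* mismatch: let the last '*' absorb one more char */
            pat = star + 1;
            str = ++resume;
        } else {
            return 0;                 /* mismatch and no '*' to fall back on */
        }
    }

    while (*pat == '*') ++pat;        /* trailing '*'s match the empty remainder */
    return *pat == '\0';
}
```

With this sketch a path pattern such as "/*/images/*/thumb*.gif" (three *s) matches "/a/images/b/thumb1.gif" but not "/a/images/thumb1.gif".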
> Does the patch work with version 2.7-beta? There has been a lot of
> change of the configuration file reading, especially the UrlSpec data
> type.
I haven't got around to studying the 2.7-beta code in detail yet. But I see no
reason why the changes I've made to the 2.6d version can't be carried over to
version 2.7.
> I for one am interested to see the algorithm.
I've included a patch file for version 2.6d as an attachment. I've also listed
the code of my version of the WildcardMatch function, which was the starting
point for the changes I made, in plain text at the end of this message.
I'm only beginning to study the 2.7-beta code, but I will start merging the
changes I've made to the 2.6d code into the 2.7-beta code soon. But I'm still a
bit apprehensive about actually running beta code. (Sorry).
--
Paul A. Rombouts <[EMAIL PROTECTED]>
Vincent van Goghlaan 27
5246 GA Rosmalen
Netherlands
#include <string.h>

typedef struct _Pattern
{
 unsigned short int negated;  /*+ Set to true if this is a negated pattern +*/
 unsigned short int nstr;     /*+ Number of strings in pattern separated by
                                  wildcards +*/
 char pattstrs[0];            /*+ A sequence of null terminated strings to be
                                  matched +*/
}
Pattern;

int WildcardMatch(const char *string, const Pattern *pattern)
{
 int i, lenpattstr, npattstrs=pattern->nstr;
 const char *pattstr= pattern->pattstrs, *midstr, *endstr;

 if(npattstrs == 0) return 1; /* empty pattern matches anything */
 if(npattstrs == 1) return(!strcmp(string,pattstr)); /* no wildcard: exact match */

 /* The first string must match the start of the input. */
 lenpattstr = strlen(pattstr);
 if(strncmp(string,pattstr,lenpattstr)) return 0;
 midstr=string+lenpattstr;
 pattstr+=lenpattstr+1;

 /* Each middle string must occur, in order, after the previous match. */
 for(i=0;i<npattstrs-2;++i) {
   const char *q= strstr(midstr,pattstr);
   if(!q) return 0;
   lenpattstr = strlen(pattstr);
   midstr=q+lenpattstr;
   pattstr+=lenpattstr+1;
 }

 /* The last string must match the end of the input without overlapping
    what has already been matched. */
 endstr= strchr(midstr,0)-strlen(pattstr);
 if(midstr>endstr) return 0;
 return(!strcmp(endstr,pattstr));
}
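The listing above takes an already prepared Pattern. As a rough sketch of how such a structure might be filled in from a wildcard string (this is not the code in the patch; the name CompilePatternSketch is hypothetical, negation is not handled, and the result always has nstr >= 1), each '*' can simply be replaced by a null terminator:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct _Pattern
{
    unsigned short int negated;  /* always 0 in this sketch */
    unsigned short int nstr;     /* number of strings separated by wildcards */
    char pattstrs[];             /* C99 flexible array member instead of [0] */
}
Pattern;

/* Hypothetical helper: builds a Pattern from spec, treating every '*' as a
   wildcard. Returns NULL if allocation fails; the caller frees the result. */
Pattern *CompilePatternSketch(const char *spec)
{
    size_t len, i;
    Pattern *p;

    assert(spec != NULL);

    len = strlen(spec);
    p = malloc(sizeof(Pattern) + len + 1);
    if (!p) return NULL;

    p->negated = 0;
    p->nstr = 1;                      /* one string even when there is no '*' */

    for (i = 0; i <= len; ++i) {      /* <= len also copies the final '\0' */
        if (spec[i] == '*') {
            p->pattstrs[i] = '\0';    /* a wildcard ends the current string */
            p->nstr++;
        } else {
            p->pattstrs[i] = spec[i];
        }
    }
    return p;
}
```

For example, "/*/images/*.gif" becomes the three strings "/", "/images/" and ".gif" stored back to back, with nstr set to 3.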
wwwoffle-2.6d-wildcardmatch-patch.gz
Description: GNU Zip compressed data
