When I only need '/r/', there is currently no option that lets me do that. Users who want '/r*/' should specify -I '/r*/' instead.
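To make the intended rule concrete, here is a minimal standalone sketch of the matching I have in mind. It is an illustration only, not wget code: the name dir_prefix_match is hypothetical, and treating an empty prefix (the root directory) as matching everything is an assumption of this sketch rather than something the patch handles.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of wget): returns 1 when `dir` is exactly
   `prefix` or lies beneath it, so the match must end on a path boundary.
   Both arguments are paths without a leading '/'. */
static int dir_prefix_match(const char *prefix, const char *dir)
{
    size_t plen;

    assert(prefix != NULL && dir != NULL);
    plen = strlen(prefix);

    /* Assumption of this sketch: an empty prefix stands for the root
       directory and therefore includes every directory. */
    if (plen == 0)
        return 1;

    /* The first plen characters must be identical... */
    if (strncmp(prefix, dir, plen) != 0)
        return 0;

    /* ...and the next character must end the path component, so that
       "r" accepts "r" and "r/sub" but rejects "reports". */
    return dir[plen] == '/' || dir[plen] == '\0';
}
```

With this rule, dir_prefix_match("r", "r") and dir_prefix_match("r", "r/sub") return 1, while "reports" and "research" are rejected.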
Simple patch attached, please consider it. Thanks!!

[EMAIL PROTECTED] src]$ diff -u utils.c.orig utils.c
--- utils.c.orig        Fri May 17 20:05:22 2002
+++ utils.c     Thu Jun 12 20:24:21 2003
@@ -696,7 +696,9 @@
       else
        {
          char *p = *x + ((flags & ALLABS) && (**x == '/')); /* Remove '/' */
-         if (frontcmp (p, s))
+         /* if *p="c", pass if s is "c" or "c/..." not "ca...". */
+         int plen = strlen(p);
+         if ( (strncmp (p, s, plen) == 0) && (s[plen] == '/' || s[plen] == '\0') )
            break;
        }
   return *x;
[EMAIL PROTECTED] src]$

--- "Aaron S. Hawley" <[EMAIL PROTECTED]> wrote:
> oh, i understand your problem.  your request seems reasonable.  i was
> trying to see if anyone had an idea why it seemed to be more of a
> "feature" than a "bug".
>
> On Thu, 12 Jun 2003, wei ye wrote:
>
> >
> > Please take a look this example:
> > $ \rm -rf biz.yahoo.com
> > $ ls biz.yahoo.com
> > $ wget -r --domains=biz.yahoo.com -I /r/ 'http://biz.yahoo.com/r/'
> > $ ls biz.yahoo.com/
> > r/  reports/  research/
> > $
> >
> > I want only '/r/', but it crawls /r*, which includes /reports/, /research/.
> >
> > Is it an expected result or a bug?
> >
> > Thanks alot!
> >
> >
> > --- "Aaron S. Hawley" <[EMAIL PROTECTED]> wrote:
> > > above the code segment you submitted (line 765 of init.c) the
> > > comment:
> > >
> > > /* Strip the trailing slashes from directories.  */
> > >
> > > here are the manual notes on this option:
> > >
> > > (from "Recursive Accept/Reject Options")
> > >
> > > `-I list'
> > > `--include-directories=list'
> > >     Specify a comma-separated list of directories you wish to follow when
> > >     downloading (See section Directory-Based Limits for more details.)
> > >     Elements of list may contain wildcards.
> > >
> > > --- and ---
> > >
> > > (from "Directory-Based Limits")
> > >
> > > `-I list'
> > > `--include list'
> > > `include_directories = list'
> > >     `-I' option accepts a comma-separated list of directories included in
> > >     the retrieval.  Any other directories will simply be ignored.  The
> > >     directories are absolute paths.  So, if you wish to download from
> > >     `http://host/people/bozo/' following only links to bozo's colleagues in
> > >     the `/people' directory and the bogus scripts in `/cgi-bin', you can
> > >     specify:
> > >
> > >     wget -I /people,/cgi-bin http://host/people/bozo/
> > >
> > > ---
> > >
> > > On Wed, 11 Jun 2003, wei ye wrote:
> > >
> > > > I'm trying to crawl url with --include-directories='/r/'
> > > > parameter.
> > > >
> > > > I expect to crawl '/r/*', but wget gives me '/r*'.
> > > >
> > > > By reading the code, it turns out that cmd_directory_vector()
> > > > removed the trailing '/' of include-directories '/r/'.
> > > >
> > > > It's a minor bug, but I hope it could be fix in next version.
> > > >
> > > > Thanks!
> > > >
> > > > static int cmd_directory_vector(...) {
> > > >   ...
> > > >   if (len > 1)
> > > >     {
> > > >       if ((*t)[len - 1] == '/')
> > > >         (*t)[len - 1] = '\0';
> > > >     }
> > > >   ...
> > > >
> > > > }
> > > >
> > > > =====
> > > > Wei Ye
>
> --
> Fight for Free Digital Speech
> www.digitalspeech.org

=====
Wei Ye