Re: trailing '/' of include-directories removed bug

wei ye Thu, 12 Jun 2003 22:46:59 -0700

For the case where I need only '/r/', there is no option that lets me do that.

If a user needs '/r*/', they should specify -I '/r*/' instead.


Simple patch attached, please consider it. Thanks!!

[EMAIL PROTECTED] src]$ diff  -u utils.c.orig utils.c
--- utils.c.orig        Fri May 17 20:05:22 2002
+++ utils.c     Thu Jun 12 20:24:21 2003
@@ -696,7 +696,9 @@
     else
       {
        char *p = *x + ((flags & ALLABS) && (**x == '/')); /* Remove '/' */
-       if (frontcmp (p, s))
+       /* if *p="c", pass if s is "c" or "c/..." not "ca...". */
+       int plen = strlen(p);
+       if ((strncmp (p, s, plen) == 0) && (s[plen] == '/' || s[plen] == '\0'))
          break;
       }
   return *x;
[EMAIL PROTECTED] src]$ 
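
To illustrate the difference outside of wget, here is a minimal standalone sketch. The helper names plain_prefix() and dir_prefix() are hypothetical and are not taken from the wget sources: plain_prefix() only stands in for a bare prefix test (the behavior described in this thread), and dir_prefix() applies the same boundary check as the patch above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a bare prefix test: returns 1 when
   p is a prefix of s, regardless of what follows in s. */
int plain_prefix(const char *p, const char *s)
{
  return strncmp(p, s, strlen(p)) == 0;
}

/* Hypothetical boundary-aware test, mirroring the patch: p must be a
   prefix of s AND the match must end at '/' or at the end of s. */
int dir_prefix(const char *p, const char *s)
{
  size_t plen = strlen(p);
  /* s[plen] is only read after strncmp confirmed s has at least plen chars. */
  return strncmp(p, s, plen) == 0 && (s[plen] == '/' || s[plen] == '\0');
}

/* Small self-check covering the directories from the example in this thread. */
void self_check(void)
{
  /* The bare prefix test accepts everything that starts with "r". */
  assert(plain_prefix("r", "r"));
  assert(plain_prefix("r", "reports"));
  assert(plain_prefix("r", "research"));

  /* The boundary-aware test accepts only "r" and paths below it. */
  assert(dir_prefix("r", "r"));
  assert(dir_prefix("r", "r/sub"));
  assert(!dir_prefix("r", "reports"));
  assert(!dir_prefix("r", "research"));
}
```

Calling self_check() from a small main() exercises all of the cases. With the boundary check, "r" no longer matches "reports" or "research", which is the result I expect from -I /r/.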


--- "Aaron S. Hawley" <[EMAIL PROTECTED]> wrote:
> oh, i understand your problem.  your request seems reasonable.  i was
> trying to see if anyone had an idea why it seemed to be more of a
> "feature" than a "bug".
> 
> On Thu, 12 Jun 2003, wei ye wrote:
> 
> >
> > Please take a look at this example:
> > $ \rm -rf biz.yahoo.com
> > $ ls biz.yahoo.com
> > $ wget -r  --domains=biz.yahoo.com -I /r/ 'http://biz.yahoo.com/r/'
> > $ ls biz.yahoo.com/
> > r/              reports/        research/
> > $
> >
> > I want only '/r/', but it crawls /r*, which includes /reports/, /research/.
> >
> > Is it an expected result or a bug?
> >
> > Thanks a lot!
> >
> >
> > --- "Aaron S. Hawley" <[EMAIL PROTECTED]> wrote:
> > > above the code segment you submitted (line 765 of init.c) is the
> > > comment:
> > >
> > > /* Strip the trailing slashes from directories.  */
> > >
> > > here are the manual notes on this option:
> > >
> > > (from "Recursive Accept/Reject Options")
> > >
> > > `-I list'
> > > `--include-directories=list'
> > >     Specify a comma-separated list of directories you wish to follow when
> > > downloading (See section Directory-Based Limits for more details.)
> > > Elements of list may contain wildcards.
> > >
> > >  --- and ---
> > >
> > > (from "Directory-Based Limits")
> > >
> > > `-I list'
> > > `--include list'
> > > `include_directories = list'
> > >     `-I' option accepts a comma-separated list of directories included in
> > > the retrieval. Any other directories will simply be ignored. The
> > > directories are absolute paths. So, if you wish to download from
> > > `http://host/people/bozo/' following only links to bozo's colleagues in
> > > the `/people' directory and the bogus scripts in `/cgi-bin', you can
> > > specify:
> > >
> > > wget -I /people,/cgi-bin http://host/people/bozo/
> > >
> > > ---
> > >
> > > On Wed, 11 Jun 2003, wei ye wrote:
> > >
> > > > I'm trying to crawl a URL with the --include-directories='/r/'
> > > > parameter.
> > > >
> > > > I expect to crawl '/r/*', but wget gives me '/r*'.
> > > >
> > > > From reading the code, it turns out that cmd_directory_vector()
> > > > removes the trailing '/' of the include-directories value '/r/'.
> > > >
> > > > It's a minor bug, but I hope it can be fixed in the next version.
> > > >
> > > > Thanks!
> > > >
> > > > static int cmd_directory_vector(...) {
> > > >  ...
> > > >           if (len > 1)
> > > >             {
> > > >               if ((*t)[len - 1] == '/')
> > > >                 (*t)[len - 1] = '\0';
> > > >             }
> > > >  ...
> > > >
> > > > }
> > > >
> > > > =====
> > > > Wei Ye
> 
> -- 
> Fight for Free Digital Speech
> www.digitalspeech.org


=====
Wei Ye
