oh, i understand your problem.  your request seems reasonable.  i was
trying to see if anyone had an idea why it seemed to be more of a
"feature" than a "bug".

On Thu, 12 Jun 2003, wei ye wrote:

>
> Please take a look at this example:
> $ \rm -rf biz.yahoo.com
> $ ls biz.yahoo.com
> $ wget -r  --domains=biz.yahoo.com -I /r/ 'http://biz.yahoo.com/r/'
> $ ls biz.yahoo.com/
> r/              reports/        research/
> $
>
> I want only '/r/', but it crawls '/r*', which includes /reports/ and /research/.
>
> Is it an expected result or a bug?
>
> Thanks a lot!
>
>
> --- "Aaron S. Hawley" <[EMAIL PROTECTED]> wrote:
> > above the code segment you submitted (line 765 of init.c) is the
> > comment:
> >
> > /* Strip the trailing slashes from directories.  */
> >
> > here are the manual notes on this option:
> >
> > (from "Recursive Accept/Reject Options")
> >
> > `-I list'
> > `--include-directories=list'
> >     Specify a comma-separated list of directories you wish to follow when
> > downloading (See section Directory-Based Limits for more details.)
> > Elements of list may contain wildcards.
> >
> >  --- and ---
> >
> > (from "Directory-Based Limits")
> >
> > `-I list'
> > `--include list'
> > `include_directories = list'
> >     `-I' option accepts a comma-separated list of directories included in
> > the retrieval. Any other directories will simply be ignored. The
> > directories are absolute paths. So, if you wish to download from
> > `http://host/people/bozo/' following only links to bozo's colleagues in
> > the `/people' directory and the bogus scripts in `/cgi-bin', you can
> > specify:
> >
> > wget -I /people,/cgi-bin http://host/people/bozo/
> >
> > ---
> >
> > On Wed, 11 Jun 2003, wei ye wrote:
> >
> > > I'm trying to crawl a URL with the --include-directories='/r/'
> > > parameter.
> > >
> > > I expect to crawl '/r/*', but wget gives me '/r*'.
> > >
> > > From reading the code, it turns out that cmd_directory_vector()
> > > removes the trailing '/' from the include-directories value '/r/'.
> > >
> > > It's a minor bug, but I hope it can be fixed in the next version.
> > >
> > > Thanks!
> > >
> > > static int cmd_directory_vector(...) {
> > >  ...
> > >           if (len > 1)
> > >             {
> > >               if ((*t)[len - 1] == '/')
> > >                 (*t)[len - 1] = '\0';
> > >             }
> > >  ...
> > >
> > > }
> > >
> > > =====
> > > Wei Ye
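
to make the effect concrete, here is a minimal sketch in C. it assumes the
include list is compared against each path by plain string-prefix matching,
which is what the observed behaviour suggests but which i have not confirmed
in the wget source. the names strip_trailing_slash, prefix_match and
dir_prefix_match are hypothetical and are not wget functions.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the quoted snippet: drop one trailing '/' unless the
   string is only one character long (e.g. "/"). */
static void strip_trailing_slash(char *dir)
{
    size_t len = strlen(dir);
    if (len > 1 && dir[len - 1] == '/')
        dir[len - 1] = '\0';
}

/* ASSUMPTION: the include entry is matched as a plain string prefix. */
static int prefix_match(const char *inc, const char *path)
{
    return strncmp(inc, path, strlen(inc)) == 0;
}

/* Hypothetical stricter rule: the prefix must end on a path-component
   boundary, so "/r" matches "/r" and "/r/x" but not "/reports". */
static int dir_prefix_match(const char *inc, const char *path)
{
    size_t n = strlen(inc);
    if (strncmp(inc, path, n) != 0)
        return 0;
    if (n > 0 && inc[n - 1] == '/')
        return 1;               /* entry already ends at a boundary */
    return path[n] == '\0' || path[n] == '/';
}
```

once '/r/' has been stripped to '/r', prefix_match accepts
'/reports/a.html', while dir_prefix_match rejects it and still accepts
'/r/index.html'. a boundary check of this kind would give the '/r/*'
behaviour without having to keep the trailing slash.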

-- 
Fight for Free Digital Speech
www.digitalspeech.org
