Hello,

I have an application where I want to traverse a given site, but only
retrieve pages whose URLs match a particular pattern.  The pattern
would include a specific directory and a file name of a particular
form.  If wget won't accept a general pattern, I'd like it to just
return the URLs it finds during its recursive traversal, without
returning the data.  Given that list of URLs, I can pick out the ones
I'm interested in and fetch only those.  Here's an example: assume
that I'm interested in fetching all FAQ pages that have "linux" in
their file name.  Using conventional grep patterns, I might want URLs
of the form '.*/faqs/.*linux.*\.html'.  Is there a way to do something
like this in wget, or in some other program?
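For the filtering half of that plan, a minimal Python sketch might look like this. It assumes the crawl has already produced a list of URLs. The function name `select_urls` and the example.com URLs are hypothetical, used only for illustration. It uses `re.fullmatch`, so the whole URL must match the pattern. This differs from grep, which by default matches anywhere in a line.

```python
import re

# The pattern from the question: a /faqs/ directory somewhere in the path,
# then "linux" somewhere after it, ending in ".html".
FAQ_LINUX = re.compile(r".*/faqs/.*linux.*\.html")


def select_urls(urls, pattern=FAQ_LINUX):
    """Return only the URLs whose full text matches the pattern (hypothetical helper)."""
    return [u for u in urls if pattern.fullmatch(u)]


if __name__ == "__main__":
    # Hypothetical URLs standing in for a list gathered during a crawl.
    found = [
        "http://example.com/faqs/linux-intro.html",
        "http://example.com/faqs/windows.html",
        "http://example.com/docs/linux.html",
        "http://example.com/faqs/net/linux-howto.html",
    ]
    for url in select_urls(found):
        print(url)
```

Note that `.*` also matches `/`. A URL such as `.../faqs/linux/index.html` would therefore pass even though "linux" is not in the file name itself. Replacing the parts around "linux" with `[^/]*` would restrict the match to the last path component.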

thanks.
