Re: [TriLUG] Website Directory Listing via HTTP?

Timothy A. Chagnon Thu, 25 Aug 2005 12:43:15 -0700

On Thu, 2005-08-25 at 15:13 -0400, Matt Frye wrote:
> Ok, now how about the perl to extract the dir listing?


Apache's auto-index pages list links one per line.  So something
like this might do:

$ wget http://foo.bar/index.html
$ grep href index.html |perl -p -e 's/^.*href=\"//; s/\".*$//;'

Grep eliminates lines that do not have links.  The first s/// deletes
everything from the beginning of the line through href=".  The second
s/// deletes everything from the closing quote onward.  Thus you get a
list of URLs.  !_This doesn't account for multiple links on a line_!
The first s/// is greedy, so only the last link on such a line survives.

This could be fed back into wget with --input-file if so desired.  Of
course that option will take a raw html file as well if you add
--force-html, eliminating the need for the perl.
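
Links in an auto-index page are usually relative, so the extracted list
may need the directory URL put in front of each entry before wget can
use it.  A rough sketch, assuming every link is relative; the file
names, the make_absolute helper and the base URL are placeholders:

```shell
# Placeholder data: relative links as extracted from an index page.
printf 'a.txt\nb.txt\n' > links.txt

# Hypothetical helper: prefix each line of file $2 with base URL $1.
# Assumes every link is relative; absolute links would be mangled.
make_absolute() {
    awk -v base="$1" '{ print base $0 }' "$2"
}

make_absolute 'http://foo.bar/' links.txt > urls.txt

# The result could then be handed to wget (left commented out here
# because it needs network access):
# wget --input-file=urls.txt
```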

Tim

-- 
TriLUG mailing list        : http://www.trilug.org/mailman/listinfo/trilug
TriLUG Organizational FAQ  : http://trilug.org/faq/
TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
TriLUG PGP Keyring         : http://trilug.org/~chrish/trilug.asc
