On Thu, 2005-08-25 at 15:13 -0400, Matt Frye wrote:
> Ok, now how about the perl to extract the dir listing?

Apache lists the links on an auto-index page one per line, so something like this might do:

    $ wget http://foo.bar/index.html
    $ grep href index.html | perl -p -e 's/^.*href=\"//; s/\".*$//;'

grep eliminates lines that do not have links. The first s/// deletes everything from the beginning of the line through href=". The second s/// deletes everything from the closing quote to the end of the line. Thus you get a list of URLs.

!_This doesn't account for multiple links on a line_! The first s/// is greedy, so only the last link on such a line survives.

The list could be fed back into wget with --input-file if so desired. Of course, combined with --force-html that option will take a raw HTML file as well, eliminating the need for the perl. Auto-index links are typically relative, so --base=http://foo.bar/ may be needed too.

Tim
