On 8/26/05, Shane O'Donnell <[EMAIL PROTECTED]> wrote:
> So I'm trying to come up with text file listings of everything that's
> on the server (which is 180+GB or so) without having to download it
> all. The previous "links -dump" suggestion comes close, but doesn't
> recurse. wget recurses, but either downloads the file or provides
> back a very verbose message that I'm trying not to parse to hack the
> info out of it. curl -l would do it, but only for an ftp server.
Have wget recurse and limit it to fetching only .html files (or, even
better, only index.html files). That will get you the directory
structure, and in each directory a list (in whatever HTML format the
server uses) of the files that live there. You can then, offline, use a
Perl script or similar to walk the downloaded directories and convert
those index.html files to text. (Heck, recursing through and running
lynx -dump on each index.html file would probably do it; a rough sketch
of the commands is below.)

Cheers,
Tanner
--
Tanner Lovelace
clubjuggler at gmail dot com
http://wtl.wayfarer.org/
(fieldless) In fess two roundels in pale, a billet fesswise and an
increscent, all sable.
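For what it's worth, something along these lines should do it. This is
an untested sketch: it assumes GNU wget and lynx are installed, and
http://example.com/stuff/ is just a placeholder for the actual server
URL (wget -r mirrors into a directory named after the host, hence the
"example.com" in the find command).

  # Grab only the directory index pages, not the files themselves.
  # -r  recurse, -np don't ascend to the parent directory,
  # -A  only keep files whose names match index.html*
  wget -r -np -A 'index.html*' http://example.com/stuff/

  # Walk the mirrored tree and dump each index page to plain text.
  find example.com -name 'index.html*' -print | while read f; do
      echo "== $f =="
      lynx -dump "$f"
  done > listing.txt

If the tree is deep you may also want wget's -l option to cap the
recursion depth.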
