On 8/26/05, Shane O'Donnell <[EMAIL PROTECTED]> wrote:
> So I'm trying to come up with text file listings of everything that's
> on the server (which is 180+GB or so) without having to download it
> all. The previous "links -dump" suggestion comes close, but doesn't
> recurse. wget recurses, but either downloads the file or provides
> back a very verbose message that I'm trying not to parse to hack the
> info out of it. curl -l would do it, but only for an ftp server.
Have wget recurse and limit it to fetching only .html files (or, even
better, only index.html files). That will get you the directory
structure, and in each directory a list (in whatever HTML format the
server uses) of the files that live there. You can then, offline, use a
Perl script or similar to walk the downloaded directories and convert
those index.html files to text. (Heck, recursing through and running
lynx -dump on each index.html file would probably do it; a rough sketch
of the commands is below.)

Cheers,
Tanner
--
Tanner Lovelace
clubjuggler at gmail dot com
http://wtl.wayfarer.org/
(fieldless) In fess two roundels in pale, a billet fesswise and an
increscent, all sable.
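For what it's worth, something along these lines should do it. This is
an untested sketch: it assumes GNU wget and lynx are installed, and
http://example.com/stuff/ is just a placeholder for the actual server
URL (wget -r mirrors into a directory named after the host, hence the
"example.com" in the find command).

  # Grab only the directory index pages, not the files themselves.
  # -r  recurse, -np don't ascend to the parent directory,
  # -A  only keep files whose names match index.html*
  wget -r -np -A 'index.html*' http://example.com/stuff/

  # Walk the mirrored tree and dump each index page to plain text.
  find example.com -name 'index.html*' -print | while read f; do
      echo "== $f =="
      lynx -dump "$f"
  done > listing.txt

If the tree is deep you may also want wget's -l option to cap the
recursion depth.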
