Dear Wget maintainers,

I'd like to propose two patches that I've recently implemented on my local copy of Wget. Here is a brief description of the changes and their rationale. I hope they might benefit other people too, so if you agree, please let me know and I'll submit a "lege artis" patch with source diffs and extensive descriptions.
1) Limiting the number of files recursively downloaded from a given URL (in "-r" mode). Currently it is only possible to limit file types and the overall download quota; there is no control over the number of files. The rationale is to allow downloading only a limited number of files that can sufficiently represent the site for various purposes (I myself conduct research in text analysis/information retrieval, so it is useful to have a representative set of a site's pages). To this end I suggest adding a command-line/config-file parameter, which would control the behaviour of the function 'retrieve_tree' by making it return once the desired number of files has been downloaded.

2) Optionally deleting downloaded files that are not of type "text/html". Even though there is a mechanism for limiting downloaded file types, it is hard to configure Wget to download only files of type "text/html". Setting the Accept list to "htm,html" won't help, for exactly the same reason that option "-E" exists: not all "text/html" files have this extension (e.g., "asp" files or files dynamically created by CGI scripts, to name but a few). Again, for various site analyses it is convenient to download only files of type "text/html". Since it is impossible to selectively download only such files, we can download all files and then delete those of other types.

Regards,
Evgeniy.

--
Evgeniy Gabrilovich
Ph.D. student in Computer Science
Department of Computer Science, Technion - Israel Institute of Technology
Technion City, Haifa 32000, Israel
E-mail: [EMAIL PROTECTED]
WWW: http://www.cs.technion.ac.il/~gabr
Phone: (office) +972-4-8294948