I believe the following simplified code would have the same effect:

if ((opt.recursive || opt.page_requisites)
    && (url_scheme (*t) != SCHEME_FTP || opt.use_proxy))
  status = retrieve_tree (*t);
else
  status = retrieve_url (*t, &filename, &redirected_URL, NULL, &dt);

Tony


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of CHEN Peng
Sent: Monday, January 09, 2006 12:38 AM
To: [EMAIL PROTECTED]
Subject: wget 1.10.x fixed recursive ftp download over proxy

Hi,

We ran into an annoying problem when recursively downloading FTP data with wget through an FTP-over-HTTP proxy. At first the cause was the proxy firmware, which did not support recursive downloads, but even after upgrading it we found that wget itself has a problem as well.

We found that with the new proxy firmware, the older wget 1.7.x can download an FTP database recursively, but the newer versions (1.9.x and 1.10.x) cannot. That means something must be wrong in the newer code.

I also confirmed that this has been a known wget bug since 2003; it is strange that it has gone unfixed for so long.

To fix this problem, I took some time to analyze the code. It turns out that wget uses different methods to get the list of files in a destination folder during a recursive download. For normal FTP, it uses the FTP command "LIST" to get the file listing. For normal HTTP, it uses its internal function retrieve_tree() to generate the list.

In main.c, wget does not use the retrieve_tree() function to generate the list if the URL scheme is FTP. However, when we use an FTP-over-HTTP proxy, the actual request is an HTTP request, for which the FTP "LIST" command won't work, so we only get a single "index.html" file.

if ((opt.recursive || opt.page_requisites)
    && url_scheme (*t) != SCHEME_FTP)
  status = retrieve_tree (*t);
else
  status = retrieve_url (*t, &filename, &redirected_URL, NULL, &dt);

In this scenario, we need to modify the code to force wget to call the retrieve_tree() function for FTP traffic when a proxy is involved:

if ((opt.recursive || opt.page_requisites)
    //  && url_scheme (*t) != SCHEME_FTP)
    && ((url_scheme (*t) != SCHEME_FTP) ||
        (opt.use_proxy && url_scheme (*t) == SCHEME_FTP)))
  status = retrieve_tree (*t);
else
  status = retrieve_url (*t, &filename, &redirected_URL, NULL, &dt);
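
The extra "url_scheme (*t) == SCHEME_FTP" test in the second arm is redundant, since that arm is only reached when the scheme is already FTP (A || (B && !A) is the same as A || B). Below is a small standalone sketch that checks this over all input combinations. The enum and the function names patched(), simplified() and count_mismatches() are stand-ins made up for this check, not wget's real definitions.

```c
#include <stdbool.h>

/* Stand-in for wget's scheme constants; only the FTP / non-FTP
   distinction matters for this check. */
enum scheme { SCHEME_HTTP, SCHEME_FTP };

/* The patched condition, written the same way as in the patch above. */
static bool patched(bool recursive, bool requisites, bool use_proxy,
                    enum scheme s)
{
    return (recursive || requisites)
        && ((s != SCHEME_FTP) || (use_proxy && s == SCHEME_FTP));
}

/* Shorter form with the redundant scheme test dropped. */
static bool simplified(bool recursive, bool requisites, bool use_proxy,
                       enum scheme s)
{
    return (recursive || requisites)
        && (s != SCHEME_FTP || use_proxy);
}

/* Walk all 16 combinations of the three flags and the two schemes,
   and count the combinations where the two forms disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int bits = 0; bits < 16; bits++) {
        bool recursive  = (bits & 1) != 0;
        bool requisites = (bits & 2) != 0;
        bool use_proxy  = (bits & 4) != 0;
        enum scheme s   = (bits & 8) ? SCHEME_FTP : SCHEME_HTTP;

        if (patched(recursive, requisites, use_proxy, s)
            != simplified(recursive, requisites, use_proxy, s))
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches() from a small main() should return 0, meaning the two forms pick the same branch for every input.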

After patching main.c, the new wget works perfectly for recursive FTP downloads, both with and without a proxy. The patch applies to 1.9.x and 1.10.x, up to the latest version so far (1.10.2).

--
CHEN Peng <[EMAIL PROTECTED]>
