Hi,

Jean-Marc MOLINA wrote:
> I have another opinion about that limitation. Could it be considered a
> bug? From the "Types of Files" section of the manual we can read: « Note
> that these two options do not affect the downloading of HTML files; Wget
> must load all the HTMLs to know where to go at all -- recursive retrieval
> would make no sense otherwise. ». It means the accept and reject options
> don't work on HTML files. But I think they should because, especially in
> this case, you deliberately have to exclude them. Excluding them makes
> sense. So I don't really know what to do... Consider the problem as a bug,
> as a new feature to implement, or as an existing feature that should be
> redesigned. It's pretty tricky.

I just set up my compile environment for Wget again.

When I did the regex support, I had the same problem with exclusion, so I
introduced a new parameter, "--follow-excluded-html". It is of course on by
default, but you can turn it off with --no-follow-excluded-html...

See the attached patch for current trunk.

TT

Index: trunk/src/init.c
===================================================================
--- trunk/src/init.c	(revision 2133)
+++ trunk/src/init.c	(working copy)
@@ -146,6 +146,7 @@
 #endif
   { "excludedirectories", &opt.excludes,        cmd_directory_vector },
   { "excludedomains",   &opt.exclude_domains,   cmd_vector },
+  { "followexcluded",   &opt.followexcluded,    cmd_boolean },
   { "followftp",        &opt.follow_ftp,        cmd_boolean },
   { "followtags",       &opt.follow_tags,       cmd_vector },
   { "forcehtml",        &opt.force_html,        cmd_boolean },
@@ -277,6 +278,7 @@

   opt.cookies = true;
   opt.verbose = -1;
+  opt.followexcluded = 1;
   opt.ntry = 20;
   opt.reclevel = 5;
   opt.add_hostdir = true;
Index: trunk/src/main.c
===================================================================
--- trunk/src/main.c	(revision 2133)
+++ trunk/src/main.c	(working copy)
@@ -158,6 +158,7 @@
     { "exclude-directories", 'X', OPT_VALUE, "excludedirectories", -1 },
     { "exclude-domains", 0, OPT_VALUE, "excludedomains", -1 },
     { "execute", 'e', OPT__EXECUTE, NULL, required_argument },
+    { "follow-excluded-html", 0, OPT_BOOLEAN, "followexcluded", -1 },
     { "follow-ftp", 0, OPT_BOOLEAN, "followftp", -1 },
     { "follow-tags", 0, OPT_VALUE, "followtags", -1 },
     { "force-directories", 'x', OPT_BOOLEAN, "dirstruct", -1 },
@@ -611,6 +612,9 @@
   -X,  --exclude-directories=LIST  list of excluded directories.\n"),
     N_("\
   -np, --no-parent                 don't ascend to the parent directory.\n"),
+    N_("\
+       --follow-excluded-html      turns on downloading of excluded files for\n\
+                                   inspection (this is the default).\n"),
     "\n",

     N_("Mail bug reports and suggestions to <[EMAIL PROTECTED]>.\n")
Index: trunk/src/recur.c
===================================================================
--- trunk/src/recur.c	(revision 2133)
+++ trunk/src/recur.c	(working copy)
@@ -511,13 +511,14 @@
       && !(has_html_suffix_p (u->file)
            /* The exception only applies to non-leaf HTMLs (but -p
               always implies non-leaf because we can overstep the
-              maximum depth to get the requisites): */
-           && (/* non-leaf */
+              maximum depth to get the requisites):
+              No exception if the user specified no-follow-excluded */
+           && (opt.followexcluded && (/* non-leaf */
                opt.reclevel == INFINITE_RECURSION
                /* also non-leaf */
                || depth < opt.reclevel - 1
                /* -p, which implies non-leaf (see above) */
-               || opt.page_requisites)))
+               || opt.page_requisites))))
     {
       if (!acceptable (u->file))
         {
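
To make the recur.c change easier to follow, the patched condition can be read as a small predicate. What follows is only a rough sketch and not part of the patch: `struct sketch_options`, `rules_apply` and the local `INFINITE_RECURSION` value of -1 are stand-ins I made up for illustration, and the check for an empty file name that precedes this condition in recur.c is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in value for this sketch only; the real constant lives in Wget. */
#define INFINITE_RECURSION (-1)

/* Hypothetical, reduced copy of the option fields the condition reads. */
struct sketch_options
{
  bool followexcluded;   /* --follow-excluded-html, on by default */
  bool page_requisites;  /* -p */
  int reclevel;          /* maximum recursion depth */
};

/* Return true when the accept/reject rules should be checked for a file,
   false when the file is exempt because it is a non-leaf HTML page that
   must be fetched to discover further links.  */
static bool
rules_apply (const struct sketch_options *o, bool has_html_suffix, int depth)
{
  assert (depth >= 0);

  /* Non-leaf: links found in this page could still be followed.  */
  bool non_leaf = (o->reclevel == INFINITE_RECURSION
                   || depth < o->reclevel - 1
                   || o->page_requisites);

  /* With the patch, the HTML exemption is granted only while
     followexcluded is set; turning it off disables the exemption.  */
  bool html_exempt = has_html_suffix && o->followexcluded && non_leaf;

  return !html_exempt;
}
```

With the default (`followexcluded` set), a non-leaf HTML page skips the accept/reject check, as before. With --no-follow-excluded-html the predicate is always true, so HTML files go through `acceptable ()` like any other file.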