Re: Download all the necessary files and linked images

Tobias Tiederle Thu, 09 Mar 2006 07:53:59 -0800

Hi,

Jean-Marc MOLINA wrote:
> I have another opinion about that limitation. Could it be considered a
> bug? From the "Types of Files" section of the manual we can read: « Note
> that these two options do not affect the downloading of html files; Wget
> must load all the htmls to know where to go at all--recursive retrieval would
> make no sense otherwise. » It means the accept and reject options don't
> work on HTML files. But I think they should because, especially in this case,
> you deliberately have to exclude them. Excluding them makes sense. So I
> don't really know what to do... Consider the problem as a bug, as a new
> feature to implement, or as an existing feature that should be redesigned.
> It's pretty tricky.


I just set up my compile environment for Wget again.
When I added regex support, I ran into the same problem with exclusion, so I
introduced a new parameter, "--follow-excluded-html".
It is on by default, of course, but you can turn it off with
--no-follow-excluded-html.
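
To make the effect of the switch easier to see, here is a minimal standalone
sketch of the decision the recur.c hunk changes. It is not the patch itself:
the struct sketch_opts type, the rules_apply function and the is_html argument
are hypothetical names used only for illustration, and the value -1 for
INFINITE_RECURSION is an assumption made for this sketch.

```c
#include <stdbool.h>

/* Assumed stand-in for Wget's "no depth limit" marker. */
#define INFINITE_RECURSION (-1)

/* Hypothetical subset of the options consulted by the check. */
struct sketch_opts
{
  bool follow_excluded;   /* --follow-excluded-html (default: true) */
  int reclevel;           /* maximum recursion depth */
  bool page_requisites;   /* -p */
};

/* Return true when the accept/reject rules should be tested for a file,
   false when the file is exempt and is fetched regardless of the rules. */
static bool
rules_apply (const struct sketch_opts *o, bool is_html, int depth)
{
  /* A page is non-leaf if its links may still be followed. */
  bool non_leaf = o->reclevel == INFINITE_RECURSION
                  || depth < o->reclevel - 1
                  || o->page_requisites;

  /* HTML files are exempt only while the new option is left on. */
  bool html_exempt = is_html && o->follow_excluded && non_leaf;

  return !html_exempt;
}
```

With follow_excluded left at true, a non-leaf HTML page skips the accept/reject
test, as before. With it set to false, the same page is tested like any other
file, so a reject pattern matching HTML files can actually exclude it.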

See attached patch for current trunk.

TT

Index: trunk/src/init.c
===================================================================
--- trunk/src/init.c    (revision 2133)
+++ trunk/src/init.c    (working copy)
@@ -146,6 +146,7 @@
 #endif
   { "excludedirectories", &opt.excludes,       cmd_directory_vector },
   { "excludedomains",  &opt.exclude_domains,   cmd_vector },
+  { "followexcluded", &opt.followexcluded, cmd_boolean },
   { "followftp",       &opt.follow_ftp,        cmd_boolean },
   { "followtags",      &opt.follow_tags,       cmd_vector },
   { "forcehtml",       &opt.force_html,        cmd_boolean },
@@ -277,6 +278,7 @@
 
   opt.cookies = true;
   opt.verbose = -1;
+  opt.followexcluded = true;
   opt.ntry = 20;
   opt.reclevel = 5;
   opt.add_hostdir = true;
Index: trunk/src/main.c
===================================================================
--- trunk/src/main.c    (revision 2133)
+++ trunk/src/main.c    (working copy)
@@ -158,6 +158,7 @@
     { "exclude-directories", 'X', OPT_VALUE, "excludedirectories", -1 },
     { "exclude-domains", 0, OPT_VALUE, "excludedomains", -1 },
     { "execute", 'e', OPT__EXECUTE, NULL, required_argument },
+    { "follow-excluded-html", 0, OPT_BOOLEAN, "followexcluded", -1 },
     { "follow-ftp", 0, OPT_BOOLEAN, "followftp", -1 },
     { "follow-tags", 0, OPT_VALUE, "followtags", -1 },
     { "force-directories", 'x', OPT_BOOLEAN, "dirstruct", -1 },
@@ -611,6 +612,9 @@
   -X,  --exclude-directories=LIST  list of excluded directories.\n"),
     N_("\
   -np, --no-parent                 don't ascend to the parent directory.\n"),
+  N_("\
+       --follow-excluded-html      turns on downloading of excluded files for\n\
+                                   inspection (this is the default).\n"),
     "\n",
 
     N_("Mail bug reports and suggestions to <[EMAIL PROTECTED]>.\n")
Index: trunk/src/recur.c
===================================================================
--- trunk/src/recur.c   (revision 2133)
+++ trunk/src/recur.c   (working copy)
@@ -511,13 +511,14 @@
       && !(has_html_suffix_p (u->file)
           /* The exception only applies to non-leaf HTMLs (but -p
              always implies non-leaf because we can overstep the
-             maximum depth to get the requisites): */
-          && (/* non-leaf */
+             maximum depth to get the requisites):
+             No exception if the user specified --no-follow-excluded-html */
+          && (opt.followexcluded && (/* non-leaf */
               opt.reclevel == INFINITE_RECURSION
               /* also non-leaf */
               || depth < opt.reclevel - 1
               /* -p, which implies non-leaf (see above) */
-              || opt.page_requisites)))
+              || opt.page_requisites))))
     {
       if (!acceptable (u->file))
        {
