Hi,

I have been using wget for years now on various OSes; it's a wonderful tool!
Many thanks for it!

BTW1: It took me some time to find the get-rid-of-robots feature as
      '-e robots=off'. That is probably intentional. Funny. :)
BTW2: The source is really nice and its commentary is great!

I have some notes and "feature requests". I needed some minor new
functionality quickly, so I downloaded 1.9, changed it and I'm using it
locally for myself. I think someone else may like my changes and ideas;
there are more of them than I have actually implemented in the source so far.

Please tell me which is the better way:

1) Just post my ideas here for discussion and let the core developers code
them the way they know best.

I'm not experienced in writing or changing .texi files (as requested at
http://wget.sunsite.dk/wgetdev.html if you're sending patches). So this way
is convenient for me, as I only write up my ideas and someone else does the
job for me (probably better than I would, but he's not as interested in the
new features as I am...).

2) Or should I study http://wget.sunsite.dk/wgetdev.html properly, write all
the stuff myself, send it to wget-patches and hope I haven't thrown away
hours of my work and the patches will be accepted?

Say I have five ideas to enhance the code, and the approver likes two of
them and rejects the others. Is it better for me, then, to send my ideas in
separate patch files?

---

In short, I like my ideas and would be honoured to see them permanently in
a future wget. :) But I don't want to spend hours coding them my way only to
be rejected. If the maintainer likes such features and no one else
volunteers, I'll gladly do it.

And now for something completely different: for the patient ones, here is my
list. I propose these features:

<<>> --skip-requisites

  Sometimes I wish to ignore all the inline images and sounds and just get
  the linked content.
  Example: a gallery page with many <A..."contentXX.jpg"> and
  <IMG SRC..."thumbXX.jpg"> entries. I want to get the content and ignore
  the thumbnails.
My solution: 

--- wget-1.9/src/recur.c        2003-10-11 14:57:12.000000000 +0100
+++ wget-1.9a/src/recur.c       2003-12-28 21:42:58.000000000 +0100
@@ -485,6 +485,9 @@
       goto out;
     }

+  if (opt.skip_requisites && upos->link_inline_p)
+    goto out;
+
   /* 4. Check for parent directory.
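
The hunk above only covers the recursion check. For completeness, here is a
rough sketch of the remaining plumbing such an option would presumably need;
the field name skip_requisites and the command name "skiprequisites" are my
own choices, and the exact shape of the 1.9 option tables should be checked
against the real sources:

/* options.h -- sketch: new flag in struct options.  */
int skip_requisites;    /* --skip-requisites: don't follow inline links */

/* init.c -- sketch: corresponding entry in the commands[] table,
   following the pattern of the existing boolean options.  */
{ "skiprequisites",     &opt.skip_requisites,   cmd_boolean },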

<<>> --purge-partial-downloads

  I know there is an advanced feature for continuing interrupted downloads,
  but I hate having garbled files on my disk when wget gets terminated.
  Therefore I think such an option would be useful.

  My solution: intercept SIGQUIT, SIGTERM, SIGINT and SIGABRT during
  get_contents() and let that function return a special code -3. In that
  case, if (opt.dfp == NULL), the local file is deleted, the event is
  logged and wget is exit()ed. The patch is not short and I can post it on
  request; a standalone sketch of the idea follows below.
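
  A minimal standalone sketch of the mechanism, not the actual patch: it
  intercepts the signals while a file is being written and unlinks the
  partial file before exiting. The file name, chunk loop and exit code are
  illustrative only.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void
handler (int sig)
{
  got_signal = sig;
}

int
main (void)
{
  const char *partial = "download.part";  /* hypothetical output file */
  struct sigaction sa;
  FILE *fp;
  int i;

  sa.sa_handler = handler;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;
  sigaction (SIGINT, &sa, NULL);
  sigaction (SIGTERM, &sa, NULL);
  sigaction (SIGQUIT, &sa, NULL);
  sigaction (SIGABRT, &sa, NULL);

  fp = fopen (partial, "w");
  if (!fp)
    return 1;

  /* Stand-in for the body of get_contents(): write in chunks and check
     the flag between chunks.  */
  for (i = 0; i < 100; i++)
    {
      if (got_signal)
        {
          fclose (fp);
          unlink (partial);              /* purge the partial download */
          fprintf (stderr, "Interrupted by signal %d, %s removed.\n",
                   (int) got_signal, partial);
          exit (3);                      /* stands in for the -3 return */
        }
      fputs ("data chunk\n", fp);
      sleep (1);
    }

  fclose (fp);
  return 0;
}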

<<>> --host-level=n

  It would be useful for me to have different maximum recursion levels for
  the case when wget is spidering the original host and when it is spanning
  other hosts. I may want files down to level 5 on the original host but
  only down to level 2 on the other hosts. Letting -l5 and -H loose together
  can quickly fill my disk. :)

  Example: galleries again. :) I'm downloading images, each of which may
  have a separate HTML page on the original host (a gallery page with
  thumbnail <A...> links -> many HTML pages, each with one inline big
  image). I want to download the big images even when the original host
  links them from another host. But allowing the same recursion level on
  foreign hosts is dangerous (or expensive). A sketch of the check I have
  in mind follows below.
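
  A standalone sketch of the depth check (the option and field names are
  mine; nothing here is existing wget code): the ordinary -l limit applies
  on the start host, a stricter --host-level limit once we have spanned to
  another host.

#include <stdio.h>
#include <string.h>

struct opts
{
  int reclevel;    /* -l */
  int host_level;  /* --host-level */
};

/* Return nonzero if a link at DEPTH on HOST may still be followed,
   given that the retrieval started on START_HOST.  */
static int
depth_allowed (const struct opts *opt, const char *start_host,
               const char *host, int depth)
{
  int limit = strcmp (host, start_host) == 0
              ? opt->reclevel : opt->host_level;
  return depth <= limit;
}

int
main (void)
{
  struct opts opt = { 5, 2 };  /* -l5 --host-level=2 */

  /* Allowed at depth 4 on the start host... */
  printf ("%d\n", depth_allowed (&opt, "gallery.example", "gallery.example", 4));
  /* ...but not at depth 4 on a spanned host... */
  printf ("%d\n", depth_allowed (&opt, "gallery.example", "images.example", 4));
  /* ...while depth 2 there is still fine.  */
  printf ("%d\n", depth_allowed (&opt, "gallery.example", "images.example", 2));
  return 0;
}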

<<>> Finish the job on SIGUSR2

  "Time to die, wget!" "I'm not quite dead!"
  In short:
     First SIGUSR2 - finish current URL (cmd line and -i) and terminate
    Second SIGUSR2 - finish current file and terminate
  In both cases: log the reason at the end.

  This feature is meant for cases when I don't have enough time to finish
  the entire download job, but there is still some time to cleanly finish a
  part of it. A sketch of the two-stage handler follows below.

  IMHO, the documentation does not mention any signal other than SIGHUP. I
  think a note about the interception of SIGUSR1 and SIGWINCH would be
  useful.
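
  A standalone sketch of the two-stage SIGUSR2 idea (nothing here is wget
  code; the nested loops just stand in for the URL list and the files
  reached from each URL):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t usr2_count = 0;

static void
usr2_handler (int sig)
{
  (void) sig;
  if (usr2_count < 2)
    usr2_count++;
}

int
main (void)
{
  struct sigaction sa;
  int url, file;

  sa.sa_handler = usr2_handler;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;
  sigaction (SIGUSR2, &sa, NULL);

  for (url = 1; url <= 3; url++)         /* stand-in for the -i list */
    {
      for (file = 1; file <= 5; file++)  /* files reached from that URL */
        {
          printf ("downloading URL %d, file %d\n", url, file);
          sleep (1);
          if (usr2_count >= 2)
            {
              printf ("second SIGUSR2: terminating after current file\n");
              return 0;
            }
        }
      if (usr2_count >= 1)
        {
          printf ("SIGUSR2: terminating after current URL\n");
          return 0;
        }
    }
  return 0;
}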

<<>> Job progress info in log

  Sometimes I have a list of URLs to be recursively downloaded and not
  enough time to finish the whole job. It would be great to have information
  about the successfully finished URLs in the log.

  This feature could be turned on by a switch, and the output should be easy
  to parse - so that some simple script can produce the NEW list of
  still-not-successfully-downloaded URLs from the original list and the log.
  The -nv log mode is better for parsing, so I'd like this "DONE URL: xyz"
  message in the -nv log too... A sketch of such a filter follows below.

  Generally, I hope the developers keep in mind that the log output of wget
  is frequently parsed by smart or even dumb scripts. :)
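
  A sketch of the kind of dumb filter I mean, in C for consistency with the
  rest of this mail. It assumes the hypothetical "DONE URL: <url>" log line
  proposed above; the fixed limits and file handling are illustrative only.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_URLS 10000
#define MAX_LINE 4096

static char *done[MAX_URLS];
static int ndone;

static void
chomp (char *s)
{
  size_t n = strlen (s);
  if (n && s[n - 1] == '\n')
    s[n - 1] = '\0';
}

int
main (int argc, char **argv)
{
  char line[MAX_LINE];
  FILE *urls, *log;
  int i;

  if (argc != 3)
    {
      fprintf (stderr, "usage: %s URL-LIST LOG\n", argv[0]);
      return 2;
    }
  urls = fopen (argv[1], "r");
  log = fopen (argv[2], "r");
  if (!urls || !log)
    {
      perror ("fopen");
      return 1;
    }

  /* Collect the URLs the log reports as finished.  */
  while (fgets (line, sizeof line, log) && ndone < MAX_URLS)
    if (strncmp (line, "DONE URL: ", 10) == 0)
      {
        chomp (line);
        done[ndone++] = strdup (line + 10);
      }

  /* Print every URL from the original list that has no DONE line.  */
  while (fgets (line, sizeof line, urls))
    {
      chomp (line);
      if (line[0] == '\0')
        continue;
      for (i = 0; i < ndone; i++)
        if (strcmp (line, done[i]) == 0)
          break;
      if (i == ndone)
        printf ("%s\n", line);
    }

  fclose (urls);
  fclose (log);
  return 0;
}

  It would be run as, for example, "remaining urls.txt wget.log > todo.txt",
  producing the new input list for the next run.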

Thank you for your time,

Vlada Macek
