die zombie downloads die die die!

2005-09-19 Thread nathaniel t
I'm trying to port a simple automatic downloading script from
Windows to Linux.  I run wget from inside a gnome-terminal or an
xterm.  If somebody closes the window, wget keeps going,
invisibly, and redirects its output to a log file.  This seems
specific to wget, not a general characteristic of the terminals,
and in this case it is not desirable behavior.  It creates
elaborate extra steps in trying to cancel an executing script,
and for users who don't understand this counterintuitive
phenomenon it can lead to situations where multiple instances of
wget, only one of which is visible, are trying to download the
same file at the same time.

This may be such a well-established feature that nobody cares but
me, but I figured I should say something while I'm thinking about
it.  A command-line option to disable this behavior might be
nice.



Re: die zombie downloads die die die!

2005-09-19 Thread Hrvoje Niksic
I consider it a feature.  Wget was designed specifically for
downloading in the background; it catches the hangup signal so it
knows not to write to the (now defunct) terminal.  The idea is for
downloads in SSH/telnet/modem sessions to keep running if the user is
accidentally disconnected.
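
A minimal sketch of the behavior described above, assuming a POSIX
system: a handler records the hangup, and the program later reopens
stderr on a log file instead of writing to the dead terminal.  The
function names and the log path argument are illustrative assumptions,
not Wget's actual code.

```c
#include <signal.h>
#include <stdio.h>

/* Set by the handler; sig_atomic_t is the type that is safe to write
   from inside a signal handler. */
static volatile sig_atomic_t hangup_received = 0;

static void on_hangup(int sig)
{
    (void)sig;
    hangup_received = 1;
}

/* Install the SIGHUP handler.  signal() keeps the sketch short; real
   code would probably prefer sigaction().  Returns 0 on success. */
int install_hangup_handler(void)
{
    return signal(SIGHUP, on_hangup) == SIG_ERR ? -1 : 0;
}

/* Call between transfers.  After a hangup, send further messages to
   log_path (hypothetical argument) rather than the terminal.
   Returns 1 if output was redirected, 0 otherwise. */
int redirect_if_hung_up(const char *log_path)
{
    if (!hangup_received)
        return 0;
    hangup_received = 0;
    return freopen(log_path, "a", stderr) != NULL;
}
```

Doing the redirection outside the handler keeps the handler itself
down to a single flag assignment.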

If I understand you correctly, you're closing the terminal by pressing
the "x" button in the top-right corner while Wget is running.  If so,
why?  Isn't it better to interrupt Wget by pressing ^C?  IIRC many
programs get confused if you cut the terminal from under them.


Re: wget itself discards # and the rest in urls

2005-09-19 Thread Martin Koniczek

"Hrvoje Niksic" <[EMAIL PROTECTED]> wrote:


>> my wget (GNU Wget 1.10) on a crux-based system simply truncates the
>> # and everything after [...]
>
> The part after the "#" in HTTP URLs is what some call a "fragment
> identifier".  [...]

oh dear - sorry for asking that stupid question - too much "low level" tech
stuff made me miss the obvious...

>> in contrast to the faq (http://www.gnu.org/software/wget/faq.html):
>
> [...]
>
> The FAQ is very imprecise here with its use of the term "funny
> characters". [...]

this faq additionally misled me - perhaps just kill the # from the "funny
characters" listing?
or add a small note on how wget builds its http request from given URLs
in the manpage?


sincerely,
   martin koniczek



Re: wget itself discards # and the rest in urls

2005-09-19 Thread Hrvoje Niksic
"Martin Koniczek" <[EMAIL PROTECTED]> writes:

>>> in contrast to the faq (http://www.gnu.org/software/wget/faq.html):
>>>
>> [...]
>>
>> The FAQ is very imprecise here with its use of the term "funny
>> characters". [...]
>
> this faq additionally misled me - perhaps just kill the # from the
> "funny characters" listing?

I agree that the phrasing of that FAQ entry is misleading.  But IMHO
it *should* address the "#" character as well.  For example, one might
wonder how to retrieve a file named "foo#bar" from a remote server,
and replacing "#" with "%23" is the simplest solution I'm aware of.
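
As a rough illustration of that replacement, the sketch below copies a
file name and writes "%23" wherever it finds "#".  The function name
escape_hash is made up for this example; it handles only the "#"
character and is not a general URL encoder.

```c
#include <stddef.h>
#include <string.h>

/* Copy `name` into `out`, writing "%23" for every '#'.
   Returns 0 on success, -1 if `out` is too small. */
int escape_hash(const char *name, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;
    for (; *name != '\0'; name++) {
        size_t need = (*name == '#') ? 3 : 1;

        /* Keep one byte in reserve for the terminating NUL. */
        if (used + need + 1 > out_size)
            return -1;
        if (*name == '#')
            memcpy(out + used, "%23", 3);
        else
            out[used] = *name;
        used += need;
    }
    out[used] = '\0';
    return 0;
}
```

With this, "foo#bar" becomes "foo%23bar", which the client sends as
part of the path instead of treating the rest as a fragment.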

> or adding a small note on how wget builds its http request from
> given URLs in the manpage?

Is that really necessary?  Interpreting "#" (and other characters
special to URLs) is a normal part of using URLs, not at all specific
to Wget; after all, all browsers do it.  For what it's worth, Wget's
documentation (but not the man page) refers the user to "RFC1738",
which does mention fragment identifiers, but does not explain the
consequences of their use in web client software.


Re: wget itself discards # and the rest in urls

2005-09-19 Thread Mauro Tortonesi
At 13:59 on Sunday 18 September 2005, Hrvoje Niksic wrote:

> > in contrast to the faq (http://www.gnu.org/software/wget/faq.html):
> >
> > 3.3 How do I download a URL with funny characters in it?
>
> [...]
>
> The FAQ is very imprecise here with its use of the term "funny
> characters".  There are characters that are specially processed by the
> shell, and then there are characters with special meanings in URLs.
> The former can be protected by shell quoting and the latter by URL
> quoting.

yes, it definitely is.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi                          http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: wget 1.10.1 released

2005-09-19 Thread Mauro Tortonesi
At 00:50 on Saturday 10 September 2005, Steven M. Schweda wrote:
> A final kit for Wget 1.10.1a for VMS is available in the usual places:
>
>   http://antinode.org/dec/sw/wget.html
>
>   http://antinode.org/ftp/wget/wget-1_10_1a_vms/
>   ftp://antinode.org/wget/wget-1_10_1a_vms/
>
> I'm still waiting for anyone (else) to show any interest in getting
> the VMS-specific changes and other fixes and changes into the main Wget
> code stream.  I'll just hold my breath, then, shall I?

the wget code is going through a major refactoring effort. later on, just 
before releasing wget 2.0, i promise i will re-evaluate your patches and 
merge them if they're not too intrusive.




Re: with recursive wget status code does not reflect success/failure of operation

2005-09-19 Thread Mauro Tortonesi
At 09:06 on Saturday 17 September 2005, Steven M. Schweda wrote:
>I suppose that it's a waste of time and space to point this out here,
> but native VMS status codes include a severity field (the low three
> bits), with popular values being (from STSDEF.H):
>
> #define STS$K_WARNING 0 /* WARNING */
> #define STS$K_SUCCESS 1 /* SUCCESSFUL COMPLETION */
> #define STS$K_ERROR 2   /* ERROR */
> #define STS$K_INFO 3    /* INFORMATION */
> #define STS$K_SEVERE 4  /* SEVERE ERROR */
>
> Note that success (including informational) is odd, while anything worse
> is even.
>
>While an OS like UNIX is handicapped by not having such a useful
> convention, it would make sense for Wget to have status values which
> distinguish among degrees of success and failure, like the ones
> suggested by Hrvoje Niksic.

yes, i think it's a good idea.
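
The severity convention quoted above can be sketched in portable C as
follows.  The SEV_* names and the two helper functions are assumptions
made for this example (the "$" in the original STSDEF.H names is not
portable C); only the numeric values and the "low three bits" rule
come from the quoted text.

```c
/* Severity values as quoted from STSDEF.H, under hypothetical names. */
enum {
    SEV_WARNING = 0,
    SEV_SUCCESS = 1,
    SEV_ERROR   = 2,
    SEV_INFO    = 3,
    SEV_SEVERE  = 4
};

/* The severity field is the low three bits of the status value. */
unsigned status_severity(unsigned status)
{
    return status & 0x7u;
}

/* Success and informational severities are odd, so testing the
   lowest bit is enough to tell success from failure. */
int status_is_success(unsigned status)
{
    return (status & 1u) != 0;
}
```

A caller that only cares about pass or fail checks one bit, while a
caller that wants the degree of failure reads the whole field.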

>Ideally, the values used could be defined in some central location,
> allowing convenient replacement with suitable VMS-specific values when
> the time comes.  (Naturally, _all_ exit() calls and/or return statements
> should use one of the pre-defined values.)

mmh, i don't understand why we should use VMS-specific values in wget.

>And, as long as I'm wasting time and space, I'll note that I'd still
> like to see my VMS-related (and other) changes integrated into the main
> Wget code stream.

the wget code is going through a major refactoring effort. later on, just 
before releasing wget 2.0, i promise i will re-evaluate your patches and 
merge them if they're not too intrusive.



Re: wget 1.10.1 released

2005-09-19 Thread Steven M. Schweda
From: Mauro Tortonesi <[EMAIL PROTECTED]>

> the wget code is going through a major refactoring effort. later on, just 
> before releasing wget 2.0, i promise i will re-evaluate your patches and 
> merge them if they're not too intrusive.

   That would be nice.  I believe that the only _very_ intrusive change
is in the handling of a VMS FTP server, which needed a lot of work.

   I would think that getting the changes in sooner would be easier,
however.

   Interestingly, I've found yet another VMS FTP server variety which
seems to require a change to the CWD code so that instead of doing "CWD
abc/def/ghi" (as now), it would need to do "CWD abc", "CWD def", and
"CWD ghi", as the RFC suggests.

   The problem on this particular server is that it identifies itself as
a VMS server and acts accordingly until the user supplies a UNIX-like
directory spec (like "abc/def/ghi"), at which point it switches into a
UNIX-emulation mode.  Because Wget doesn't keep checking the system
type (why would it change?), multi-file FTP downloads from such a server
may be expected to fail.  I'm still thinking about fixing this one.
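
A sketch of the per-component approach, under the assumption that the
caller sends the resulting text over the FTP control connection.  The
function name build_cwd_commands is hypothetical and this is not the
Wget patch itself; empty components (doubled or leading slashes) are
simply skipped.

```c
#include <stdio.h>
#include <string.h>

/* Write one "CWD <component>" command per path component into `out`,
   each terminated by CRLF as FTP commands are.
   Returns the number of commands, or -1 if `out` is too small. */
int build_cwd_commands(const char *path, char *out, size_t out_size)
{
    size_t used = 0;
    int count = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';
    while (*path != '\0') {
        size_t len = strcspn(path, "/");   /* length of this component */

        if (len > 0) {
            int n = snprintf(out + used, out_size - used,
                             "CWD %.*s\r\n", (int)len, path);
            if (n < 0 || (size_t)n >= out_size - used)
                return -1;
            used += (size_t)n;
            count++;
        }
        path += len;
        if (*path == '/')
            path++;
    }
    return count;
}
```

For "abc/def/ghi" this yields three commands, "CWD abc", "CWD def" and
"CWD ghi", so the server never sees a UNIX-style directory spec.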



   Steven M. Schweda               (+1) 651-699-9818
   382 South Warwick Street        [EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: with recursive wget status code does not reflect success/failure of operation

2005-09-19 Thread Hrvoje Niksic
Mauro Tortonesi <[EMAIL PROTECTED]> writes:

> mmh, i don't understand why we should use VMS-specific values in
> wget.

The closest Unix has to offer are these BSD-specific values which few
programs use:

/*
 *  SYSEXITS.H -- Exit status codes for system programs.
 *
 *  This include file attempts to categorize possible error
 *  exit statuses for system programs, notably delivermail
 *  and the Berkeley network.
 *
 *  Error numbers begin at EX__BASE to reduce the possibility of
 *  clashing with other exit statuses that random programs may
 *  already return.  The meaning of the codes is approximately
 *  as follows:
 *
 *  EX_USAGE -- The command was used incorrectly, e.g., with
 *      the wrong number of arguments, a bad flag, a bad
 *      syntax in a parameter, or whatever.
 *  EX_DATAERR -- The input data was incorrect in some way.
 *      This should only be used for user's data & not
 *      system files.
 *  EX_NOINPUT -- An input file (not a system file) did not
 *      exist or was not readable.  This could also include
 *      errors like "No message" to a mailer (if it cared
 *      to catch it).
 *  EX_NOUSER -- The user specified did not exist.  This might
 *      be used for mail addresses or remote logins.
 *  EX_NOHOST -- The host specified did not exist.  This is used
 *      in mail addresses or network requests.
 *  EX_UNAVAILABLE -- A service is unavailable.  This can occur
 *      if a support program or file does not exist.  This
 *      can also be used as a catchall message when something
 *      you wanted to do doesn't work, but you don't know
 *      why.
 *  EX_SOFTWARE -- An internal software error has been detected.
 *      This should be limited to non-operating system related
 *      errors as possible.
 *  EX_OSERR -- An operating system error has been detected.
 *      This is intended to be used for such things as "cannot
 *      fork", "cannot create pipe", or the like.  It includes
 *      things like getuid returning a user that does not
 *      exist in the passwd file.
 *  EX_OSFILE -- Some system file (e.g., /etc/passwd, /etc/utmp,
 *      etc.) does not exist, cannot be opened, or has some
 *      sort of error (e.g., syntax error).
 *  EX_CANTCREAT -- A (user specified) output file cannot be
 *      created.
 *  EX_IOERR -- An error occurred while doing I/O on some file.
 *  EX_TEMPFAIL -- temporary failure, indicating something that
 *      is not really an error.  In sendmail, this means
 *      that a mailer (e.g.) could not create a connection,
 *      and the request should be reattempted later.
 *  EX_PROTOCOL -- the remote system returned something that
 *      was "not possible" during a protocol exchange.
 *  EX_NOPERM -- You did not have sufficient permission to
 *      perform the operation.  This is not intended for
 *      file system problems, which should use NOINPUT or
 *      CANTCREAT, but rather for higher level permissions.
 */
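
To show how a downloader might use these values, here is a small
sketch that maps a few failure categories onto <sysexits.h> constants.
The fetch_failure enum and the mapping are invented for this example;
nothing here reflects what Wget does, and the header is only available
on systems that ship it.

```c
#include <sysexits.h>

/* Hypothetical failure categories, not taken from Wget's source. */
enum fetch_failure {
    FETCH_OK,
    FETCH_BAD_USAGE,
    FETCH_NO_HOST,
    FETCH_CANT_WRITE,
    FETCH_IO,
    FETCH_PROTOCOL,
    FETCH_TEMPORARY
};

/* Translate a failure category into a BSD sysexits status. */
int sysexits_status(enum fetch_failure failure)
{
    switch (failure) {
    case FETCH_OK:         return EX_OK;
    case FETCH_BAD_USAGE:  return EX_USAGE;
    case FETCH_NO_HOST:    return EX_NOHOST;
    case FETCH_CANT_WRITE: return EX_CANTCREAT;
    case FETCH_IO:         return EX_IOERR;
    case FETCH_PROTOCOL:   return EX_PROTOCOL;
    case FETCH_TEMPORARY:  return EX_TEMPFAIL;
    }
    /* Unknown category: report it as an internal error. */
    return EX_SOFTWARE;
}
```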


Re: with recursive wget status code does not reflect success/failure of operation

2005-09-19 Thread Steven M. Schweda
From: Mauro Tortonesi <[EMAIL PROTECTED]>

> >Ideally, the values used could be defined in some central location,
> > allowing convenient replacement with suitable VMS-specific values when
> > the time comes.  (Naturally, _all_ exit() calls and/or return statements
> > should use one of the pre-defined values.)
> 
> mmh, i don't understand why we should use VMS-specific values in wget.

   On VMS (not elsewhere), Wget should use VMS-specific values.  The VMS
C RTL is willing to convert 0 into a generic success code, but 1 (EPERM,
"Not owner") and 2 (ENOENT, "No such file or directory") would tend to
confuse the users (and the rest of the OS).

   Having the exit codes defined in a central location would make it
easy to adapt them as needed.  Having to search the code for every
instance of "return 1" or "exit(2)" would make it too complicated.
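
A sketch of what such a central location could look like.  All names
and values here are hypothetical (no list of codes had been agreed
on), and the VMS branch is only a placeholder: it assumes the compiler
predefines __VMS and does not attempt to build a real VMS status.

```c
#include <stdlib.h>

/* Hypothetical internal exit codes, defined in exactly one place. */
enum wget_exit_code {
    WGET_EXIT_SUCCESS        = 0,
    WGET_EXIT_GENERIC_ERROR  = 1,
    WGET_EXIT_QUOTA_EXCEEDED = 2
};

/* The only function that turns an internal code into an OS status.
   A port to another system would change this body and nothing else. */
int wget_os_status(enum wget_exit_code code)
{
#ifdef __VMS
    /* Placeholder only: a real port would build a VMS status here. */
    return code == WGET_EXIT_SUCCESS ? EXIT_SUCCESS : EXIT_FAILURE;
#else
    return (int)code;
#endif
}

/* The only caller of exit() in the program. */
void wget_exit(enum wget_exit_code code)
{
    exit(wget_os_status(code));
}
```

The rest of the code would then write wget_exit(WGET_EXIT_QUOTA_EXCEEDED)
rather than exit(2), so no search for numeric literals is ever needed.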





Re: with recursive wget status code does not reflect success/failure of operation

2005-09-19 Thread Mauro Tortonesi
At 18:06 on Monday 19 September 2005, Hrvoje Niksic wrote:
> Mauro Tortonesi <[EMAIL PROTECTED]> writes:
> > mmh, i don't understand why we should use VMS-specific values in
> > wget.
>
> The closest Unix has to offer are these BSD-specific values which few
> programs use:
>
> [...]

yes, but i was thinking to define wget specific error codes. are there any 
major objections to this policy?



Re: with recursive wget status code does not reflect success/failure of operation

2005-09-19 Thread Hrvoje Niksic
Mauro Tortonesi <[EMAIL PROTECTED]> writes:

> yes, but i was thinking to define wget specific error codes.

I wouldn't object to those.  The scripting people might find them
useful.


Bug rpt

2005-09-19 Thread HonzaCh
The latest version (1.10.1) shows a UI bug: the thousands separator
(a space according to my locale settings) is displayed as "á"
(character code 0xA0, see attachment).

Although it does not affect the primary function of WGET, it looks quite
ugly.

Env.: Win2k Pro/Czech (CP852 for console apps, CP1250 for windowed
ones).

Sincerely,
Jan Chochola ([EMAIL PROTECTED])


ScreenShot16.gif
Description: GIF image


Re: Bug rpt

2005-09-19 Thread Hrvoje Niksic
"HonzaCh" <[EMAIL PROTECTED]> writes:

> Latest version (1.10.1) turns out an UI bug: the thousand separator
> (space according to my local settings) displays as "á" (character
> code 0xA0, see attch.)
>
> Although it does not affect the primary function of WGET, it looks
> quite ugly.
>
> Env.: Win2k Pro/Czech (CP852 for console apps, CP1250 for windowed
> ones).

Thanks for the report.  Is this a natively compiled Wget or one
compiled on Cygwin?

Wget obtains the thousand separator from the operating system using
the `localeconv' function.  According to MSDN
(http://tinyurl.com/cumk2 and http://tinyurl.com/chubg), Wget's usage
appears to be correct.  I'd be surprised if that function didn't
function properly on Windows.
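
For readers unfamiliar with `localeconv', the sketch below shows the
general technique: read the thousands_sep string from the locale and
insert it between groups of three digits.  The function names are
invented for this example and the code is not taken from Wget.  Note
that byte 0xA0 is a no-break space in CP1250 but "á" in CP852, which
may be related to the report, though that is only a guess.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Write `n` into `buf` with `sep` between groups of three digits.
   Returns 0 on success, -1 if `buf` is too small. */
int format_grouped(unsigned long n, const char *sep, char *buf, size_t size)
{
    char digits[32];
    int len = snprintf(digits, sizeof digits, "%lu", n);
    size_t sep_len = strlen(sep);
    size_t used = 0;
    int i;

    for (i = 0; i < len; i++) {
        /* A separator precedes every digit whose distance from the
           end is a nonzero multiple of three. */
        if (i > 0 && (len - i) % 3 == 0) {
            if (used + sep_len >= size)
                return -1;
            memcpy(buf + used, sep, sep_len);
            used += sep_len;
        }
        if (used + 1 >= size)
            return -1;
        buf[used++] = digits[i];
    }
    buf[used] = '\0';
    return 0;
}

/* Same, using the separator reported for the current locale. */
int format_grouped_locale(unsigned long n, char *buf, size_t size)
{
    return format_grouped(n, localeconv()->thousands_sep, buf, size);
}
```

The separator comes back as bytes in the locale's encoding, so whether
it displays correctly depends on the code page of the output device.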

Can other Windows testers repeat this problem?


RE: with recursive wget status code does not reflect success/failure of operation

2005-09-19 Thread Tony Lewis
Steven M. Schweda wrote:

> Having the exit codes defined in a central location would make it easy
> to adapt them as needed.  Having to search the code for every instance
> of "return 1" or "exit(2)" would make it too complicated.

It seems to me that the easiest way to deal with exit codes is to have a
single function to set the exit code. For example:

  setexitcode(WGET_EXIT_SUCCESS);
or
  setexitcode(WGET_EXIT_QUOTA_EXCEEDED);

This function should be called any time there is an event that might
influence the exit code and the function can then decide what exit code
should be used based on all calls made prior to the end of program
execution. Not only will such an approach restrict the logic for setting the
error code to one place in the code, it will make OS-specific versions of
the error code (such as what Steven desires for VMS) much easier to
implement.

The biggest challenge will be determining the list of WGET_EXIT_* constants
and the interactions between them that influence the final value of the exit
code.

(Was that worth two cents?)

Tony




Re: with recursive wget status code does not reflect success/failure of operation

2005-09-19 Thread Hrvoje Niksic
"Tony Lewis" <[EMAIL PROTECTED]> writes:

> Steven M. Schweda wrote:
>
>> Having the exit codes defined in a central location would make it easy
>> to adapt them as needed.  Having to search the code for every instance
>> of "return 1" or "exit(2)" would make it too complicated.
>
> It seems to me that the easiest way to deal with exit codes is to have a
> single function to set the exit code. For example:
>
>   setexitcode(WGET_EXIT_SUCCESS);
> or
>   setexitcode(WGET_EXIT_QUOTA_EXCEEDED);
>
> This function should be called any time there is an event that might
> influence the exit code and the function can then decide what exit
> code should be used based on all calls made prior to the end of
> program execution. Not only will such an approach restrict the logic
> for setting the error code to one place in the code, it will make
> OS-specific versions of the error code (such as what Steven desires
> for VMS) much easier to implement.

That's not a bad idea at all.  Very unconventional (at least to my
knowledge), but not bad at all.  It could even be modified to
accommodate people who want different error codes for different
occasions.

What Wget could do is:

* If no errors have been set, return 0.

* If only one error has been set, return an error code indicating
  which error it was (and document error codes in the manual).

* If multiple different errors have been set, choose the last one,
  which can be expected to have aborted the downloads.  After all,
  many errors are recoverable.

* If at least one error has been set and success has also been
  reported, return a generic error meaning "there have been some
  errors".  That's the best we can do, given that the download wasn't
  really aborted by the error.

This should work both for people who download one URL and expect to
get status codes that shed some light on what happened and for people
who start a large download and expect different error codes for "OK",
"some download failed", and "download aborted".

But I wonder if that's overengineering at work.
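
The four rules listed above can be sketched as follows.  The constant
names and values are hypothetical, and unlike the one-argument
setexitcode() in the quoted proposal, this version takes the state as
an explicit parameter so the logic is easy to follow in isolation.

```c
/* Hypothetical exit codes; the thread had not settled on a list. */
enum wget_exit_code {
    WGET_EXIT_SUCCESS         = 0,
    WGET_EXIT_SOME_ERRORS     = 1,  /* generic "there were errors" */
    WGET_EXIT_NETWORK_FAILURE = 2,
    WGET_EXIT_QUOTA_EXCEEDED  = 3
};

/* Everything reported so far during this run. */
struct exit_state {
    int success_seen;
    int error_seen;
    enum wget_exit_code last_error;
};

/* Record one event that may influence the final exit code. */
void setexitcode(struct exit_state *st, enum wget_exit_code code)
{
    if (code == WGET_EXIT_SUCCESS) {
        st->success_seen = 1;
    } else {
        st->error_seen = 1;
        st->last_error = code;   /* the most recent error wins */
    }
}

/* Decide the exit code from everything that was recorded. */
int final_exit_code(const struct exit_state *st)
{
    if (!st->error_seen)
        return WGET_EXIT_SUCCESS;      /* no errors at all */
    if (st->success_seen)
        return WGET_EXIT_SOME_ERRORS;  /* mixed outcome */
    return st->last_error;             /* one error, or the last of several */
}
```

Keeping only the last error is enough to cover both the "one error"
and the "multiple errors" rules, since they differ only in how many
errors were seen before it.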


Re: with recursive wget status code does not reflect success/failure of operation

2005-09-19 Thread Steven M. Schweda
From: Hrvoje Niksic <[EMAIL PROTECTED]>

> "Tony Lewis" <[EMAIL PROTECTED]> writes:

> > It seems to me that the easiest way to deal with exit codes is to have a
> > single function to set the exit code. For example:
> >
> >   setexitcode(WGET_EXIT_SUCCESS);
> > or
> >   setexitcode(WGET_EXIT_QUOTA_EXCEEDED);
> > [...]

> What Wget could do is:
> 
> * If no errors have been set, return 0.
> [...]

   I don't want to seem like a chronic complainer (although that might
be an accurate description), but "return 0" is exactly the wrong thing
to do.  Better would be "return WGET_EXIT_SUCCESS" (or something
similar).

   For example, in the (highly portable) Info-ZIP Zip program, ziperr.h
includes macros like these:

#define ZE_MISS -1  /* used by procname(), zipbare() */
#define ZE_OK   0   /* success */
#define ZE_EOF  2   /* unexpected end of zip file */
#define ZE_FORM 3   /* zip file structure error */
#define ZE_MEM  4   /* out of memory */
[...]

   Please note that "ZE_OK" exists, and it's used instead of a
hard-coded zero.

   Even better, Zip has a macro, EXIT (frequently defined as "exit"),
which is used in all places where an exit status is returned to the OS. 
(On VMS, it's defined as "vms_exit", a VMS-specific function in which
special VMS stuff is done, like combining the raw status value with a
severity code and a facility code, before calling the normal exit()
function.)

   That's just good engineering, not over-engineering.





Re: with recursive wget status code does not reflect success/failure of operation

2005-09-19 Thread Hrvoje Niksic
[EMAIL PROTECTED] (Steven M. Schweda) writes:

> I don't want to seem like a chronic complainer (although that might
> be an accurate description), but "return 0" is exactly the wrong thing
> to do.

Wget is a Unix program.  Unix programs do return 0 on success.

C does provide EXIT_SUCCESS and EXIT_FAILURE, but then you don't have
anything else to return.  Besides, Wget already uses Unix-like
functionality such as BSD networking, so it's not exactly written
using only strictly conforming C.

> Even better, Zip has a macro, EXIT (frequently defined as "exit"),
> which is used in all places where an exit status is returned to the
> OS. (On VMS, it's defined as "vms_exit", a VMS-specific function in
> which special VMS stuff is done, like combining the raw status value
> with a severity code and a facility code, before calling the normal
> exit() function.)
>
>That's just good engineering, not over-engineering.

I agree -- as long as portability to non-Unix platforms like VMS is a
design goal.  During my tenure it wasn't, but Mauro can certainly
change that.

Anyway, Tony and I were discussing something different and more
complex, and the over-engineering adjective referred to that, not to
what you're proposing.


RE: with recursive wget status code does not reflect success/failure of operation

2005-09-19 Thread Tony Lewis
Hrvoje Niksic wrote:
 
> But I wonder if that's overengineering at work.

I don't think so. The overarching concern is to do what's "expected". As you
noted elsewhere, on a Unix system, that means exit(0) in the case of success
-- preferably with exit(meaningful_value) otherwise. As I recall this chain
started because of the absence of a meaningful value.

I think the use of a setexitcode function could easily satisfy people in the
Unix world and will greatly simplify things for people adapting wget for
other operating systems.

Reflecting on the exchange that you and Steven just had, I think we also
need a wget_exit function that calls exit with an appropriate value. (That
will allow Steven to further adapt for the VMS environment.) In that case,
exit should only be called by wget_exit.

By the way, when do we start on 2.0? I don't know how much time I will be
able to devote to serious coding, but I'd love to participate as fully as I
can in both the architecture and development.

Tony