DeepVaccum
[Saw this on the web.]

HexCat Software DeepVaccum
http://www.hexcat.com/deepvaccum/

DeepVaccum is a useful donationware web utility based on the GNU wget command-line tool. The program includes a vast number of options for fine-tuning downloads over both the HTTP and FTP protocols. DV enables you to download whole single pages, entire sites, FTP directories, link lists from a text file, and filtered file types (e.g. images).

--
"Too much use of American power overseas causes the nation to look like 'the ugly American'."
  -- Gov. George W. Bush
Re: bug in wget - wget break on time msec=0
"Boehn, Gunnar von" <[EMAIL PROTECTED]> writes:

> I think I found a bug in wget.

You did. But I believe your subject line is slightly incorrect. Wget handles 0-length time intervals (see the assert message), but what it doesn't handle are negative amounts. And indeed:

> gettimeofday({1063461157, 858103}, NULL) = 0
> gettimeofday({1063461157, 858783}, NULL) = 0
> gettimeofday({1063461157, 880833}, NULL) = 0
> gettimeofday({1063461157, 874729}, NULL) = 0

As you can see, the last gettimeofday returned a time *preceding* the one before it. Your ntp daemon must have chosen that precise moment to set back the system clock by ~6 milliseconds, to which Wget reacted badly. Even so, Wget shouldn't crash. The correct fix is to disallow the timer code from ever returning decreasing or negative time intervals.

Please let me know if this patch fixes the problem:

2003-09-14  Hrvoje Niksic  <[EMAIL PROTECTED]>

        * utils.c (wtimer_sys_set): Extracted the code that sets the
        current time here.
        (wtimer_reset): Call it.
        (wtimer_sys_diff): Extracted the code that calculates the
        difference between two system times here.
        (wtimer_elapsed): Call it.
        (wtimer_elapsed): Don't return a value smaller than the previous
        one, which could previously happen when system time is set back.
        Instead, reset start time to current time and note the elapsed
        offset for future calculations.  The returned times are now
        guaranteed to be monotonically nondecreasing.
Index: src/utils.c
===================================================================
RCS file: /pack/anoncvs/wget/src/utils.c,v
retrieving revision 1.51
diff -u -r1.51 utils.c
--- src/utils.c	2002/05/18 02:16:25	1.51
+++ src/utils.c	2003/09/13 23:09:13
@@ -1532,19 +1532,30 @@
 # endif
 #endif /* not WINDOWS */
 
-struct wget_timer {
 #ifdef TIMER_GETTIMEOFDAY
-  long secs;
-  long usecs;
+typedef struct timeval wget_sys_time;
 #endif
 
 #ifdef TIMER_TIME
-  time_t secs;
+typedef time_t wget_sys_time;
 #endif
 
 #ifdef TIMER_WINDOWS
-  ULARGE_INTEGER wintime;
+typedef ULARGE_INTEGER wget_sys_time;
 #endif
+
+struct wget_timer {
+  /* The starting point in time which, subtracted from the current
+     time, yields elapsed time. */
+  wget_sys_time start;
+
+  /* The most recent elapsed time, calculated by wtimer_elapsed().
+     Measured in milliseconds. */
+  long elapsed_last;
+
+  /* Approximately, the time elapsed between the true start of the
+     measurement and the time represented by START. */
+  long elapsed_pre_start;
 };
 
 /* Allocate a timer. It is not legal to do anything with a freshly
@@ -1577,22 +1588,17 @@
   xfree (wt);
 }
 
-/* Reset timer WT. This establishes the starting point from which
-   wtimer_elapsed() will return the number of elapsed
-   milliseconds. It is allowed to reset a previously used timer. */
+/* Store system time to WST. */
 
-void
-wtimer_reset (struct wget_timer *wt)
+static void
+wtimer_sys_set (wget_sys_time *wst)
 {
 #ifdef TIMER_GETTIMEOFDAY
-  struct timeval t;
-  gettimeofday (&t, NULL);
-  wt->secs = t.tv_sec;
-  wt->usecs = t.tv_usec;
+  gettimeofday (wst, NULL);
 #endif
 
 #ifdef TIMER_TIME
-  wt->secs = time (NULL);
+  time (wst);
 #endif
 
 #ifdef TIMER_WINDOWS
@@ -1600,39 +1606,76 @@
   SYSTEMTIME st;
   GetSystemTime (&st);
   SystemTimeToFileTime (&st, &ft);
-  wt->wintime.HighPart = ft.dwHighDateTime;
-  wt->wintime.LowPart = ft.dwLowDateTime;
+  wst->HighPart = ft.dwHighDateTime;
+  wst->LowPart = ft.dwLowDateTime;
 #endif
 }
 
-/* Return the number of milliseconds elapsed since the timer was last
-   reset. It is allowed to call this function more than once to get
-   increasingly higher elapsed values. */
+/* Reset timer WT. This establishes the starting point from which
+   wtimer_elapsed() will return the number of elapsed
+   milliseconds. It is allowed to reset a previously used timer. */
 
-long
-wtimer_elapsed (struct wget_timer *wt)
+void
+wtimer_reset (struct wget_timer *wt)
 {
+  /* Set the start time to the current time. */
+  wtimer_sys_set (&wt->start);
+  wt->elapsed_last = 0;
+  wt->elapsed_pre_start = 0;
+}
+
+static long
+wtimer_sys_diff (wget_sys_time *wst1, wget_sys_time *wst2)
+{
 #ifdef TIMER_GETTIMEOFDAY
-  struct timeval t;
-  gettimeofday (&t, NULL);
-  return (t.tv_sec - wt->secs) * 1000 + (t.tv_usec - wt->usecs) / 1000;
+  return ((wst1->tv_sec - wst2->tv_sec) * 1000
+          + (wst1->tv_usec - wst2->tv_usec) / 1000);
 #endif
 
 #ifdef TIMER_TIME
-  time_t now = time (NULL);
-  return 1000 * (now - wt->secs);
+  return 1000 * (*wst1 - *wst2);
 #endif
 
 #ifdef WINDOWS
-  FILETIME ft;
-  SYSTEMTIME st;
-  ULARGE_INTEGER uli;
-  GetSystemTime (&st);
-  SystemTimeToFileTime (&st, &ft);
-  uli.HighPart = ft.dwHighDateTime;
-  uli.LowPart = ft.dwLowDateTime;
-  return (long)((uli.QuadPart - wt->wintime.QuadPart) / 1);
+  return (long)(wst1->QuadPart - wst2->QuadPart) / 1;
 #endif
+}
+
+/* R
bug in wget - wget break on time msec=0
Hello,

I think I found a bug in wget. My GNU wget version is 1.8.2, and my system is GNU/Debian unstable.

I use wget to replay our Apache logfiles against a test webserver to try different tuning parameters. Wget fails to run through the logfile and aborts with the error message "Assertion `msecs >= 0' failed".

This is the command I run:

#time wget -q -i replaylog -O /dev/null

Here is the output of strace:

#time strace wget -q -i replaylog -O /dev/null
read(4, "HTTP/1.1 200 OK\r\nDate: Sat, 13 S"..., 4096) = 4096
write(3, "\377\330\377\340\0\20JFIF\0\1\1\1\0H\0H\0\0\377\354\0\21"..., 3792) = 3792
gettimeofday({1063461157, 858103}, NULL) = 0
select(5, [4], NULL, [4], {900, 0}) = 1 (in [4], left {900, 0})
read(4, "\377\0\344=\217\355V\\\232\363\16\221\255\336h\227\361"..., 1435) = 1435
write(3, "\377\0\344=\217\355V\\\232\363\16\221\255\336h\227\361"..., 1435) = 1435
gettimeofday({1063461157, 858783}, NULL) = 0
time(NULL) = 1063461157
access("390564.jpg?time=1060510404", F_OK) = -1 ENOENT (No such file or directory)
time(NULL) = 1063461157
select(5, [4], NULL, NULL, {0, 1}) = 0 (Timeout)
time(NULL) = 1063461157
select(5, NULL, [4], [4], {900, 0}) = 1 (out [4], left {900, 0})
write(4, "GET /fotos/4/390564.jpg?time=106"..., 244) = 244
select(5, [4], NULL, [4], {900, 0}) = 1 (in [4], left {900, 0})
read(4, "HTTP/1.1 200 OK\r\nDate: Sat, 13 S"..., 4096) = 4096
write(3, "\377\330\377\340\0\20JFIF\0\1\1\1\0H\0H\0\0\377\333\0C"..., 3792) = 3792
gettimeofday({1063461157, 880833}, NULL) = 0
select(5, [4], NULL, [4], {900, 0}) = 1 (in [4], left {900, 0})
read(4, "\343P\223\36T\4\203Rc\317\257J\4x\2165\303;o\211\256+\222"..., 817) = 817
write(3, "\343P\223\36T\4\203Rc\317\257J\4x\2165\303;o\211\256+\222"..., 817) = 817
gettimeofday({1063461157, 874729}, NULL) = 0
time(NULL) = 1063461157
write(2, "wget: retr.c:262: calc_rate: Ass"..., 60wget: retr.c:262: calc_rate: Assertion `msecs >= 0' failed.
) = 60
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
getpid() = 7106
kill(7106, SIGABRT) = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT +++

I hope that helps. Keep up the good work!

Kind regards,
Gunnar
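The negative interval can be reproduced by plugging the gettimeofday() readings from this trace into the millisecond formula that the old wtimer_elapsed() used. That formula appears in the removed lines of the patch in the reply. This is only arithmetic on the trace. It assumes the last two readings are the ones that bracketed the interval passed to calc_rate(). The names trace_readings and elapsed_ms are hypothetical.

```c
#include <sys/time.h>

/* The four gettimeofday() readings from the strace output above. */
static const struct timeval trace_readings[4] = {
  { .tv_sec = 1063461157, .tv_usec = 858103 },
  { .tv_sec = 1063461157, .tv_usec = 858783 },
  { .tv_sec = 1063461157, .tv_usec = 880833 },
  { .tv_sec = 1063461157, .tv_usec = 874729 },
};

/* Milliseconds from START to END, computed like the pre-patch
   wtimer_elapsed(): whole seconds times 1000 plus the microsecond
   difference divided by 1000.  C integer division truncates toward
   zero, so a difference of -6104 us becomes -6 ms.  */
static long
elapsed_ms (const struct timeval *start, const struct timeval *end)
{
  return (long) (end->tv_sec - start->tv_sec) * 1000L
         + (long) (end->tv_usec - start->tv_usec) / 1000;
}
```

The three successive intervals come out as 0, 22 and -6 ms. A value of -6 is exactly what trips calc_rate's `msecs >= 0' assertion.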
Re: problem using wget in a script
Hi Tim!

I very dimly remember that sometimes you should use single quotes (') instead of double quotes ("). Perhaps this already helps.

CU
Jens

> Can anyone please advise me where I am going wrong?
>
> I use wget and must supply a user agent string within a bash script, as the
> site author coded it to run under IE only.
>
> I use the string -U "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
>
> Unfortunately, for some reason wget attempts to resolve each component of
> the agent string, i.e.
> http://compatible
> http://msie
>
> etc.
>
> It is as if the quotes surrounding the string were not there,
> but when I do the same from a command line everything works OK.
>
> Tim
problem using wget in a script
Can anyone please advise me where I am going wrong?

I use wget and must supply a user agent string within a bash script, as the site author coded it to run under IE only.

I use the string -U "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"

Unfortunately, for some reason wget attempts to resolve each component of the agent string, i.e.

http://compatible
http://msie

etc.

It is as if the quotes surrounding the string were not there, but when I do the same from a command line everything works OK.

Tim
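Tim's script is not shown, so the following is a guess at the cause. One common way to get exactly this symptom is to store the options in a shell variable and run wget $OPTS unquoted. Bash does not re-interpret quote characters that come out of a variable expansion. It only splits the unquoted result on IFS whitespace, so wget receives the User-Agent as several separate arguments. Any arguments left over after -U and its value are treated as URLs.

The C function below is a hypothetical, simplified model of that splitting step. It uses the default IFS and ignores pathname expansion. It is not bash's actual implementation.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model of bash word splitting on an unquoted $OPTS
   expansion with the default IFS (space, tab, newline).  Quote
   characters that come from the variable's value are NOT special
   at this stage, so they remain part of the words.  Pathname
   expansion is ignored.  BUF is modified in place; at most MAX word
   pointers are stored in WORDS and their count is returned.  */
static size_t
split_unquoted (char *buf, char **words, size_t max)
{
  size_t n = 0;
  for (char *tok = strtok (buf, " \t\n"); tok != NULL && n < max;
       tok = strtok (NULL, " \t\n"))
    words[n++] = tok;
  return n;
}
```

For the -U string in Tim's message, the model yields eight words: -U, "Mozilla/4.0, (compatible;, MSIE, 5.5;, Windows, NT and 5.0)". wget would take only the second word as the User-Agent, which is consistent with it trying to fetch http://compatible and similar URLs. If this is the cause, either of these should keep the string as one argument:

- keep the string in its own variable and expand it in double quotes, as in wget -U "$UA" ...
- collect all options in a bash array and expand it as "${OPTS[@]}"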