DeepVaccum

2003-09-13 Thread Aaron S. Hawley
[saw this on the web..]

HexCat Software DeepVaccum
http://www.hexcat.com/deepvaccum/

DeepVaccum is a useful donationware web utility based on the GNU wget
command-line tool. The program includes a vast number of options to
fine-tune your downloads over both HTTP and FTP.

DV enables you to download whole single pages, entire sites, FTP
catalogs, link lists from a text file, and filtered types, e.g. images.

-- 
 "Too much use of American power overseas causes the nation to look
  like 'the ugly American'." -- Gov. George W. Bush


Re: bug in wget - wget break on time msec=0

2003-09-13 Thread Hrvoje Niksic
"Boehn, Gunnar von" <[EMAIL PROTECTED]> writes:

> I think I found a bug in wget.

You did.  But I believe your subject line is slightly incorrect.  Wget
handles 0 length time intervals (see the assert message), but what it
doesn't handle are negative amounts.  And indeed:

> gettimeofday({1063461157, 858103}, NULL) = 0
> gettimeofday({1063461157, 858783}, NULL) = 0
> gettimeofday({1063461157, 880833}, NULL) = 0
> gettimeofday({1063461157, 874729}, NULL) = 0

As you can see, the last gettimeofday returned a time *preceding* the
one before it.  Your ntp daemon must have chosen that precise moment
to set back the system clock by ~6 milliseconds, to which Wget reacted
badly.
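
To make the arithmetic concrete, here is a minimal sketch in C that
applies the seconds/microseconds formula from the pre-patch
wtimer_elapsed() to the last two samples quoted above.  The names
`sample', `diff_ms', `require_nonnegative' and `show_trace_diff' are
hypothetical and exist only for this illustration; they are not part
of Wget.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct timeval, so the sketch does not
   depend on <sys/time.h>.  */
struct sample {
  long sec;   /* seconds */
  long usec;  /* microseconds */
};

/* Milliseconds from EARLIER to LATER, computed the same way as the
   pre-patch code: whole seconds scaled up, microseconds scaled down.  */
long
diff_ms (const struct sample *later, const struct sample *earlier)
{
  return (later->sec - earlier->sec) * 1000
         + (later->usec - earlier->usec) / 1000;
}

/* Illustrative precondition in the spirit of the assertion from the
   report: aborts when handed a negative interval.  */
void
require_nonnegative (long msecs)
{
  assert (msecs >= 0);
}

/* Print the difference between the last two samples in the trace.
   The result is negative (about -6 ms) because the clock stepped
   back.  */
void
show_trace_diff (void)
{
  struct sample prev = { 1063461157L, 880833L };
  struct sample last = { 1063461157L, 874729L };
  printf ("%ld ms\n", diff_ms (&last, &prev));
}
```

Passing that negative result to require_nonnegative() would abort the
program, which is the kind of failure described in the report.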

Even so, Wget shouldn't crash.  The correct fix is to disallow the
timer code from ever returning decreasing or negative time intervals.
Please let me know if this patch fixes the problem:


2003-09-14  Hrvoje Niksic  <[EMAIL PROTECTED]>

* utils.c (wtimer_sys_set): Extracted the code that sets the
current time here.
(wtimer_reset): Call it.
(wtimer_sys_diff): Extracted the code that calculates the
difference between two system times here.
(wtimer_elapsed): Call it.
(wtimer_elapsed): Don't return a value smaller than the previous
one, which could previously happen when system time is set back.
Instead, reset start time to current time and note the elapsed
offset for future calculations.  The returned times are now
guaranteed to be monotonically nondecreasing.
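
The approach described in the last ChangeLog item can be sketched
roughly as follows.  This is an illustration under stated
assumptions, not the code from the patch: clock readings are plain
millisecond values passed in by the caller so that a backwards step
can be simulated, and the names `mono_timer', `mono_reset' and
`mono_elapsed' are hypothetical.  Only the three field names follow
the struct shown in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timer state; all values are in milliseconds.  */
struct mono_timer {
  long start;              /* clock reading treated as the start */
  long elapsed_last;       /* most recent value handed out */
  long elapsed_pre_start;  /* elapsed time accumulated before START */
};

/* Begin measuring at clock reading NOW.  */
void
mono_reset (struct mono_timer *mt, long now)
{
  assert (mt != NULL);
  mt->start = now;
  mt->elapsed_last = 0;
  mt->elapsed_pre_start = 0;
}

/* Return the elapsed milliseconds at clock reading NOW.  The result
   is never smaller than the value returned by the previous call.  */
long
mono_elapsed (struct mono_timer *mt, long now)
{
  long elapsed;

  assert (mt != NULL);
  elapsed = mt->elapsed_pre_start + (now - mt->start);

  if (elapsed < mt->elapsed_last)
    {
      /* The clock was set back.  Restart from NOW and carry the last
         reported value forward as an offset.  */
      mt->start = now;
      mt->elapsed_pre_start = mt->elapsed_last;
      elapsed = mt->elapsed_last;
    }

  mt->elapsed_last = elapsed;
  return elapsed;
}
```

With a reset at reading 1000, calls at 1022, 1016 and 1020 yield 22,
22 and 26: the backwards step is absorbed, and later readings keep
counting from the last reported value.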

Index: src/utils.c
===
RCS file: /pack/anoncvs/wget/src/utils.c,v
retrieving revision 1.51
diff -u -r1.51 utils.c
--- src/utils.c 2002/05/18 02:16:25 1.51
+++ src/utils.c 2003/09/13 23:09:13
@@ -1532,19 +1532,30 @@
 # endif
 #endif /* not WINDOWS */
 
-struct wget_timer {
 #ifdef TIMER_GETTIMEOFDAY
-  long secs;
-  long usecs;
+typedef struct timeval wget_sys_time;
 #endif
 
 #ifdef TIMER_TIME
-  time_t secs;
+typedef time_t wget_sys_time;
 #endif
 
 #ifdef TIMER_WINDOWS
-  ULARGE_INTEGER wintime;
+typedef ULARGE_INTEGER wget_sys_time;
 #endif
+
+struct wget_timer {
+  /* The starting point in time which, subtracted from the current
+     time, yields elapsed time. */
+  wget_sys_time start;
+
+  /* The most recent elapsed time, calculated by wtimer_elapsed().
+     Measured in milliseconds.  */
+  long elapsed_last;
+
+  /* Approximately, the time elapsed between the true start of the
+     measurement and the time represented by START.  */
+  long elapsed_pre_start;
 };
 
 /* Allocate a timer.  It is not legal to do anything with a freshly
@@ -1577,22 +1588,17 @@
   xfree (wt);
 }
 
-/* Reset timer WT.  This establishes the starting point from which
-   wtimer_elapsed() will return the number of elapsed
-   milliseconds.  It is allowed to reset a previously used timer.  */
+/* Store system time to WST.  */
 
-void
-wtimer_reset (struct wget_timer *wt)
+static void
+wtimer_sys_set (wget_sys_time *wst)
 {
 #ifdef TIMER_GETTIMEOFDAY
-  struct timeval t;
-  gettimeofday (&t, NULL);
-  wt->secs  = t.tv_sec;
-  wt->usecs = t.tv_usec;
+  gettimeofday (wst, NULL);
 #endif
 
 #ifdef TIMER_TIME
-  wt->secs = time (NULL);
+  time (wst);
 #endif
 
 #ifdef TIMER_WINDOWS
@@ -1600,39 +1606,76 @@
   SYSTEMTIME st;
   GetSystemTime (&st);
   SystemTimeToFileTime (&st, &ft);
-  wt->wintime.HighPart = ft.dwHighDateTime;
-  wt->wintime.LowPart  = ft.dwLowDateTime;
+  wst->HighPart = ft.dwHighDateTime;
+  wst->LowPart  = ft.dwLowDateTime;
 #endif
 }
 
-/* Return the number of milliseconds elapsed since the timer was last
-   reset.  It is allowed to call this function more than once to get
-   increasingly higher elapsed values.  */
+/* Reset timer WT.  This establishes the starting point from which
+   wtimer_elapsed() will return the number of elapsed
+   milliseconds.  It is allowed to reset a previously used timer.  */
 
-long
-wtimer_elapsed (struct wget_timer *wt)
+void
+wtimer_reset (struct wget_timer *wt)
 {
+  /* Set the start time to the current time. */
+  wtimer_sys_set (&wt->start);
+  wt->elapsed_last = 0;
+  wt->elapsed_pre_start = 0;
+}
+
+static long
+wtimer_sys_diff (wget_sys_time *wst1, wget_sys_time *wst2)
+{
 #ifdef TIMER_GETTIMEOFDAY
-  struct timeval t;
-  gettimeofday (&t, NULL);
-  return (t.tv_sec - wt->secs) * 1000 + (t.tv_usec - wt->usecs) / 1000;
+  return ((wst1->tv_sec - wst2->tv_sec) * 1000
+          + (wst1->tv_usec - wst2->tv_usec) / 1000);
 #endif
 
 #ifdef TIMER_TIME
-  time_t now = time (NULL);
-  return 1000 * (now - wt->secs);
+  return 1000 * (*wst1 - *wst2);
 #endif
 
 #ifdef WINDOWS
-  FILETIME ft;
-  SYSTEMTIME st;
-  ULARGE_INTEGER uli;
-  GetSystemTime (&st);
-  SystemTimeToFileTime (&st, &ft);
-  uli.HighPart = ft.dwHighDateTime;
-  uli.LowPart = ft.dwLowDateTime;
-  return (long)((uli.QuadPart - wt->wintime.QuadPart) / 10000);
+  return (long)(wst1->QuadPart - wst2->QuadPart) / 10000;
 #endif
+}
+
+/* R

bug in wget - wget break on time msec=0

2003-09-13 Thread Boehn, Gunnar von
Hello,


I think I found a bug in wget.

My GNU wget version is 1.8.2.
My system is Debian GNU/Linux unstable.


I use wget to replay our Apache log files against a
test web server to try different tuning parameters.


Wget fails to run through the logfile and aborts with the error
message "Assertion `msecs >= 0' failed".

This is the command I run:
#time wget -q -i replaylog -O /dev/null


Here is the output of strace:
#time strace wget -q -i replaylog -O /dev/null

read(4, "HTTP/1.1 200 OK\r\nDate: Sat, 13 S"..., 4096) = 4096
write(3, "\377\330\377\340\0\20JFIF\0\1\1\1\0H\0H\0\0\377\354\0\21"..., 3792) = 3792
gettimeofday({1063461157, 858103}, NULL) = 0
select(5, [4], NULL, [4], {900, 0}) = 1 (in [4], left {900, 0})
read(4, "\377\0\344=\217\355V\\\232\363\16\221\255\336h\227\361"..., 1435) = 1435
write(3, "\377\0\344=\217\355V\\\232\363\16\221\255\336h\227\361"..., 1435) = 1435
gettimeofday({1063461157, 858783}, NULL) = 0
time(NULL)  = 1063461157
access("390564.jpg?time=1060510404", F_OK) = -1 ENOENT (No such file or directory)
time(NULL)  = 1063461157
select(5, [4], NULL, NULL, {0, 1})  = 0 (Timeout)
time(NULL)  = 1063461157
select(5, NULL, [4], [4], {900, 0}) = 1 (out [4], left {900, 0})
write(4, "GET /fotos/4/390564.jpg?time=106"..., 244) = 244
select(5, [4], NULL, [4], {900, 0}) = 1 (in [4], left {900, 0})
read(4, "HTTP/1.1 200 OK\r\nDate: Sat, 13 S"..., 4096) = 4096
write(3, "\377\330\377\340\0\20JFIF\0\1\1\1\0H\0H\0\0\377\333\0C"..., 3792) = 3792
gettimeofday({1063461157, 880833}, NULL) = 0
select(5, [4], NULL, [4], {900, 0}) = 1 (in [4], left {900, 0})
read(4, "\343P\223\36T\4\203Rc\317\257J\4x\2165\303;o\211\256+\222"..., 817) = 817
write(3, "\343P\223\36T\4\203Rc\317\257J\4x\2165\303;o\211\256+\222"..., 817) = 817
gettimeofday({1063461157, 874729}, NULL) = 0
time(NULL)  = 1063461157
write(2, "wget: retr.c:262: calc_rate: Ass"..., 60wget: retr.c:262: calc_rate: Assertion `msecs >= 0' failed.
) = 60
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
getpid()= 7106
kill(7106, SIGABRT) = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT +++


I hope that helps.
Keep up the good work.

Kind regards

Gunnar


Re: problem using wget in a script

2003-09-13 Thread lukasdl
Hi Tim!

I very dimly remember that sometimes you should use
single quotes (') instead of double quotes (").
Perhaps this already helps.

CU
Jens


> Can anyone please advise me where I am going wrong?
>
> I use wget and must supply a user agent string within a bash script, as
> the site author coded it to run under IE only.
>
> I use the string -U "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
>
> Unfortunately, for some reason wget attempts to resolve each component
> of the agent string, i.e.
> http://compatible
> http://msie
>
> etc.
>
> It is as if the quotes surrounding the string were not there.
>
> But when I do the same from a command line, everything works OK.
>
> Tim



problem using wget in a script

2003-09-13 Thread tim wilkinson
Can anyone please advise me where I am going wrong?

I use wget and must supply a user agent string within a bash script, as
the site author coded it to run under IE only.

I use the string -U "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"

Unfortunately, for some reason wget attempts to resolve each component
of the agent string, i.e.
http://compatible
http://msie

etc.

It is as if the quotes surrounding the string were not there.

But when I do the same from a command line, everything works OK.


Tim