Re: wget-cvs-ifmodsince.patch

2004-02-16 Thread Craig Sowadski
Ok, I have attached a new patch that moves the local time into http_stat. I 
am also sending this to [EMAIL PROTECTED] for others to try out. It seems to 
work great for me.

wget-cvs-ifmodsince.patch

ChangeLog:  Craig Sowadski <[EMAIL PROTECTED]>

* http.c (If-Modified-Since): Implemented use of
'If-Modified-Since' header instead of checking
'Last-Modified' during the head-only request.
Description:
   This patch modifies the time-stamping method by only
   comparing local and remote file sizes, and then using
   the 'If-Modified-Since' header during the request.
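
A minimal sketch of the request side of this approach, assuming the local
file's modification time is already known. The helper name
`format_if_modified_since` is hypothetical and is not taken from the patch;
it only shows how a `time_t` could be turned into the HTTP-date that the
header carries. A server that honors the header answers 304 Not Modified
when the file is unchanged, so no body is transferred.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper (not from the patch): write an If-Modified-Since
   header line for LOCAL_MTIME into BUF.  The date is formatted by hand
   rather than with strftime() so that the day and month names do not
   depend on the current locale.  Returns the header length, or 0 if
   the time cannot be converted or BUF is too small.  */
size_t
format_if_modified_since (time_t local_mtime, char *buf, size_t bufsize)
{
  static const char *const days[] =
    { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
  static const char *const months[] =
    { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
      "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
  struct tm *gmt = gmtime (&local_mtime);   /* HTTP dates are always GMT */
  int n;

  if (!gmt)
    return 0;

  n = snprintf (buf, bufsize,
                "If-Modified-Since: %s, %02d %s %04d %02d:%02d:%02d GMT\r\n",
                days[gmt->tm_wday], gmt->tm_mday, months[gmt->tm_mon],
                gmt->tm_year + 1900, gmt->tm_hour, gmt->tm_min, gmt->tm_sec);
  if (n < 0 || (size_t) n >= bufsize)
    return 0;
  return (size_t) n;
}
```

The caller would append the resulting line to the request headers and treat
a 304 response as "local copy is current".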
Craig Sowadski <[EMAIL PROTECTED]>
From: Hrvoje Niksic <[EMAIL PROTECTED]>
To: "Craig Sowadski" <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Thu, 12 Feb 2004 19:01:06 +0100
The patch looks good, thanks.  You might want to put the local time to
`struct http_stat' (where other details lie), so that the number of
arguments to gethttp doesn't multiply.
Would you agree to post the patch to the list at <[EMAIL PROTECTED]>, so
that other people can try it out?


wget-cvs-ifmodsince.patch
Description: Binary data


Help: No such file or directory

2004-02-16 Thread Paul Kwok




I am a new user of Wget on Windows. The version I use is 1.9.1, running on
Windows XP.
I can download a file from a remote ftp server via Microsoft Internet
Explorer with the command:

ftp://user_name:[EMAIL PROTECTED]/full_file_path

However, I want to automate my file transfer process, so I chose to
use wget so that the process can be started from a script.

The command I use:
   wget ftp://user_name:[EMAIL PROTECTED]/full_file_path
The result is:
Winsock error: 10060
failed: No such file or directory

I also tried adding the following options:
   --proxy=off  --passive-ftp
   --proxy=on  --passive-ftp
   --proxy=on
   --proxy=off
The result is:
   failed: No such file or directory (the Winsock error does not appear)

Can anyone help?  (I am not subscribed yet, so please CC me when replying.
Thanks a lot.)


Paul Kwok



Re: Socks proxy?

2004-02-16 Thread Hrvoje Niksic
The SOCKS support was added to Wget at a very early date and was
unmaintained for a long time, up to the point where it wouldn't build
at all.  Since I didn't have the SOCKS library installed and no one
even reported the failures, I decided to remove the `--with-socks'
option from configure until someone stepped up to add back the
support.

In the meantime, the SOCKS library itself changed and porting new
applications to use it became much simpler than it used to be.  If you
have the time, visit
http://www.socks.permeo.com/TechnicalResources/SOCKSFAQ/SOCKSGeneralFAQ/HowtoSocksifyClients.asp
and see if the listed steps work with Wget.

As far as I can tell, Wget is SOCKS-friendly, according to guidelines
at
http://www.socks.permeo.com/TechnicalResources/DevelopDocuments/SOCKSFReferenceImpl120C.asp



Socks proxy?

2004-02-16 Thread H. Hernan Moraldo
I'm not subscribed to the list, please CC: your replies to my mail
address.

I've been using Wget with an http proxy that doesn't support resuming
(Proxy+), so I wanted to configure it for working with SOCKS but it
doesn't seem to have that feature.

I have a .wgetrc file in my HOME directory that says:

http_proxy = http://10.0.0.3:1080/
ftp_proxy = http://10.0.0.3:1080/

(The SOCKS proxy is listening on port 1080.)

While it used to work when it was pointing to the HTTP proxy:

http_proxy = http://10.0.0.3:4480/
ftp_proxy = http://10.0.0.3:4480/

Now that it points to the SOCKS one, it fails to recognize the headers when
connected to the proxy. I also tried downloading the WgetPro sources and
compiling them with --with-socks, but that didn't work either.

As a matter of fact, if I do

 cd src; grep -i socks *

I only get:

config.h:/* Define if you wish to compile with socks support.  */
config.h:#define HAVE_SOCKS 1
config.h.in:/* Define if you wish to compile with socks support.  */
config.h.in:#undef HAVE_SOCKS
Binary file ftp-opie.o matches
Binary file wpro matches

This seems crazy to me, since nothing appears to use the HAVE_SOCKS
macro at all, so I wonder how it could have SOCKS support that way.

I'm sure I'm doing something wrong, could anyone please tell me what?

Thanks in advance.

Best regards,

-- 

H. Hernán Moraldo
Moraldo Games
http://games.moraldo.com.ar/



Re: Robots = off directive

2004-02-16 Thread Hrvoje Niksic
patrick robinson <[EMAIL PROTECTED]> writes:

>> That message has nothing to do with robots.txt, it means that you
>> have rejected the file using the `-R' or equivalent option.
>
> Here you go again with this IMHO stupidly implemented option.

Why thank you.

> I'm using it too but on some suffixes it acts after downloading by
> deleting the already downloaded file and on other suffixes it works
> in advance.

It works /a posteriori/ on HTML documents because they need to be
downloaded to be examined for links.  Otherwise something like `wget
-r -A jpg URL' would not download anything because the index item is
not an image.
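
The distinction can be sketched as a three-way decision. This is an
illustration only, not Wget's actual code: the names `classify` and
`has_suffix` are hypothetical, a single accepted suffix stands in for the
full -A/-R lists, and HTML is recognized by file name alone for brevity.

```c
#include <string.h>

/* What to do with a candidate file during a recursive retrieval.  */
enum verdict { SKIP, DOWNLOAD_AND_KEEP, DOWNLOAD_THEN_DELETE };

/* Return nonzero if NAME ends with SUFFIX (case-sensitive).  */
int
has_suffix (const char *name, const char *suffix)
{
  size_t n = strlen (name), s = strlen (suffix);
  return n >= s && strcmp (name + n - s, suffix) == 0;
}

/* Hypothetical decision function: files matching the accepted suffix
   are kept; HTML pages that do not match must still be fetched so
   their links can be followed, and are deleted afterwards; everything
   else can be rejected in advance without being downloaded.  */
enum verdict
classify (const char *name, const char *accept_suffix)
{
  if (has_suffix (name, accept_suffix))
    return DOWNLOAD_AND_KEEP;
  if (has_suffix (name, ".html") || has_suffix (name, ".htm"))
    return DOWNLOAD_THEN_DELETE;
  return SKIP;
}
```

With `-A jpg`, `index.html` falls into the download-then-delete case, which
is where the "rejected" message after download comes from.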

> But I'm only using version 1.8.2; maybe it has been changed in more
> recent versions by now.

Many things have improved since 1.8.2.  I recommend upgrading.


Re: Robots = off directive

2004-02-16 Thread patrick robinson
Hello Hrvoje,

On 16-Feb-04, you wrote:

> "chatiman" <[EMAIL PROTECTED]> writes:

>> I'm trying to download a robots.txt protected directory and I'm having the
>> following problem:
>>
>> - wget downloads the files but deletes them after they are downloaded, with
>> the following message (translated from French):
>> Destroyed  because it must be rejected

> That message has nothing to do with robots.txt, it means that you have
> rejected the file using the `-R' or equivalent option.

Here you go again with this IMHO stupidly implemented option.
I'm using it too, but on some suffixes it acts after downloading by deleting
the already downloaded file, and on other suffixes it works in advance.

I wonder why it doesn't always reject predefined files/suffixes in advance.
It doesn't make much sense to download them and then delete them.

But I'm only using version 1.8.2; maybe it has been changed in more recent
versions by now.




Regards
Patrick Robinson




Re: Startup delay on Windows

2004-02-16 Thread David Fritz
I'd be content with the following logic:

Don't process a `system' wgetrc. If $HOME is not defined, use the 
directory the Wget executable is in as $HOME (what home_dir() returns).
If $HOME/.wgetrc exists, use that; otherwise look for wget.ini in the 
directory the executable is in, regardless of $HOME.

We would retain wget.ini support for backward compatibility, and support 
.wgetrc for consistency with other platforms and with the handling of 
.netrc.  This would only break things if people had $HOME defined and it 
contained a .wgetrc and they expected the Windows port to ignore it.

As a side-effect, this would also resolve the above issue.

I went ahead and implemented this.  I figure at least it will work as an interim 
solution.
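
The lookup order described above can be sketched as follows. This is a
simplified, standalone illustration rather than the patch itself: the name
`rc_candidates` and the fixed-size buffers are assumptions, and the caller
is expected to use the first candidate that actually exists.

```c
#include <stdio.h>

#define RC_PATH_MAX 512

/* Format DIR/NAME into OUT; return 1 on success, 0 if it does not fit.  */
static int
join_path (char *out, const char *dir, const char *name)
{
  int r = snprintf (out, RC_PATH_MAX, "%s/%s", dir, name);
  return r >= 0 && r < RC_PATH_MAX;
}

/* Hypothetical helper: fill CANDIDATES with the files to try, in
   order, and return how many were written (0, 1 or 2).
   HOME_ENV is the value of $HOME (may be NULL); EXE_DIR is the
   directory containing the executable, without a trailing separator
   (may be NULL if it could not be determined).  */
int
rc_candidates (const char *home_env, const char *exe_dir,
               char candidates[2][RC_PATH_MAX])
{
  /* If $HOME is not defined, the executable's directory acts as home.  */
  const char *home = (home_env && *home_env) ? home_env : exe_dir;
  int n = 0;

  /* First choice: $HOME/.wgetrc, as on other platforms.  */
  if (home && join_path (candidates[n], home, ".wgetrc"))
    n++;
  /* Fallback for backward compatibility: wget.ini next to the
     executable, regardless of $HOME.  */
  if (exe_dir && join_path (candidates[n], exe_dir, "wget.ini"))
    n++;
  return n;
}
```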

2004-02-16  David Fritz  <[EMAIL PROTECTED]>

* init.c (home_dir): Use aprintf() instead of xmalloc()/sprintf().
Under Windows, if $HOME is not defined, use the directory that
contains the Wget binary instead of hard-coded `C:\'.
(wgetrc_file_name): Under Windows, look for $HOME/.wgetrc then, if
not found, look for wget.ini in the directory of the Wget binary.
* mswindows.c (ws_mypath): Employ slightly more robust methodology.
Strip trailing path separator.

Index: src/init.c
===================================================================
RCS file: /pack/anoncvs/wget/src/init.c,v
retrieving revision 1.91
diff -u -r1.91 init.c
--- src/init.c  2003/12/14 13:35:27 1.91
+++ src/init.c  2004/02/16 15:58:36
@@ -1,5 +1,5 @@
 /* Reading/parsing the initialization file.
-   Copyright (C) 1995, 1996, 1997, 1998, 2000, 2001, 2003
+   Copyright (C) 1995, 1996, 1997, 1998, 2000, 2001, 2003, 2004
    Free Software Foundation, Inc.
 
 This file is part of GNU Wget.
@@ -314,9 +314,9 @@
         return NULL;
       home = pwd->pw_dir;
 #else  /* WINDOWS */
-      home = "C:\\";
-      /*  Maybe I should grab home_dir from registry, but the best
-         that I could get from there is user's Start menu.  It sucks!  */
+      /* Under Windows, if $HOME isn't defined, use the directory where
+         `wget.exe' resides.  */
+      home = ws_mypath ();
 #endif /* WINDOWS */
     }
 
@@ -347,27 +347,24 @@
       return xstrdup (env);
     }
 
-#ifndef WINDOWS
   /* If that failed, try $HOME/.wgetrc.  */
   home = home_dir ();
   if (home)
-    {
-      file = (char *)xmalloc (strlen (home) + 1 + strlen (".wgetrc") + 1);
-      sprintf (file, "%s/.wgetrc", home);
-    }
+    file = aprintf ("%s/.wgetrc", home);
   xfree_null (home);
-#else  /* WINDOWS */
-  /* Under Windows, "home" is (for the purposes of this function) the
-     directory where `wget.exe' resides, and `wget.ini' will be used
-     as file name.  SYSTEM_WGETRC should not be defined under WINDOWS.
-
-     It is not as trivial as I assumed, because on 95 argv[0] is full
-     path, but on NT you get what you typed in command line.  --dbudor */
-  home = ws_mypath ();
-  if (home)
+
+#ifdef WINDOWS
+  /* Under Windows, if we still haven't found .wgetrc, look for the file
+     `wget.ini' in the directory where `wget.exe' resides; we do this for
+     backward compatibility with previous versions of Wget.
+     SYSTEM_WGETRC should not be defined under WINDOWS.  */
+  if (!file || !file_exists_p (file))
     {
-      file = (char *)xmalloc (strlen (home) + strlen ("wget.ini") + 1);
-      sprintf (file, "%swget.ini", home);
+      xfree_null (file);
+      file = NULL;
+      home = ws_mypath ();
+      if (home)
+        file = aprintf ("%s/wget.ini", home);
     }
 #endif /* WINDOWS */
 
Index: src/mswindows.c
===================================================================
RCS file: /pack/anoncvs/wget/src/mswindows.c,v
retrieving revision 1.22
diff -u -r1.22 mswindows.c
--- src/mswindows.c 2003/11/03 21:57:03 1.22
+++ src/mswindows.c 2004/02/16 15:58:37
@@ -1,5 +1,5 @@
 /* mswindows.c -- Windows-specific support
-   Copyright (C) 1995, 1996, 1997, 1998  Free Software Foundation, Inc.
+   Copyright (C) 1995, 1996, 1997, 1998, 2004  Free Software Foundation, Inc.
 
 This file is part of GNU Wget.
 
@@ -199,22 +199,25 @@
 ws_mypath (void)
 {
   static char *wspathsave = NULL;
-  char buffer[MAX_PATH];
-  char *ptr;
 
-  if (wspathsave)
+  if (!wspathsave)
     {
-      return wspathsave;
-    }
+      char buf[MAX_PATH + 1];
+      char *p;
+      DWORD len;
+
+      len = GetModuleFileName (GetModuleHandle (NULL), buf, sizeof (buf));
+      if (!len || (len >= sizeof (buf)))
+        return NULL;
+
+      p = strrchr (buf, PATH_SEPARATOR);
+      if (!p)
+        return NULL;
 
-  if (GetModuleFileName (NULL, buffer, MAX_PATH) &&
-      (ptr = strrchr (buffer, PATH_SEPARATOR)) != NULL)
-    {
-      *(ptr + 1) = '\0';
-      wspathsave = xstrdup (buffer);
+      *p = '\0';
+      wspathsave = xstrdup (buf);
     }
-  else
-    wspathsave = NULL;
+
   return wspathsave;
 }
 


Re: Robots = off directive

2004-02-16 Thread Hrvoje Niksic
"chatiman" <[EMAIL PROTECTED]> writes:

> I'm trying to download a robots.txt protected directory and I'm having the
> following problem:
>
> - wget downloads the files but deletes them after they are downloaded, with
> the following message (translated from French):
> Destroyed  because it must be rejected

That message has nothing to do with robots.txt, it means that you have
rejected the file using the `-R' or equivalent option.


Robots = off directive

2004-02-16 Thread chatiman
Hello,


I'm trying to download a robots.txt protected directory and I'm having the
following problem:

- wget downloads the files but deletes them after they are downloaded, with
the following message (translated from French):
Destroyed  because it must be rejected

How can I prevent this?

Thanks

PS: I'm using wget 1.8.1-6




RE: delete-before switch

2004-02-16 Thread Herold Heiko
[resubmitted to wget@ instead of wget-patches]

> From: Rupert Levene [mailto:[EMAIL PROTECTED]
> 
..
 
> My vote: keep the option for either behaviour :-) As written, the
> patch only changes behaviour if the --timestamping and --delete-before
> options are in effect.
> 
> Rupert

I understand that you want that feature for your own special needs; on
the other hand, there is Hrvoje's (more than reasonable!) desire to
avoid option proliferation and creeping featuritis.
So why not a more general option? You could code a run-external-command
feature that runs before and after downloading a file, passing a number
of arguments. Something like

 BEFORE [LOC=location, url] [SAVE_PATH=path where the file will be saved]
 [REF=possibly referring url] [ORG_SIZE=...] [STARTTIME=] ...

then download, followed by

 AFTER SUCCESS|FAILURE [NUM_ATTEMPTS=..]
 [ERRTYPE=TIMEOUT|MAX_ATTEMPTS|NOT_RESOLVED] [FINAL_SIZE=] [USERTIME=...]
 [EFFECTIVETIME=usertime except the retry waiting periods] ...

This is just an example of syntax and parameters; somebody could
probably come up with a better syntax, and other interesting data could
possibly be gathered. The data could also be passed in the environment
instead of as arguments (this would avoid the need for getopts or string
operations in simple shell scripts).
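
A minimal sketch of the environment-based variant, assuming the hook
receives its data as NAME=value entries. Nothing here exists in Wget: the
function `build_hook_env` is hypothetical, `LOC` and `SAVE_PATH` follow the
names proposed above, and `RESULT` is an invented name for the
SUCCESS|FAILURE flag. The entries produced could then be exported before
the external command is launched.

```c
#include <stdio.h>

#define HOOK_VARS 3
#define HOOK_ENV_MAX 1024

/* Format NAME=VALUE into OUT; return 1 on success, 0 if it does not fit.  */
static int
hook_env_entry (char *out, const char *name, const char *value)
{
  int r = snprintf (out, HOOK_ENV_MAX, "%s=%s", name, value);
  return r >= 0 && r < HOOK_ENV_MAX;
}

/* Hypothetical helper: build the environment entries describing one
   finished download.  Returns 1 if all entries fit, 0 otherwise.  */
int
build_hook_env (const char *url, const char *save_path, int success,
                char env[HOOK_VARS][HOOK_ENV_MAX])
{
  return hook_env_entry (env[0], "LOC", url)
         && hook_env_entry (env[1], "SAVE_PATH", save_path)
         && hook_env_entry (env[2], "RESULT", success ? "SUCCESS" : "FAILURE");
}
```

A shell hook could then read `$SAVE_PATH` directly, with no argument
parsing at all.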

This would solve a whole lot of wanted features with just one option.
For example, from time to time somebody wants to know how to get an
exact list of downloaded files; currently the log must be parsed or
something similar.
You would just write a small script to unlink the SAVE_PATH file and
run wget --run-before=dounlink.pl or whatever.

I suppose that, for a start, using just the basic data already available
(url, path & filename, SUCCESS|FAILURE) would contain the amount of work
needed for this.

Hrvoje, what do you think about this? Acceptable? Horrible?

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax