Bug report: wget with multiple CNAMEs in SSL certificate

2007-04-12 Thread Alex Antener
Hi

If I connect with wget 1.10.2 (Debian Etch & Ubuntu Feisty Fawn) to a
secure host that uses multiple CNAMEs in the certificate, I get the
following error:

[EMAIL PROTECTED]:~$ wget https://host.domain.tld
--10:18:55--  https://host.domain.tld/
   => `index.html'
Resolving host.domain.tld... xxx.xxx.xxx.xxx
Connecting to host.domain.tld|xxx.xxx.xxx.xxx|:443... connected.
ERROR: certificate common name `host0.domain.tld' doesn't match
requested host name `host.domain.tld'.
To connect to host.domain.tld insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.

If I do the same with wget 1.9.1 (Debian Sarge) I do not get that error.
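
One way to see which names the certificate actually presents is to ask the
openssl command-line tool (a sketch; host and port as in the transcript above):

echo | openssl s_client -connect host.domain.tld:443 2>/dev/null \
  | openssl x509 -noout -subject -text | grep -A1 'Subject Alternative Name'

If host.domain.tld shows up only in the subjectAltName list and not as the
common name, that would match the behaviour above: wget 1.10.2 apparently
checks the requested name against the CN only, while 1.9.1 (which, as far as
I know, did not yet verify server certificates) never performed the check.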

Kind regards, Alex Antener

-- 
Alex Antener
Dipl. Medienkuenstler FH

[EMAIL PROTECTED] // http://lix.cc // +41 (0)44 586 97 63
GPG Key: 1024D/14D3C7A1 https://lix.cc/gpg_key.php
Fingerprint: BAB6 E61B 17D7 A9C9 6313  5141 3A3C DAA3 14D3 C7A1



Bug report: backup files missing when using "wget -K"

2006-08-14 Thread Ken Kubota

Hi,

When calling "wget -k -K ...", the backup files (.orig) are missing.

In one case (LOG.Linux.short) one backup file is missing (two files were 
converted).
In another case (LOG.IRIX64.short) all backup files are missing.
This is also true when using recursive retrieval (LOG.IRIX64.recursive.short).

See attached files for details. The script calling wget is "WGET". There was no 
".wgetrc" file.

You probably know the bug described at: 
http://www.mail-archive.com/wget@sunsite.dk/msg07686.html
Remove the two "./CLEAN" commands in the script to test recursive re-download
with it.
I cannot reproduce that bug myself, since the backup files are missing.

Kind regards,

Ken



LOG.Linux.short:
DEBUG output created by Wget 1.8.2 on linux.
Linux linux 2.4.21-99-default #1 Wed Sep 24 13:30:51 UTC 2003 i686 i686 i386 
GNU/Linux
2 Dateien in 0.07 Sekunden konvertiert. [German: "Converted 2 files in 0.07 seconds."]
Backup files:  1

LOG.IRIX64.short:
DEBUG output created by Wget 1.10.1 on irix6.5.
IRIX64 Komma 6.5 07010238 IP27
Converted 2 files in 0.204 seconds.
Backup files: 0

LOG.IRIX64.recursive.short:
DEBUG output created by Wget 1.10.1 on irix6.5.
IRIX64 Komma 6.5 07010238 IP27
Converted 55 files in 7.616 seconds.
Backup files: 0

wget-bug.tar.bz2
Description: Binary data


Re: Bug report

2006-04-01 Thread Frank McCown

Gary Reysa wrote:

Hi,

I don't really know if this is a Wget bug, or some problem with my 
website, but, either way, maybe you can help.


I have a web site ( www.BuildItSolar.com ) with perhaps a few hundred 
pages (260MB of storage total).  Someone did a Wget on my site, and 
managed to log 111,000 hits and 58,000 page views (using more than a GB 
of bandwidth).


I am wondering how this can happen, since the number of page views is 
about 200 times the number of pages on my site??


Is there something I can do to prevent this?  Is there something about 
the organization of my website that is causing Wget to get stuck in a loop?


I've never used Wget, but I am guessing that this guy really did not
want 50,000+ pages -- do you provide some way for the program to shut
itself down when it reaches some reasonable limit?


My website is non-commercial, and provides a lot of information that 
people find useful in building renewable energy projects.  It generates 
zero income, and I can't really afford to have a lot of people come in 
and burn up GBs of bandwidth to no useful end.  Help!


Gary Reysa


Bozeman, MT
[EMAIL PROTECTED]



Hello Gary,

From a quick look at your site, it appears to be mainly static HTML
that would not generate a lot of extra crawls.  If you have some dynamic
portion of your site, like a calendar, that could make wget go into an
infinite loop.  It would be much easier to tell if you could look at the
server logs that show which pages were requested.  They would easily tell
you what wget was getting hung up on.


One problem I did notice is that your site is generating "soft 404s":
it sends back an HTTP 200 response when it should be sending back a 404
response.  So if wget tries to access


http://www.builditsolar.com/blah

your web server is telling wget that the page actually exists.  This
*could* cause more crawls than necessary, but it is not likely.  This
problem should be fixed regardless.
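
A quick way to check this (a sketch; any made-up path will do):

wget -S -O /dev/null http://www.builditsolar.com/no-such-page

A correctly configured server answers with a 404 status line in the -S
header dump; a soft-404 server answers 200 OK and serves an error page
as the body.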


It's possible the wget user did not know what they were doing and ran 
the crawler several times.  You could try to block traffic from that 
particular IP address or create a robots.txt file that tells crawlers to 
stay away from your site or just certain pages.  Wget respects 
robots.txt.  For more info:


http://www.robotstxt.org/wc/robots.html
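
For example, a minimal /robots.txt at the web root (a sketch; the
disallowed path is a placeholder) would be:

User-agent: *
Disallow: /calendar/

Disallow: / would keep well-behaved crawlers out of the whole site.  Note
that robots.txt is advisory only: a wget run with -e robots=off ignores it.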

Regards,
Frank



Bug report

2006-04-01 Thread Gary Reysa

Hi,

I don't really know if this is a Wget bug, or some problem with my 
website, but, either way, maybe you can help.


I have a web site ( www.BuildItSolar.com ) with perhaps a few hundred 
pages (260MB of storage total).  Someone did a Wget on my site, and 
managed to log 111,000 hits and 58,000 page views (using more than a GB 
of bandwidth).


I am wondering how this can happen, since the number of page views is 
about 200 times the number of pages on my site??


Is there something I can do to prevent this?  Is there something about 
the organization of my website that is causing Wget to get stuck in a loop?


I've never used Wget, but I am guessing that this guy really did not
want 50,000+ pages -- do you provide some way for the program to shut
itself down when it reaches some reasonable limit?


My website is non-commercial, and provides a lot of information that 
people find useful in building renewable energy projects.  It generates 
zero income, and I can't really afford to have a lot of people come in 
and burn up GBs of bandwidth to no useful end.  Help!


Gary Reysa


Bozeman, MT
[EMAIL PROTECTED]







Re: Bug report: option -nr

2005-06-30 Thread Hrvoje Niksic
Marc Niederwieser <[EMAIL PROTECTED]> writes:

> option --mirror is described as
>   shortcut option equivalent to -r -N -l inf -nr.
> but option "-nr" is not implemented.
> I think you mean "--no-remove-listing".

Thanks for the report, I've now fixed the --help text.

2005-07-01  Hrvoje Niksic  <[EMAIL PROTECTED]>

* main.c (print_help): Don't refer to the non-existent -nr in
description of --mirror.

Index: src/main.c
===
--- src/main.c  (revision 1918)
+++ src/main.c  (working copy)
@@ -575,7 +575,7 @@
 N_("\
   -K,  --backup-converted   before converting file X, back up as X.orig.\n"),
 N_("\
-  -m,  --mirror             shortcut option equivalent to -r -N -l inf -nr.\n"),
+  -m,  --mirror             shortcut for -N -r -l inf --no-remove-listing.\n"),
 N_("\
   -p,  --page-requisites    get all images, etc. needed to display HTML page.\n"),
 N_("\


Bug report: option -nr

2005-06-30 Thread Marc Niederwieser
Hi

option --mirror is described as
  shortcut option equivalent to -r -N -l inf -nr.
but option "-nr" is not implemented.
I think you mean "--no-remove-listing".

greetings 
Marc



Re: wget bug report

2005-06-24 Thread Hrvoje Niksic
<[EMAIL PROTECTED]> writes:

> Sorry for the crosspost, but the wget Web site is a little confusing
> on the point of where to send bug reports/patches.

Sorry about that.  In this case, either address is fine, and we don't
mind the crosspost.

> After taking a look at it, i implemented the following change to
> http.c and tried again. It works for me, but i don't know what other
> implications my change might have.

It's exactly the correct change.  A similar fix has already been
integrated in the CVS (in fact subversion) code base.

Thanks for the report and the patch.


wget bug report

2005-06-12 Thread A.Jones
Sorry for the crosspost, but the wget Web site is a little confusing on the 
point of where to send bug reports/patches.

Just installed wget 1.10 on Friday. Over the weekend, my scripts failed with
the following error (once for each wget run):
Assertion failed: wget_cookie_jar != NULL, file http.c, line 1723
Abort - core dumped

All of my command lines are similar to this:
/home/programs/bin/wget -q --no-cache --no-cookies -O /home/programs/etc/alte_seiten/xsr.html 'http://www.enterasys.com/download/download.cgi?lib=XSR'

After taking a look at it, I implemented the following change to http.c and
tried again. It works for me, but I don't know what other implications my
change might have.

--- http.c.orig Mon Jun 13 08:04:23 2005
+++ http.c  Mon Jun 13 08:06:59 2005
@@ -1715,6 +1715,7 @@
   hs->remote_time = resp_header_strdup (resp, "Last-Modified");
 
   /* Handle (possibly multiple instances of) the Set-Cookie header. */
+  if (opt.cookies)
   {
 char *pth = NULL;
 int scpos;


Kind regards

MVV Energie AG
Abteilung AI.C

Andrew Jones

Telefon: +49 621 290-3645
Fax: +49 621 290-2677
E-Mail: [EMAIL PROTECTED] Internet: www.mvv.de
MVV Energie · Luisenring 49 · 68159 Mannheim
Commercial Register No. HRB 1780
Chairman of the Supervisory Board: Lord Mayor Gerhard Widder
Executive Board: Dr. Rudolf Schulten (Chairman) · Dr. Werner Dub · Hans-Jürgen
Farrenkopf · Karl-Heinz Trautmann


Bug report: two spaces between file size and month

2004-05-03 Thread Iztok Saje
Hello!
I just found a "feature" in an embedded system (no source available) with an
FTP server: in its listing there are two spaces between the file size and
the month. As a consequence, wget always thinks the size is 0.
In the procedure ftp_parse_unix_ls it just steps back one blank
before cur.size is calculated.
My quick hack is just to add one more pointer and atoi,
but maybe a nicer solution can be done (a sketch follows the listing below).
case from .listing:
-rw-rw-rw-   0 0  0  68065  Apr 16 08:00 A20040416.0745
-rw-rw-rw-   0 0  0    781  Apr 20 07:45 A20040420.0730
-rw-rw-rw-   0 0  0  59606  Apr 16 08:15 A20040416.0800
-rw-rw-rw-   0 0  0    781  Apr 23 12:15 A20040423.1200
-rw-rw-rw-   0 0  0   2130  Feb  3 12:00 A20040203.1145
-rw-rw-rw-   0 0  0  33440  Apr 14 12:15 A20040414.1200
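
One possible shape for that nicer solution (an editor's sketch, not wget's
actual code; it assumes the parser already holds pointers to the start of
the line and to the month token):

#include <ctype.h>
#include <stdlib.h>

/* Skip the whole run of blanks before the month, then back up over
   the size digits, so that one *or* two separating spaces work.  */
static long
size_before_month (const char *line, const char *month)
{
  const char *p = month;
  while (p > line && isspace ((unsigned char) p[-1]))
    p--;                        /* skip all blanks before the month */
  while (p > line && !isspace ((unsigned char) p[-1]))
    p--;                        /* back up over the size field */
  return strtol (p, NULL, 10);
}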
BR
Iztok


wget bug report

2004-03-26 Thread Corey Henderson
I sent this message to [EMAIL PROTECTED] as directed in the wget man page, but it 
bounced and said to try this email address.

This bug report is for GNU Wget 1.8.2 tested on both RedHat Linux 7.3 and 9

rpm -q wget
wget-1.8.2-9

When I use wget with -S to show the HTTP headers, and I use the --spider switch as
well, it gives me a 501 error on some servers.

The main example I have found was doing it against a server running ntop.

http://www.ntop.org/

You can find an RPM for it at:

http://rpm.pbone.net/index.php3/stat/4/idpl/586625/com/ntop-2.2-0.dag.rh90.i386.rpm.html

You can search with other parameters at rpm.pbone.net to get ntop for other
versions of Linux.

So here is the command and output:

wget -S --spider http://SERVER_WITH_NTOP:3000

HTTP request sent, awaiting response...
 1 HTTP/1.0 501 Not Implemented
 2 Date: Sat, 27 Mar 2004 07:08:24 GMT
 3 Cache-Control: no-cache
 4 Expires: 0
 5 Connection: close
 6 Server: ntop/2.2 (Dag Apt RPM Repository) (i686-pc-linux-gnu)
 7 Content-Type: text/html
21:11:56 ERROR 501: Not Implemented.

I get a 501 error. echoing the $? shows an exit status of 1

When I don't use the spider, I get the following:

wget -S http://SERVER_WITH_NTOP:3000

HTTP request sent, awaiting response...
 1 HTTP/1.0 200 OK
 2 Date: Sat, 27 Mar 2004 07:09:31 GMT
 3 Cache-Control: max-age=3600, must-revalidate, public
 4 Connection: close
 5 Server: ntop/2.2 (Dag Apt RPM Repository) (i686-pc-linux-gnu)
 6 Content-Type: text/html
 7 Last-Modified: Mon, 17 Mar 2003 20:27:49 GMT
 8 Accept-Ranges: bytes
 9 Content-Length: 1214

100%[==>]  1,214  1.16M/s  ETA 00:00

21:13:04 (1.16 MB/s) - `index.html' saved [1214/1214]



The exit status was 0 and the index.html file was downloaded.

If this is a bug, please fix it in your next release of wget. If it is not a
bug, I would appreciate a brief explanation as to why.
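
A likely explanation, though nobody confirms it in this thread: with
--spider, wget sends a HEAD request instead of GET, and a server that has
not implemented the HEAD method answers 501 Not Implemented.  That is easy
to test by hand (a sketch; nc is netcat):

printf 'HEAD / HTTP/1.0\r\n\r\n' | nc SERVER_WITH_NTOP 3000

If the same 501 comes back, the limitation is in ntop's built-in web
server rather than in wget.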

Thank You

Corey Henderson
Chief Programmer
GlobalHost.com

Re: Bug report

2004-03-24 Thread Hrvoje Niksic
Juhana Sadeharju <[EMAIL PROTECTED]> writes:

> Command: "wgetdir http://liarliar.sourceforge.net";.
> Problem: Files are named as
>   content.php?content.2
>   content.php?content.3
>   content.php?content.4
> which are interpreted, e.g., by Nautilus as manual pages and are
> displayed as plain text. Could the files, and the links to them, be
> renamed as follows?
>   content.php?content.2.html
>   content.php?content.3.html
>   content.php?content.4.html

Use the option `--html-extension' (-E).
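
(For example, with the reporter's own wgetdir settings that would be:

wget -k -E --proxy=off -e robots=off --passive-ftp -q -r -l 0 -np -U Mozilla http://liarliar.sourceforge.net

so the generated pages are saved as content.php?content.2.html and so on.)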

> After all, are those pages still php files or generated html files?
> If they are html files produced by the php files, then it could be a
> good idea to add a new extension to the files.

They're the latter -- HTML files produced by the server-side PHP code.

> Command: "wgetdir 
> http://www.newtek.com/products/lightwave/developer/lscript2.6/index.html";
> Problem: Images are not downloaded. Perhaps because the image links
> are the following:
>   <image src="...">

I've never seen this tag, but it seems to be the same as IMG.  Mozilla
seems to grok it and its DOM inspector thinks it has seen IMG.  Is
this tag documented anywhere?  Does IE understand it too?



Bug report

2004-03-24 Thread Juhana Sadeharju
Hello. This is a report on some wget bugs. My wgetdir command looks like
the following (wget 1.9.1):
wget -k --proxy=off -e robots=off --passive-ftp -q -r -l 0 -np -U Mozilla $@

Bugs:

Command: "wgetdir http://www.directfb.org";.
Problem: In file "www.directfb.org/index.html" the hrefs of type
  "/screenshots/index.xml" was not converted to relative
  with "-k" option.

Command: "wgetdir http://threedom.sourceforge.net";.
Problem: In file "threedom.sourceforge.net/index.html" the
hrefs were not converted to relative with "-k" option.

Command: "wgetdir http://liarliar.sourceforge.net";.
Problem: Files are named as
  content.php?content.2
  content.php?content.3
  content.php?content.4
which are interpreted, e.g., by Nautilus as manual pages and are
displayed as plain text. Could the files, and the links to them, be
renamed as follows?
  content.php?content.2.html
  content.php?content.3.html
  content.php?content.4.html
After all, are those pages still php files or generated html files?
If they are html files produced by the php files, then it could
be a good idea to add a new extension to the files.

Command: "wgetdir 
http://www.newtek.com/products/lightwave/developer/lscript2.6/index.html";
Problem: Images are not downloaded. Perhaps because the image links
are the following:
  <image src="...">

Regards,
Juhana


Re: bug report

2004-01-28 Thread Hrvoje Niksic
You are right, it's a bug.  -O is implemented in a weird way, which
makes it work strangely with features such as timestamping and link
conversion.  I plan to fix it when I get around to revamping the file
name generation support for grokking the Content-Disposition header.


bug report

2003-12-30 Thread Vlada Macek

Hi again,

I found something that can be called a bug.

The command line and the output (shortened):

$ wget -k www.seznam.cz
--14:14:28--  http://www.seznam.cz/
   => `index.html'
Resolving www.seznam.cz... done.
Connecting to www.seznam.cz[212.80.76.18]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

[ <=> ] 19,975 3.17M/s

14:14:28 (3.17 MB/s) - `index.html' saved [19975]

Converting index.html... 5-123
Converted 1 files in 0.01 seconds.

---
That is, the newly created file really is link-converted.

Now I run:

$ wget -k -O myfile www.seznam.cz
--14:16:07--  http://www.seznam.cz/
   => `myfile'
Resolving www.seznam.cz... done.
Connecting to www.seznam.cz[212.80.76.3]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

[ <=> ] 19,980 3.18M/s

14:16:07 (3.18 MB/s) - `myfile' saved [19980]

index.html.1: No such file or directory
Converting index.html.1... nothing to do.
Converted 1 files in 0.00 seconds.

---
Now myfile is created, and then wget tries to convert index.html.1, i.e. the 
file it normally *would* have created if there were no -O option... 

When I wish the content to be sent to stdout (-O -), this postponed
conversion step is again run on index.html.1. Which is totally wrong: all
content has already been sent to stdout.

Not only is my content not link-converted; isn't there a possibility that
wget could inadvertently garble files on disk it has nothing to do with?

Vlada Macek




bug report: 302 server response forces host spanning even without -H

2003-04-02 Thread Yaniv Azriel
If wget receives a 302 Moved Temporarily redirection to *another site*,
that site is crawled!
wget -r http://original/index.html

Server reply: 302 http://redirect/index.html

WGET goes and downloads from "redirect".



I also tried adding the -D flag but it doesn't help:

wget -r -Doriginal -nh http://original/

WGET still crawls the redirect site.

And by the way - multiple dependency files are downloaded from the redirect
site - so this is a major bug, I think.




bug report

2003-02-22 Thread Jirka Klaue

1/   (serious)
#include <config.h> needs to be replaced by #include "config.h" in several source
files.
The same applies to one other angle-bracket include (the header name was lost
in archiving).

2/
#ifdef WINDOWS should be replaced by #ifdef _WIN32.

With these two changes it is even possible to compile wget with MSVC[++] and Intel 
C[++].   :-)

Jirka






bug report about running wget in BSDI 3.1

2003-02-05 Thread julian yin
Hello,

I've downloaded wget-1.5.3 from http://ftp.gnu.org/gnu/wget onto our 
BSDI version 3.1 OS and used the following commands:

% gunzip wget-1.5.3.tar.gz
% tar -xvf wget-1.5.3.tar
% cd wget-1.5.3
% ./configure
% ./make -f Makefile
% ./make install

But the following error message was displayed:

--12:53:33--  http://www.osdpd.noaa.gov:80/COB/poltbus.asc
   => `poltbus.asc'
Connecting to www.osdpd.noaa.gov:80...
www.osdpd.noaa.gov: Host not found.

when I ran 
% ./src/wget http://www.osdpd.noaa.gov/COB/poltbus.asc

Could you please give me your advice about the error message?

Thank you very much.

I.P.S.
Julian




Bug report / feature request

2003-01-28 Thread Stas Ukolov
Hi!

Wget 1.5.3 uses /robots.txt to skip some parts of a web site. But it
doesn't use the <meta name="robots"> tag, which serves
the same purpose.

I believe that Wget must also parse and use <meta name="robots">
tags.

WBR
 Stas  mailto:[EMAIL PROTECTED]




Re: bug report and patch, HTTPS recursive get

2002-05-17 Thread Kiyotaka Doumae


In message "Re: bug report and patch, HTTPS recursive get",
Ian Abbott wrote...
> Thanks again for the bug report and the proposed patch.  I thought some
> of the scheme tests in recur.c were getting messy, so propose the
> following patch that uses a function to check for similar schemes.

Thanks for your rewrite.
Your patch solved the problem.

Thank you

---
Doumae Kiyotaka
Internet Initiative Japan Inc.
Technical Planning Division



Re: bug report and patch, HTTPS recursive get

2002-05-15 Thread Ian Abbott

On Wed, 15 May 2002 18:44:19 +0900, Kiyotaka Doumae <[EMAIL PROTECTED]> wrote:

>We have the following HTML document.
>
>https://www.example.com/index.html
>-
><html>
><body>
><a href="http://www.wget.org/">Another Website</a>
></body>
></html>
>-
>
>We run wget with the -r option.
>
>> wget -r https://www.example.com/index.html
>
>wget gets http://www.wget.org/ and other URLs which are
>linked from http://www.wget.org/.

Thanks again for the bug report and the proposed patch.  I thought some
of the scheme tests in recur.c were getting messy, so propose the
following patch that uses a function to check for similar schemes.

The patch incorporates your bug-fix in step 7 of download_child_p() and
makes a similar change in step 4 for consistency.

src/ChangeLog entry:

2002-05-15  Ian Abbott  <[EMAIL PROTECTED]>

* url.c (schemes_are_similar_p): New function to test enumerated
scheme codes for similarity.

* url.h: Declare it.

* recur.c (download_child_p): Use it to compare schemes.  This
also fixes a bug that allows hosts to be spanned (without the
-H option) when the parent scheme is https and the child's is
http or vice versa.

Index: src/recur.c
===
RCS file: /pack/anoncvs/wget/src/recur.c,v
retrieving revision 1.48
diff -u -r1.48 recur.c
--- src/recur.c 2002/04/21 04:25:07 1.48
+++ src/recur.c 2002/05/15 13:05:35
@@ -415,6 +415,7 @@
 {
   struct url *u = upos->url;
   const char *url = u->url;
+  int u_scheme_like_http;
 
   DEBUGP (("Deciding whether to enqueue \"%s\".\n", url));
 
@@ -445,12 +446,11 @@
  More time- and memory- consuming tests should be put later on
  the list.  */
 
+  /* Determine whether URL under consideration has a HTTP-like scheme. */
+  u_scheme_like_http = schemes_are_similar_p (u->scheme, SCHEME_HTTP);
+
   /* 1. Schemes other than HTTP are normally not recursed into. */
-  if (u->scheme != SCHEME_HTTP
-#ifdef HAVE_SSL
-  && u->scheme != SCHEME_HTTPS
-#endif
-  && !(u->scheme == SCHEME_FTP && opt.follow_ftp))
+  if (!u_scheme_like_http && !(u->scheme == SCHEME_FTP && opt.follow_ftp))
 {
   DEBUGP (("Not following non-HTTP schemes.\n"));
   goto out;
@@ -458,11 +458,7 @@
 
   /* 2. If it is an absolute link and they are not followed, throw it
  out.  */
-  if (u->scheme == SCHEME_HTTP
-#ifdef HAVE_SSL
-  || u->scheme == SCHEME_HTTPS
-#endif
-  )
+  if (schemes_are_similar_p (u->scheme, SCHEME_HTTP))
 if (opt.relative_only && !upos->link_relative_p)
   {
DEBUGP (("It doesn't really look like a relative link.\n"));
@@ -483,7 +479,7 @@
  opt.no_parent.  Also ignore it for documents needed to display
  the parent page when in -p mode.  */
   if (opt.no_parent
-  && u->scheme == start_url_parsed->scheme
+  && schemes_are_similar_p (u->scheme, start_url_parsed->scheme)
   && 0 == strcasecmp (u->host, start_url_parsed->host)
   && u->port == start_url_parsed->port
   && !(opt.page_requisites && upos->link_inline_p))
@@ -526,7 +522,7 @@
 }
 
   /* 7. */
-  if (u->scheme == parent->scheme)
+  if (schemes_are_similar_p (u->scheme, parent->scheme))
 if (!opt.spanhost && 0 != strcasecmp (parent->host, u->host))
   {
DEBUGP (("This is not the same hostname as the parent's (%s and %s).\n",
@@ -535,13 +531,7 @@
   }
 
   /* 8. */
-  if (opt.use_robots
-  && (u->scheme == SCHEME_HTTP
-#ifdef HAVE_SSL
- || u->scheme == SCHEME_HTTPS
-#endif
- )
-  )
+  if (opt.use_robots && u_scheme_like_http)
 {
   struct robot_specs *specs = res_get_specs (u->host, u->port);
   if (!specs)
Index: src/url.c
===
RCS file: /pack/anoncvs/wget/src/url.c,v
retrieving revision 1.74
diff -u -r1.74 url.c
--- src/url.c   2002/04/13 03:04:47 1.74
+++ src/url.c   2002/05/15 13:05:36
@@ -2472,6 +2472,24 @@
   downloaded_files_hash = NULL;
 }
 }
+
+/* Return non-zero if scheme a is similar to scheme b.
+ 
+   Schemes are similar if they are equal.  If SSL is supported, schemes
+   are also similar if one is http (SCHEME_HTTP) and the other is https
+   (SCHEME_HTTPS).  */
+int
+schemes_are_similar_p (enum url_scheme a, enum url_scheme b)
+{
+  if (a == b)
+return 1;
+#ifdef HAVE_SSL
+  if ((a == SCHEME_HTTP && b == SCHEME_HTTPS)
+  || (a == SCHEME_HTTPS && b == SCHEME_HTTP))
+return 1;
+#endif
+  return 0;
+}
 
 #if 0
 /* Debugging and testing support for path_simplify. */
Index: src/url.h
===
RCS file: /pack/anoncvs/wget/src/url.h,v
retrieving revision 1.23
diff -u -r1.23 url.h
--- src/url.h   2002/04/13 03:04:47 1.23
+++ src/url.h   2002/05/15 13:05:36
@@ -158,4 +158,6 @@
 
 char *rewrite_shorthand_url PARAMS ((const char *));
 
+int schemes_are_similar_p PARAMS ((enum url_scheme a, enum url_scheme b));
+
 #endif /* URL_H */





Re: bug report and patch, HTTPS recursive get

2002-05-15 Thread Ian Abbott

On Wed, 15 May 2002 18:44:19 +0900, Kiyotaka Doumae <[EMAIL PROTECTED]>
wrote:

>I found a bug in wget with HTTPS recursive get, and propose
>a patch.

Thanks for the bug report and the proposed patch.  The current scheme
comparison checks are getting messy, so I'll write a function to check
schemes for similarity (when I can spare the time later today).



Re: Bug report

2002-05-04 Thread Ian Abbott

On Fri, 3 May 2002 18:37:22 +0200, Emmanuel Jeandel
<[EMAIL PROTECTED]> wrote:

>ejeandel@yoknapatawpha:~$ wget -r a:b
>Segmentation fault

Patient: Doctor, it hurts when I do this
Doctor: Well don't do that then!

Seriously, this is already fixed in CVS.



Bug report

2002-05-03 Thread Emmanuel Jeandel

ejeandel@yoknapatawpha:~$ wget -r a:b
Segmentation fault
ejeandel@yoknapatawpha:~$ 

I encountered this bug when I wanted to do wget ftp://a:b@c/, forgetting the
ftp://.
The bug is not present when -r is not given (a:b: Unsupported scheme).

Emmanuel



Re: Bug report for wget 1.8.1 / MacOSX: French translation

2002-04-11 Thread Hrvoje Niksic

Pascal Vuylsteker <[EMAIL PROTECTED]> writes:

> I've downloaded wget from http://macosx.forked.net/ as a port to
> MacOSX (package).

I'm not sure how internationalization works on MacOS X.  Perhaps you
should ask the people who did the porting?

If you want Wget to print English (original) messages, unset the LANG
environment variable.



GNU wget 1.8.1 - Bug report memory occupied

2002-03-26 Thread Dipl. Ing. Hermann Rugen






Hello specialists,
I used wget 1.8.1 on my system to mirror the site www.europa.eu.int.
The transfer ran through a proxy, over DSL, overnight.
After about 12-13 hours I found the following situation:
about 1.8 GB of data downloaded in total, and the wget process had
grown to occupy approximately 75 MB of RAM!

This growth was fatal for the system, because there is only
32 MB of RAM in the Intel 486 machine.
The download rate dropped dramatically, but the system was still running.
I killed the process with Ctrl-C.
Everything else seems to be OK.
Looking through the data I found that the relinking was not good in all
cases.
Files were placed in the right directories, but links in the pages
were often rewritten wrongly.
should be: http://myroot/europa.eu.int/individual_dir

but was:
http://europa.eu.int/individual_dir

The problem seems to be that wget misses the part of the directory path
that leads to the downloaded one.

My calling conditions for wget were:

wget -m http://europa.eu.int/

All parameters were left at their default values, unchanged
except for the address of the proxy I had to use.
Compilation was with standard features under a Linux 2.4.13 kernel.

Questions:
Did I make a configuration mistake?
If not, can you correct the relinking?
How can I make wget use less RAM?
Is there a chance to correct the wrong links? (Not by hand --
there are thousands.)

Kind regards

Dipl. Ing. Hermann Rugen

Rugen Consulting
Max-Planck-Straße 7
49767 Twist

Tel.: 05931 4099 151
Fax: 05931 4099 152

eMail: [EMAIL PROTECTED]
Internet: www.rugen-consulting.com








PGPexch.rtf.asc
Description: Binary data


Bug report for wget 1.8.1 / MacOSX: French translation

2002-03-21 Thread Pascal Vuylsteker

Hi,

I've downloaded wget from http://macosx.forked.net/ as a port to MacOSX 
(package).
It installed fine and even realized that my native language is French,
but it has an issue when printing the help in French: the
accented characters are replaced by a '?'.

I would actually prefer to have access to the English version (how is
that possible?)

Sincerely, Pascal Vuylsteker

-

localhost.13:42.~ pvk > wget --help
GNU Wget 1.8.1, un r?p?teur r?au non int?ctif.
Usage: wget [OPTION]... [URL]...

Les arguments obligatoires pour les options de formes longues le sont
aussi pour les options de formes courtes.

D?rrage:
   -V,  --version   afficher le nom et la version du logiciel
   -h,  --help  afficher l'aide-m?ire
   -b,  --backgroundtravailler ?'arri? plan apr?le d?rrage.
   -e,  --execute=COMMAND   ex?ter une commande `.wgetrc'-style.

Journalisation et fichier d'entr?
   -o,  --output-file=FICHIER   journaliser les messages dans le FICHIER.
   -a,  --append-output=FICHIER concat?r les messages au FICHIER.
   -d,  --debug afficher les informations de mise au 
point.
   -q,  --quiet travailler silencieusement (sans sortie).
   -v,  --verbose   travailler en mode bavard (par d?ut).
   -nv, --non-verbose   ne pas travailler en mode explicatif,
 mais garder un niveau informatif 
suffisant.
   -i,  --input-file=FICHIERlire les URL du FICHIER.
   -F,  --force-htmltraiter le fichier d'entr?comme du code 
HTML.
   -B,  --base=URL  ajouter le URL aux liens relatifs de -F 
-i fichier.
--sslcertfile=FICHIER  certificat optionnel du client.
--sslcertkey=FICHIER-CL?
FICHIER-CL? contenant les cl?du 
certificat.
--egd-file=FICHIER  socket vers le d?n egd (source de donn? 
al?oires).

T?chargement:
--bind-address=ADRESSElier l'ADRESSE (nom de l'h??ou IP) 
?'h??local
   -t,  --tries=NOMBREinitialiser le NOMBRE d'essais (0 sans 
limite).
   -O   --output-document=FICHIER ?ire les documents dans le FICHIER.
   -nc, --no-clobber  ne ?aser les fichiers existants.
   -c,  --continuered?rrer la r?p?tion d'un fichier 
existant.
--progress=STYLE  utiliser le STYLE de jauge de 
progresssion.
   -N,  --timestampingne pas r?p?r un fichier plus vieux 
qu'un fichier local.
   -S,  --server-response afficher la r?nse du serveur.
--spider  ne pas t?charger n'importe quoi.
   -T,  --timeout=SECONDESinitialiser le d?i de gr? en SECONDES.
   -w,  --wait=N  attendre N secondes entre chaque essai.
   -Y,  --proxy=on/offactiver (`on') ou d?ctiver (`off') le 
proxy.
   -Q,  --quota=N initialiser le quota de r?p?tion ?.

R?rtoires:
   -nd  --no-directoriesna pas cr? les r?rtoires.
   -x,  --force-directories forcer la cr?ion des r?rtoires.
   -nH, --no-host-directories   ne pas cr? les r?rtoires de l'h??
   -P,  --directory-prefix=PR?IXE  sauvegarder les fichiers avec le 
PR?IXE/...
--cut-dirs=Nignorer les N composants des 
r?rtoires de l'h??

Options HTTP:
--http-user=USAGER  utiliser le nom de l'USAGER http.
--http-passwd=MOT_DE_PASSE
utiliser le MOT_DE_PASSE http.
   -C,  --cache=on/off  activer (`on') ou d?ctiver (`off') la 
cache
de donn? du serveur (activ?ar d?ut)
   -E,  --html-extensionsauvegarder tous les documents texte/html 
avec un suffixe .html
--ignore-length ignorer le champ `Content-Length' de 
l'en-t?.
--header=CHA?E ins?r la CHA?E ?ravers les en-t?s.
--proxy-user=USAGER utiliser le nom de l'USAGER pour le proxy.
--proxy-passwd=MOT_DE_PASSE
utiliser le MOT_DE_PASSE pour le proxy.
   -s,  --save-headers  sauvegarder les en-t?s HTTP dans le 
fichier.
   -U,  --user-agent=AGENT  identifier l'AGENT plut??ue Wget/VERSION.
--no-http-keep-alived?ctivier l'option HTTP keep-alive 
(connexions persistentes).
--cookies=off   ne pas utiliser les cookies.
--load-cookies=FICHIER  charger les cookies ?artir du FICHIER 
avant la session.
--save-cookies=FICHIER  sauvegarder les cookies dans le FICHIER 
apr?la session.

Option FTP:
   -nr, --dont-remove-listing   ne pas d?uire les fichier `.listing'
   -g,  --glob=on/off   ?aser (`on') ou ne pas ?aser (`off') les 
noms de fichiers
--passive-ftp   utiliser un mode de transfert "passive".
--retr-symlinks r?p?r les lien symbolique via FTP.

R?p?tion r?rsive:
   -r,  --recursive r?p?tion r?rsive -- utiliser avec 
pr?ution!.
   -l,  --level=N   fixer le nive

bug report

2002-03-20 Thread Andax



I found a serious bug in wget, all versions affected.

Description: It is highly addictive
Solution: You should include a warning about this somewhere in the product :)

a windows user


Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-21 Thread Ian Abbott

On 21 Jan 2002 at 14:56, Thomas Lussnig wrote:

> >Why not just open the wgetrc file in text mode using
> >fopen(name, "r") instead of "rb"? Does that introduce other
> >problems?
> I think it has to do with comments, because the definition is that
> starting with '#' the rest of the line is ignored. And a line ends
> with '\n' or the end of the file, not with a special character '\0';
> that means to me that aborting the reading of a text file when a zero
> is found is incorrect parsing.

(N.B. the control-Z character would be '\032', not '\0'.)

So maybe just mention in the documentation that the wgetrc file is
considered to be a plain text file, whatever that means for the
system Wget is running on. Maybe mention the peculiarities of
DOS/Windows, etc.

In general, it is more portable to read or write native text files
in text mode as it performs whatever local conversions are
necessary to make reads and writes of text files appear like UNIX
(i.e. each line of text terminated by a newline '\n'). In binary
mode, what you get depends on the system (Mac text files have lines
terminated by carriage return ('\r') for example, and some systems
(VMS?) don't even have line termination characters as such.)

In the case of Wget, log files are already written in text mode. I
think wgetrc needs to be read in text mode and that's an easy
change.

In the case of the --input-file option, ideally the input file
should be read in text mode unless the --force-html option is used,
in which case it should be read in the same mode as when parsing
other locally-stored HTML files.

Wget stores retrieved files in binary mode but the mode used when
reading those locally-stored files is less precise (not that it
makes much difference for UNIX). It uses open() (not fopen()) and
read() to read those files into memory (or uses mmap() to map them
into memory space if supported). The DOS/Windows version of open()
allows you to specify text or binary mode, defaulting to text mode,
so it looks like the Windows version of Wget saves html files in
binary mode and reads them back in in text mode! Well whatever -
the HTML parser still seems to work okay on Windows, probably
because HTML isn't that fussy about line-endings anyway!

So to support --input-file portably (not the --force-html version),
the get_urls_file() function in url.c should probably call a new
function read_file_text() (or read_text_file()) instead of
read_file() as it does at the moment. For UNIX-type systems, that
could just fall back to calling read_file().
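
A minimal sketch of that wrapper (hypothetical names; read_file() and
struct file_memory stand for Wget's existing binary-mode reader and its
return type):

#ifdef WINDOWS
/* A real text-mode reader that performs the local EOL conversions.  */
struct file_memory *read_text_file (const char *name);
#else
/* On UNIX-type systems text and binary mode agree, so the "text"
   reader can simply fall back to the existing binary one.  */
# define read_text_file(name) read_file (name)
#endif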

The local HTML file parsing stuff should probably be left well
alone but possibly add some #ifdef code for Windows to open the
file in binary mode, though there may be differences between
compilers for that.




Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-21 Thread Andre Majorel

On 2002-01-21 18:53 +0100, Hrvoje Niksic wrote:
> "Ian Abbott" <[EMAIL PROTECTED]> writes:
> 
> > Why not just open the wgetrc file in text mode using fopen(name,
> > "r") instead of "rb"? Does that introduce other problems?
> 
> Not that I'm aware of.  The reason we use "rb" now is the fact that we
> handle the EOL problem ourselves, and it seems "safer" to open the
> file in binary mode and get the real contents.

Back in my DOS days, my personal party line was to fopen all
text files in "r" mode, detect EOL by comparing with '\n', and
otherwise ignore anything that satisfies isspace(). It took care of
the ^Z problem, and the code worked well on both DOS and Unix
without any #ifdefs.
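
A runnable sketch of that scheme (the editor's reconstruction, not André's
actual code):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Open in text mode ("r"), so that on DOS-style C libraries a ^Z acts
   as end-of-file and is never seen; then strip trailing whitespace
   from each line, which absorbs a stray '\r' from CRLF files.  */
int
main (int argc, char **argv)
{
  FILE *fp;
  char line[1024];
  if (argc < 2 || !(fp = fopen (argv[1], "r")))   /* "r", not "rb" */
    return 1;
  while (fgets (line, sizeof line, fp))
    {
      size_t n = strlen (line);
      while (n > 0 && isspace ((unsigned char) line[n - 1]))
        n--;                                      /* drops '\n', '\r', ... */
      printf ("%.*s\n", (int) n, line);
    }
  fclose (fp);
  return 0;
}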

-- 
André Majorel <http://www.teaser.fr/~amajorel/>
std::disclaimer ("Not speaking for my employer");



Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-21 Thread Hrvoje Niksic

"Ian Abbott" <[EMAIL PROTECTED]> writes:

> Why not just open the wgetrc file in text mode using fopen(name,
> "r") instead of "rb"? Does that introduce other problems?

Not that I'm aware of.  The reason we use "rb" now is the fact that we
handle the EOL problem ourselves, and it seems "safer" to open the
file in binary mode and get the real contents.



Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-21 Thread Thomas Lussnig

>
>
>>>WGet returns an error message when the .wgetrc file is terminated
>>>with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
>>>command-line language for all versions of Windows, so ignoring the
>>>end-of-file mark would make sense.
>>>
>>Ouch, I never thought of that.  Wget opens files in binary mode and
>>handles the line termination manually -- but I never thought to handle
>>^Z.
>>
>
>Why not just open the wgetrc file in text mode using
>fopen(name, "r") instead of "rb"? Does that introduce other
>problems?
>
>In the Windows C compilers I've tried (Microsoft and Borland ones),
>"r" causes the file to be opened in text mode by default (there are
>ways to override that at compile time and/or run time), and this
>causes the ^Z to be treated as an EOF (there might be ways to
>override that too).
>
I think it has to do with comments, because the definition is that
starting with '#' the rest of the line is ignored. And a line ends
with '\n' or the end of the file, not with a special character '\0';
that means to me that aborting the reading of a text file when a zero
is found is incorrect parsing.

Cu Thomas Lußnig




smime.p7s
Description: S/MIME Cryptographic Signature


Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-21 Thread Ian Abbott

On 17 Jan 2002 at 2:15, Hrvoje Niksic wrote:

> Michael Jennings <[EMAIL PROTECTED]> writes:
> > WGet returns an error message when the .wgetrc file is terminated
> > with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
> > command-line language for all versions of Windows, so ignoring the
> > end-of-file mark would make sense.
> 
> Ouch, I never thought of that.  Wget opens files in binary mode and
> handles the line termination manually -- but I never thought to handle
> ^Z.

Why not just open the wgetrc file in text mode using
fopen(name, "r") instead of "rb"? Does that introduce other
problems?

In the Windows C compilers I've tried (Microsoft and Borland ones),
"r" causes the file to be opened in text mode by default (there are
ways to override that at compile time and/or run time), and this
causes the ^Z to be treated as an EOF (there might be ways to
override that too).



Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-18 Thread Hrvoje Niksic

Michael Jennings <[EMAIL PROTECTED]> writes:

> However, I have a comment: There is simple logic that would solve
> this problem. WGet, when it reads a line in the configuration file,
> probably now strips off trailing spaces (hex 20, decimal 32). I
> suggest that it strip off both trailing spaces and control
> characters (characters with hex values of 1F or less, decimal values
> of 31 or less). This is a simple change that would work in all
> cases.

The problem here is that I don't want Wget to randomly strip off
characters from its input.  Although the control characters are in
most cases a sign of corruption, I don't want Wget to be the judge of
that.

Wget currently has clearly defined parsing process: strip whitespaces
at the beginning and end of line, and around the `=' token.  Stripping
all the control characters would IMHO be a very random thing to do.

If I implemented the support for ^Z, I'd only strip it if it occurred
at the end of file, and that's somewhat harder.
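
With a whole-file buffer in hand, though, the end-of-file-only variant
reduces to very little (an editor's sketch, assuming the file has already
been read into BUF with its size in LENGTH, as a binary-mode reader would
leave it):

/* Strip a single DOS end-of-file mark (^Z, 0x1A), but only when it
   is the very last byte of the file.  */
if (length > 0 && buf[length - 1] == '\032')
  length--;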



RE: Bug report: 1) Small error 2) Improvement to Manual

2002-01-17 Thread Herold Heiko

> From: Michael Jennings [mailto:[EMAIL PROTECTED]]
> Obviously, this is completely your decision. You are right, 
> only DOS editors make the mistake. (It should be noted that 
> DOS is MS Windows only command line language. It isn't going 
> away; even Microsoft supplies command line utilities with all 
> versions of its OSs. Yes, Windows will probably eventually go 

Please note the difference: all Windows versions include a command line.
However, that command line AFAIK is not DOS; it is able to run DOS
programs, either because it is based on DOS (Win 9x) or because it is
capable of telling W32 command-line programs from DOS programs and
starting the necessary DOS *emulation*. But it is not DOS, and the
behaviour is not like DOS.
As far as I know, Windows command-line programs do not use ^Z as an
end-of-file terminator (although some do honour it for
emulation/compatibility); only real DOS programs do (does anybody know
whether there is an MS standard for this?). If this is true, should wget
on Windows really emulate the behaviour of DOS programs, of an environment
Windows was originally based on but where it (wget, I mean) is
*not*running*anymore*? From a purist's point of view, no. From an
end-user point of view, possibly, in order to ease the changeover.
On the other hand, your report is the first one I have ever seen; considering
Hrvoje's reaction and the lack of support in the original Windows port,
I'd say this is not a problem generally felt to be important, so personally
I'm in favour of not cluttering up the port any more with special
behaviour. But it is Hrvoje's decision, as always.
If you feel it is important, write a patch and submit it; it shouldn't be a
major piece of work.
 
Heiko

-- 
-- PREVINET S.p.A.[EMAIL PROTECTED]
-- Via Ferretto, 1ph  x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907087
-- ITALY



Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-17 Thread Michael Jennings

-


Obviously, this is completely your decision. You are right, only DOS editors make the
mistake. (It should be noted that DOS is MS Windows' only command-line language. It
isn't going away; even Microsoft supplies command-line utilities with all versions of
its OSs. Yes, Windows will probably eventually go away, but not soon.)

However, I have a comment: There is simple logic that would solve this problem. WGet, 
when it reads a line in the configuration file, probably now strips off trailing 
spaces (hex 20, decimal 32). I suggest that it strip off both trailing spaces and 
control characters (characters with hex values of 1F or less, decimal values of 31 or 
less). This is a simple change that would work in all cases.

Regards,

Michael


__


Hrvoje Niksic wrote:

> Herold Heiko <[EMAIL PROTECTED]> writes:
>
> > My personal idea is:
> > As a matter of fact no *windows* text editor I know of, even the
> > supplied windows ones (notepad, wordpad) AFAIK will add the ^Z at the
> > end of file.txt. Wget is a *windows* program (although running in
> > console mode), not a *Dos* program (except for the real dos port I know
> > exists but never tried out).
> >
> > So personally I'd say it would not be really neccessary adding support
> > for the ^Z, even in the win32 port;
>
> That was my line of thinking too.




Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-17 Thread Hrvoje Niksic

Herold Heiko <[EMAIL PROTECTED]> writes:

> My personal idea is:
> As a matter of fact no *windows* text editor I know of, even the
> supplied windows ones (notepad, wordpad) AFAIK will add the ^Z at the
> end of file.txt. Wget is a *windows* program (although running in
> console mode), not a *Dos* program (except for the real dos port I know
> exists but never tried out).
> 
> So personally I'd say it would not be really neccessary adding support
> for the ^Z, even in the win32 port;

That was my line of thinking too.



RE: Bug report: 1) Small error 2) Improvement to Manual

2002-01-17 Thread csaba . raduly


On 17/01/2002 07:34:05 Herold Heiko wrote:
[proper order restored]
>> -Original Message-
>> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
>> Sent: Thursday, January 17, 2002 2:15 AM
>> To: Michael Jennings
>> Cc: [EMAIL PROTECTED]
>> Subject: Re: Bug report: 1) Small error 2) Improvement to Manual
>>
>>
>> Michael Jennings <[EMAIL PROTECTED]> writes:
>>
>> > 1) There is a very small bug in WGet version 1.8.1. The bug occurs
>> >when a .wgetrc file is edited using an MS-DOS text editor:
>> >
>> > WGet returns an error message when the .wgetrc file is terminated
>> > with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
>> > command-line language for all versions of Windows, so ignoring the
>> > end-of-file mark would make sense.
>>
>> Ouch, I never thought of that.  Wget opens files in binary mode and
>> handles the line termination manually -- but I never thought to handle
>> ^Z.
>>
>> As much as I'd like to be helpful, I must admit I'm loath to encumber
>> the code with support for this particular thing.  I have never seen it
>> before; is it only an artifact of DOS editors, or is it used on
>> Windows too?
>>


[snip "copy con file.txt"]
>
>However in this case (at least when I just tried) the file won't contain
>the ^Z. OTOH some DOS programs still will work on NT4, NT2k and XP, and
>could be used, and would create files ending with ^Z. But do they really
>belong here and should wget be bothered ?
>
>What we really need to know is:
>
>Is ^Z still a valid, recognized character indicating end-of-file (for
>textmode files) for command shell programs on windows NT 4/2k/Xp ?
>Somebody with access to the *windows standards* could shed more light on
>this question ?
>
>My personal idea is:
>As a matter of fact no *windows* text editor I know of, even the
>supplied windows ones (notepad, wordpad) AFAIK will add the ^Z at the
>end of file.txt. Wget is a *windows* program (although running in
>console mode), not a *Dos* program (except for the real dos port I know
>exists but never tried out).
>

I don't think there's a distinction between DOS and Windows programs
in this regard. The C runtime library is most likely to play a
significant role here. For a file fopen-ed in "rt" mode, the RTL
would convert \r\n -> \n and silently eat the _first_ ^Z,
returning EOF at that point.

When writing, it goes the other way 'round WRT \n->\r\n.
I'm unsure about whether it writes ^Z at the end, though.

>So personally I'd say it would not be really necessary adding support
>for the ^Z, even in the win32 port; except possibly for the Dos port, if
>the porter of that beast thinks it would be useful.
>

The problem could be solved by opening .netrc in "rt".
However, the "t" is a non-standard extension.

However, this is not wget's problem IMO. Different editors may behave
differently. Example: on OS/2 (which isn't a DOS shell, but can run
DOS programs), the system editor (e.exe) *does* append a ^Z at the end
of every file it saves. People have patched the binary to remove this
feature :-) AFAIK no other OS/2 editor does this.


--
Csaba Ráduly, Software Engineer   Sophos Anti-Virus
email: [EMAIL PROTECTED]http://www.sophos.com
US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933




RE: Bug report: 1) Small error 2) Improvement to Manual

2002-01-16 Thread Herold Heiko

Unfortunately, every version of W9x can (in some frame of mind - please
don't start religious wars here) be considered a shell (nice, horrible,
choose what you prefer) around some kind of DOS. From Win NT 4 upwards
that isn't true anymore, but for (some) compatibility's sake there are
many parallelisms which partly emulate the behaviour of the old DOS
environment.

For example, in order to rapidly create a (small file) on nt4 I still
can

C:\tmp>copy con some.file
some garbage
^Z
1 file copiato(i). [Italian: "1 file(s) copied."]

which is just like cat >file ... ^D, the difference being that con is a special
not-exactly-a-file somewhat similar to /dev/tty on Unix.

However, in this case (at least when I just tried) the file won't contain
the ^Z. OTOH some DOS programs will still work on NT4, Win2k and XP, and
could be used, and would create files ending with ^Z. But do they really
belong here, and should wget be bothered?

What we really need to know is:

Is ^Z still a valid, recognized character indicating end-of-file (for
textmode files) for command shell programs on windows NT 4/2k/Xp ?
Somebody with access to the *windows standards* could shed more light on
this question ?

My personal idea is:
As a matter of fact no *Windows* text editor I know of, even the
supplied Windows ones (Notepad, WordPad), AFAIK will add the ^Z at the
end of file.txt. Wget is a *Windows* program (although running in
console mode), not a *DOS* program (except for the real DOS port, which I
know exists but have never tried out).

So personally I'd say it would not really be necessary to add support
for the ^Z, even in the win32 port; except possibly for the DOS port, if
the porter of that beast thinks it would be useful.

Heiko

-- 
-- PREVINET S.p.A.[EMAIL PROTECTED]
-- Via Ferretto, 1ph  x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907087
-- ITALY

> -Original Message-
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, January 17, 2002 2:15 AM
> To: Michael Jennings
> Cc: [EMAIL PROTECTED]
> Subject: Re: Bug report: 1) Small error 2) Improvement to Manual
> 
> 
> Michael Jennings <[EMAIL PROTECTED]> writes:
> 
> > 1) There is a very small bug in WGet version 1.8.1. The bug occurs
> >when a .wgetrc file is edited using an MS-DOS text editor:
> > 
> > WGet returns an error message when the .wgetrc file is terminated
> > with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
> > command-line language for all versions of Windows, so ignoring the
> > end-of-file mark would make sense.
> 
> Ouch, I never thought of that.  Wget opens files in binary mode and
> handles the line termination manually -- but I never thought to handle
> ^Z.
> 
> As much as I'd like to be helpful, I must admit I'm loath to encumber
> the code with support for this particular thing.  I have never seen it
> before; is it only an artifact of DOS editors, or is it used on
> Windows too?
> 



Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-16 Thread Hrvoje Niksic

Michael Jennings <[EMAIL PROTECTED]> writes:

> 1) There is a very small bug in WGet version 1.8.1. The bug occurs
>when a .wgetrc file is edited using an MS-DOS text editor:
> 
> WGet returns an error message when the .wgetrc file is terminated
> with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
> command-line language for all versions of Windows, so ignoring the
> end-of-file mark would make sense.

Ouch, I never thought of that.  Wget opens files in binary mode and
handles the line termination manually -- but I never thought to handle
^Z.

As much as I'd like to be helpful, I must admit I'm loath to encumber
the code with support for this particular thing.  I have never seen it
before; is it only an artifact of DOS editors, or is it used on
Windows too?



Bug report: 1) Small error 2) Improvement to Manual

2002-01-03 Thread Michael Jennings


-
 
WGet is wonderful.
 
1) There is a very small bug in WGet version 1.8.1. The bug occurs when
a .wgetrc file is edited using an MS-DOS text editor:
WGet returns an error message when the .wgetrc file is terminated with
an MS-DOS end-of-file mark (Control-Z). MS-DOS is the command-line language
for all versions of Windows, so ignoring the end-of-file mark would make
sense.
 
2) Suggested changes to two places in the manual are given below (the
changes are in italic). These changes will help not only people who normally
use Windows, but also people who normally use Unix, are not familiar with
Windows, and are trying to install WGet under Windows.
 
Regards,
Michael Jennings
 
_
 
 
Invoking
By default, Wget is very simple to invoke. The basic syntax is:
wget [option]... [URL]...
(Users of the Microsoft Windows version should see
the explanation under the startup file section below.)
 
_
 
 
Startup File
Once you know how to change default settings of Wget through command
line arguments, you may wish to make some of those settings permanent.
You can do that in a convenient way by creating the Wget startup file--.wgetrc.
Besides .wgetrc being the "main" initialization file, it is convenient
to have a special facility for storing passwords. Thus Wget reads and interprets
the contents of $HOME/.netrc, if it finds it. You can find the .netrc format
in your system manuals.
Wget reads .wgetrc upon startup, recognizing a limited set of commands.
Microsoft Windows version: Users of the Microsoft
Windows version can call the .wgetrc file wgetrc.txt.
In the 1.8.1 version of WGet, an MS-DOS end-of-file
character at the end of the .wgetrc (or wgetrc.txt) parameters file will
cause an error message. You will have this character if you edited the
sample .wgetrc file with an MS-DOS text editor.
There are two ways to deal with this: 1) Ignore the
error. 2) Remove the end-of-file character by editing the wgetrc.txt file
with a Windows editor and deleting the rectangle representation of the
end-of-file character at the very end of the file.
Here is the contents of an example WGetB.BAT file for
use in starting WGet under Windows. This example assumes that the sample
.wgetrc file has been copied to wgetrc.txt:
set WGETRC=C:\Program Files\WGET\V181 WIN\wgetrc.txt
"C:\Program Files\WGet\v181 Win\wget.exe" -rKkpv -l1
-o wget.log %1 %2 %3 %4
The SET command sets the WGETRC environment variable
to the path and name of the .wgetrc parameters file, now called wgetrc.txt.
Environment variables accept embedded spaces; it is not necessary or possible
to use quotes. WGet will get its parameters from the wgetrc.txt file, as
well as from the command line. (The command line parameters override the
parameters in the wgetrc.txt file.)
The second line of the WGetB.BAT file starts WGet operation.
The quote marks are necessary only if you have a space in the path or file
name. Quotes can always be used even if there are no spaces.
Change the above lines so that they are suitable for
your folders and parameters. Microsoft Windows paths and file names are
not sensitive to the case of the letters.
Note that the sample batch file above is not called
wget.bat because of the possibility that the Windows operating system would
confuse the batch file and the wget.exe executable file.
Use
wgetb -l3 site.com 
to download 3 levels of the website, site.com.
 
_
 
 
 


Re: Re[4]: Bug report

2001-12-13 Thread Hrvoje Niksic

Pavel Stepchenko <[EMAIL PROTECTED]> writes:

> OK.  Can wget use an FTP proxy, not HTTP?

That's how "ftp_proxy" works -- it forwards the request to an HTTP
server that does the actual work of retrieving the document through
FTP.

That is not specific to Wget.  The browsers work the same way.
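
(In practice that means ftp_proxy is set to an HTTP proxy URL -- a sketch,
with the proxy address taken from the transcript below:

ftp_proxy=http://2.2.2.2:3128/; export ftp_proxy

wget then issues an ordinary HTTP request for the ftp:// URL to that proxy,
and HTTP has no wildcard semantics.)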



Re: Re[2]: Bug report

2001-12-13 Thread Hrvoje Niksic

Pavel Stepchenko <[EMAIL PROTECTED]> writes:

>>> Warning: wildcards not supported in HTTP.
>>> 
>>> Oooops! But this is FTP url, not HTTP!
>HN> Are you using a proxy?
> Yes.

This means that HTTP is used for retrieval, and '*' won't work --
which is what Wget is trying to warn you about.

> --17:26:58--  ftp://1.2.3.4:12345/Dir%20One/This.Is.Long.Name.Of.The.Directory/*
>=> `*'
> Connecting to 2.2.2.2:3128... connected!
> Proxy request sent, awaiting response... ^C
> 
> 1.7.1 doesn't say a single word about "Warning: wildcards not supported in
> HTTP."

Instead, it just silently doesn't work.



Re: Bug report

2001-12-13 Thread Hrvoje Niksic

Pavel Stepchenko <[EMAIL PROTECTED]> writes:

> Hello bug-wget,
> 
> $ wget --version
> GNU Wget 1.8
> 
> $ wget 
>ftp://password:[EMAIL PROTECTED]:12345/Dir%20One/This.Is.Long.Name.Of.The.Directory/*
> Warning: wildcards not supported in HTTP.
> 
> Oooops! But this is FTP url, not HTTP!

Are you using a proxy?



Bug report

2001-12-13 Thread Pavel Stepchenko

Hello bug-wget,

$ wget --version
GNU Wget 1.8


$ wget 
ftp://password:[EMAIL PROTECTED]:12345/Dir%20One/This.Is.Long.Name.Of.The.Directory/*
Warning: wildcards not supported in HTTP.


Oooops! But this is an FTP URL, not HTTP!

Please, fix it.


Thank you,

-- 
Best regards from future,
HillDale.
Pavel  mailto:[EMAIL PROTECTED]




RE: WGET 1.8 bug report

2001-12-12 Thread Herold Heiko

> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
> Herold Heiko <[EMAIL PROTECTED]> writes:
> 
> > I put up the current cvs, mainly since there have been those patches
> > to ftp-ls.c and the signal handler. Ok ?
> 
> Please don't do that.  Although all changes in the current CVS
> *should* be stable, mistakes are possible.  Please provide a binary
> that is 1.8 plus the most critical patches -- currently only the
> progress.c patch.

Correct, sorry. Site updated.

> > quite a userbase which does take the zipped cvs sources I put up in
> > order to use them on unix platforms. Don't ask me why. Well,
> > possibly folks behind firewalls who can't use cvs but can download
> > with a proxy or something..
> 
> We should have daily source snapshots for such people.

I agree. With a minimal bit of logic this shouldn't even load the server
too much - before tarring, check whether there have been commits since last
time; it shouldn't be difficult to parse a cvs history or some such. Or
check out and do a find -newer. Possibly (if the general setup and sysop
permit it) the checked-out files could even sit directly in the sunsite
wget ftp directory for easy access to changelogs or single files.

Heiko

-- 
-- PREVINET S.p.A.[EMAIL PROTECTED]
-- Via Ferretto, 1ph  x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907087
-- ITALY



Re: WGET 1.8 bug report

2001-12-12 Thread Hrvoje Niksic

Herold Heiko <[EMAIL PROTECTED]> writes:

> I put up the current cvs, mainly since there have been those patches
> to ftp-ls.c and the signal handler. Ok ?

Please don't do that.  Although all changes in the current CVS
*should* be stable, mistakes are possible.  Please provide a binary
that is 1.8 plus the most critical patches -- currently only the
progress.c patch.

Yes, the ftp-ls and signal handler changes are good changes, but they
don't fix any critical or even immediately obvious problems.  They
belong in the next release, but not in a patched 1.8.

> Usually I try to not put up patched binaries or sources... I have
> the nagging feeling (from some mail I get from time to time) there's
> quite a userbase which does take the zipped cvs sources I put up in
> order to use them on unix platforms. Don't ask me why. Well,
> possibly folks behind firewalls who can't use cvs but can download
> with a proxy or something..

We should have daily source snapshots for such people.



RE: WGET 1.8 bug report

2001-12-12 Thread Herold Heiko

I put up the current cvs, mainly since there have been those patches to
ftp-ls.c and the signal handler. 
Ok ?

Usually I try to not put up patched binaries or sources... I  have the
nagging feeling (from some mail I get from time to time) there's quite a
userbase which does take the zipped cvs sources I put up in order to use
them on unix platforms. Don't ask me why. Well, possibly folks behind
firewalls who can't use cvs but can download with a proxy or something..

Heiko

-- 
-- PREVINET S.p.A.[EMAIL PROTECTED]
-- Via Ferretto, 1ph  x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907087
-- ITALY

> -Original Message-
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, December 12, 2001 12:07 PM
> To: Wget List; 
> Subject: Re: WGET 1.8 bug report
> 
> 
> " " <[EMAIL PROTECTED]> writes:
> 
> > WGET 1.8 crashes when trying to retrieve a large file (250MB)
> > It seems to happen when new progress indicator is activated 
> (bar style).
> > When switching to dot-style indicator it works (for now).
> > With smaller files there is no problem also.
> > See attachment (screenshot).
> > 
> > I use WGET 1.8b compiled for Windows 
> > downloaded from http://space.tin.it/computer/hherold/
> 
> Thanks for the report; this is (by now) a known bug in the 1.8
> release.  If you can compile from source, apply the patch provided
> below and recompile.
> 
> Herold, could you please recompile the 1.8 binaries with this patch
> included?  Thanks!
> 
> Index: src/progress.c
> ===
> RCS file: /pack/anoncvs/wget/src/progress.c,v
> retrieving revision 1.21
> retrieving revision 1.22
> diff -u -r1.21 -r1.22
> --- src/progress.c2001/12/09 01:24:40 1.21
> +++ src/progress.c2001/12/09 04:51:40 1.22
> @@ -647,7 +647,7 @@
>   /* Hours not printed: pad with three spaces (two digits and
>  colon). */
>   APPEND_LITERAL ("   ");
> -  else if (eta_hrs >= 10)
> +  else if (eta_hrs < 10)
>   /* Hours printed with one digit: pad with one space. */
>   *p++ = ' ';
>else
> 



Re: WGET 1.8 bug report

2001-12-12 Thread Hrvoje Niksic

" " <[EMAIL PROTECTED]> writes:

> WGET 1.8 crashes when trying to retrieve a large file (250MB)
> It seems to happen when new progress indicator is activated (bar style).
> When switching to dot-style indicator it works (for now).
> With smaller files there is no problem also.
> See attachment (screenshot).
> 
> I use WGET 1.8b compiled for Windows 
> downloaded from http://space.tin.it/computer/hherold/

Thanks for the report; this is (by now) a known bug in the 1.8
release.  If you can compile from source, apply the patch provided
below and recompile.

Herold, could you please recompile the 1.8 binaries with this patch
included?  Thanks!

Index: src/progress.c
===
RCS file: /pack/anoncvs/wget/src/progress.c,v
retrieving revision 1.21
retrieving revision 1.22
diff -u -r1.21 -r1.22
--- src/progress.c  2001/12/09 01:24:40 1.21
+++ src/progress.c  2001/12/09 04:51:40 1.22
@@ -647,7 +647,7 @@
/* Hours not printed: pad with three spaces (two digits and
   colon). */
APPEND_LITERAL ("   ");
-  else if (eta_hrs >= 10)
+  else if (eta_hrs < 10)
/* Hours printed with one digit: pad with one space. */
*p++ = ' ';
   else



WGET 1.8 bug report

2001-12-12 Thread

WGET 1.8 crashes when trying to retrieve a large file (250MB).
It seems to happen when the new progress indicator (bar style) is activated.
When switching to the dot-style indicator it works (for now).
With smaller files there is no problem either.
See attachment (screenshot).

I use WGET 1.8b compiled for Windows 
downloaded from http://space.tin.it/computer/hherold/




WGET screenshot.tif
Description: Binary data