Re: A tricky download

2001-10-12 Thread Edward J. Sabol

> Many thanks for your wget. I use a win32 port of the 1.7 version
> (gnuwin32.sf.net/wget). I tried to download a German version of the
> Bible with wget -m http://biblewerk.de/bible, but I got only the first
> chapters of all the books; the other chapters are reached via a javascript
> command [go(1)] (and other javascript mechanisms).
>
> Is there no way to download the whole site? (I only want it for private use.)

Probably not. If the only links to the other chapters are in JavaScript
commands, then there's no way wget can do it. Wget does not interpret
JavaScript and most likely never will.
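
One possible workaround, if the go(N) calls map to predictable chapter URLs:
figure out the pattern from the site's JavaScript, list the chapter URLs by
hand in a file, and feed that file to wget's "-i" option:

wget -i chapters.txt

(The file name is just an example.) But that's manual labor, not something
wget can do for you.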



Re: make parse error

2001-10-02 Thread Edward J. Sabol

> While trying to "make" on the Mac OS X 10.1 platform, i am getting
> html-parse errors (see below):

Gee, where have I heard that before? :-)

It's a bug in cpp-precomp, Apple's C pre-processor that implements support
for pre-compiled headers. The way to avoid the error is to type the following
before executing configure:

setenv CPPFLAGS "-no-cpp-precomp"

(That's tcsh shell syntax. Adjust accordingly if you're using sh/zsh.)
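
For Bourne-style shells, the equivalent would be:

CPPFLAGS="-no-cpp-precomp"; export CPPFLAGS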

You pretty much want to do that with almost everything you compile on Mac OS
X, by the way. Cpp-precomp is really only useful when compiling the Darwin
kernel or other extremely large projects.



Re: Anyone besides Hrvoje have write access to the CVS archive?

2001-09-27 Thread Edward J. Sabol

On Mon, 24 Sep 2001, Edward J. Sabol wrote:
>> We've started to accumulate a fair number of patches which fix serious
>> problems in wget 1.7. It would be really nice to apply them to the CVS
>> archive so that they don't get lost.

Daniel Stenberg replied:
> Based on previous mails on this list, I've always thought that Jan
> Prikryl and Dan Harkless had/have CVS write access. But I might be
> totally wrong.

Unfortunately, Dan Harkless left the wget development arena last December,
IIRC, when he changed jobs or something like that.

Jan, are you willing and/or able to commit other patches?



Anyone besides Hrvoje have write access to the CVS archive?

2001-09-24 Thread Edward J. Sabol

We've started to accumulate a fair number of patches which fix serious
problems in wget 1.7. It would be really nice to apply them to the CVS
archive so that they don't get lost. Has anyone been in contact with Hrvoje
lately?




Re: wget & IPv6

2001-09-19 Thread Edward J. Sabol

Excerpts from mail: (18-Sep-01) wget & IPv6 by Joseph Townsend
> I was looking at the code for wget and noticed that it does not
> appear to support IPv6. Are there any plans to do this in a later
> release? Please reply directly to me since I am not a member of this
> mailing list.

At *least* two IPv6 patches have been posted to either the wget mailing list or
the wget-patches mailing list within the past nine months. These patches have
not yet been included in the wget distribution. I suggest you check the
mailing list archives if you're interested in testing them.



Re: convert ?& in links?

2001-09-07 Thread Edward J. Sabol

> I'd like to use wget to create a static version of my 
> dynamic site... 
>
> wget -m works great, except that it creates files like
> this:
>
> bestsavers.jsp?id=4&cat=7
>
> Can I tell wget to convert those links.. replacing all
> ? and & with _ ?

You'll have to modify the source code to wget, I think. Fortunately, this is
Free Software, so you can! Load up src/url.c into an editor and look for the
following code snippet:

#ifdef WINDOWS
  {
    char *p = file;
    for (p = file; *p; p++)
      if (*p == '%')
        *p = '@';
  }
#endif /* WINDOWS */

First, unless you're actually running on Windows, get rid of the lines that
start with "#ifdef" and "#endif".

Next, change the lines

      if (*p == '%')
        *p = '@';

to

      if (*p == '?' || *p == '&')
        *p = '_';

This is untested, but I think it should work. I hope it does!
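
If you want to sanity-check that replacement loop outside of wget first,
here's a minimal standalone test (not part of the wget sources, just an
illustration):

#include <stdio.h>

int
main (void)
{
  char file[] = "bestsavers.jsp?id=4&cat=7";
  char *p;
  /* Same loop as the modified url.c snippet above.  */
  for (p = file; *p; p++)
    if (*p == '?' || *p == '&')
      *p = '_';
  /* Should print: bestsavers.jsp_id=4_cat=7 */
  printf ("%s\n", file);
  return 0;
}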

Hope this helps,
Ed



Re: wget/html-parse.c

2001-09-06 Thread Edward J. Sabol

Denis Ahrens wrote:
> In line 435 of html-parse.c there is a non-escaped double quote (").

That's perfectly valid code.

> I cannot compile this file under MacOSX without escaping this char.

That's a bug in cpp-precomp, Apple's C pre-processor that implements support
for pre-compiled headers. The way to avoid the error is to type the following
(tcsh shell semantics) before executing ./configure:

setenv CPPFLAGS "-no-cpp-precomp"

You pretty much want to do that with almost everything you compile on Mac OS
X, by the way. Cpp-precomp is really only useful when compiling the Darwin
kernel.

Hope this helps,
Ed



Re: wget1.7: Compilation Error (please Cc'ed to me :-)

2001-08-31 Thread Edward J. Sabol

[ Oops. Ignore my previous e-mail. I accidentally hit the send key before I
  was finished composing... Sorry. ]

Zefiro encountered the following compilation error:
>> utils.c: In function `read_file':
>> utils.c:980: `MAP_FAILED' undeclared (first use this function)

Ian Abbott suggested:
> Try this patch:

Which is exactly what's in sysdep.h, which is included by utils.c via wget.h.
Why doesn't that work?
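
For reference, the guard I'm referring to is (if memory serves) along these
lines:

#ifndef MAP_FAILED
# define MAP_FAILED ((void *) -1)
#endif

so utils.c should already be picking it up via wget.h.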



Re: wget1.7: Compilation Error (please Cc'ed to me :-)

2001-08-31 Thread Edward J. Sabol

Zefiro encountered the following compilation error:
>> utils.c: In function `read_file':
>> utils.c:980: `MAP_FAILED' undeclared (first use this function)

Ian Abbott suggested:
> Try this patch:



On 31 Aug 2001, at 6:45, zefiro wrote:

> on my SUN-SPARC-SUNOS4.1.4, using GCC-2.8.0, after running the
> "configure" script (without any argument), I've run the "make"
> command (without any argument).
> It bailed out with the following compilation error:
> 
> gcc -I. -I.   -DHAVE_CONFIG_H
> -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\"
> -DLOCALEDIR=\"/usr/local/share/locale\" -g -O2 -c utils.c
> utils.c: In function `read_file':
> utils.c:980: `MAP_FAILED' undeclared (first use this function)
> utils.c:980: (Each undeclared identifier is reported only once
> utils.c:980: for each function it appears in.)
> *** Error code 1

Try this patch:



[Attachment: wget-1.7-MAP_FAILED.patch]

--- src/utils.c.orig	Sun May 27 20:35:12 2001
+++ src/utils.c Fri Aug 31 17:07:40 2001
@@ -33,6 +33,9 @@
 #endif
 #ifdef HAVE_MMAP
 # include <sys/mman.h>
+# ifndef MAP_FAILED
+#  define MAP_FAILED ((void *)-1)
+# endif
 #endif
 #ifdef HAVE_PWD_H
 # include <pwd.h>





Re: Wget win32, http-post, and www-form-urlencoded ...

2001-08-27 Thread Edward J. Sabol

> Is http-post support currently available??

Not in any official version of wget. I hope it will someday support this. If
you search the wget mailing list archives, you'll find a couple source code
patches which have implemented the POST capability in wget, but they have not
been accepted, for whatever reason, by Hrvoje, the wget source maintainer and
lead developer.

Your other option would be to find a different program which has this
capability. I believe "curl" does. Check out <http://curl.haxx.se/> for more
information. I have no idea if Win32 versions of curl exist or not.
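
If you do go the curl route, a form POST looks something like this (the URL
is just a placeholder):

curl -d "field1=foo&field2=bar" http://some.host/cgi-bin/script

where the -d option supplies the www-form-urlencoded request body.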

Hope this helps,
Ed




Re: spanhost and recursive.

2001-08-24 Thread Edward J. Sabol

Anders Rosendal asked:
> Could you make an option to only fetch from other hosts what is directly
> referenced from the orig page?

Have you tried the "--page-requisites" (a.k.a. "-p") command line option?

The info documentation says this:

 Actually, to download a single page and all its requisites (even
 if they exist on separate websites), and make sure the lot
 displays properly locally, this author likes to use a few options
 in addition to `-p':

  wget -E -H -k -K -nh -p http://SITE/DOCUMENT

 In one case you'll need to add a couple more options.  If DOCUMENT
 is a `<FRAMESET>' page, the "one more hop" that `-p' gives you
 won't be enough--you'll get the `<FRAME>' pages that are
 referenced, but you won't get _their_ requisites.  Therefore, in
 this case you'll need to add `-r -l1' to the commandline.  The `-r
 -l1' will recurse from the `<FRAMESET>' page to the `<FRAME>'
 pages, and the `-p' will get their requisites.  If you're already
 using a recursion level of 1 or more, you'll need to up it by one.
 In the future, `-p' may be made smarter so that it'll do "two
 more hops" in the case of a `<FRAMESET>' page.

 To finish off this topic, it's worth knowing that Wget's idea of an
 external document link is any URL specified in an `<A>' tag, an
 `<AREA>' tag, or a `<LINK>' tag other than `<LINK REL="stylesheet">'.
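
So, for a frameset page, the full command would presumably be:

  wget -E -H -k -K -nh -p -r -l1 http://SITE/DOCUMENT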



Re: "wget -k" crashes when converting a specific url

2001-08-24 Thread Edward J. Sabol

Ian Abbott posted:
> Thanks for tracking that down. I've now found the problem, fixed it and 
> created a patch (attached) against the current CVS sources.

Argh, you beat me to it. Good job.

> At least that extra code was a convenient place for me to stick a 
> breakpoint on in gdb, and also helped me verify that I've nailed the 
> bug (I checked the converted html file too, of course!).

We might want to consider putting an assert() in convert_links()...

> It's a shame Hrvoje Niksic's not around at the moment to apply all 
> these patches to the repository.

I think Jan Prikryl, our resident wget FTP source code expert, also has CVS
commit access. Maybe he could start committing some of the patches in the
backlog? Or you could try contacting Hrvoje...

Later,
Ed



Re: "wget -k" crashes when converting a specific url

2001-08-23 Thread Edward J. Sabol

Nathan J. Yoder wrote:
>> Please fix this soon,
>> 
>> ***COMMAND***
>> wget -k http://reality.sgi.com/fxgovers_houst/yama/panels/panelsIntro.html
>[snip]
>> 02:30:05 (23.54 KB/s) - `panelsIntro.html' saved [3061/3061]
>> 
>> Converting panelsIntro.html... zsh: segmentation fault (core dumped)

Ian Abbott replied:
> I cannot reproduce this failure on my RedHat 7.1 box.

I was able to reproduce this pretty easily on both Irix 6.5.2 and Digital
Unix 4.0d, using gcc 2.95.2. (I bet Linux's glibc has code to protect against
fwrite() calls with negative lengths.)

The problem occurs when you have a single tag with multiple attributes that
specify links that need to be converted. In this case, it's an IMG tag with
SRC and LOWSRC attributes. The urlpos structure passed to convert_links() is
a linked list of pointers to where the links are that needed to be converted.
The problem is that the links are not in positional order. The second
attribute is in the linked list before the first attribute, causing the
length of the string to be printed out to be a negative number.

Here's a diff (against the current CVS sources) which will prevent the core
dump, but please note that it does not fix the problem. html-parse.c and
html-url.c are some dense code, and I'm still wading through it. (Also, it's
not clear if the linked list is supposed to be in positional order or if
convert_links() is wrongly assuming that.)

Index: url.c
===================================================================
RCS file: /pack/anoncvs/wget/src/url.c,v
retrieving revision 1.46
diff -u -r1.46 url.c
--- url.c   2001/06/18 09:08:04 1.46
+++ url.c   2001/08/23 17:07:10
@@ -1442,6 +1442,11 @@

   /* Echo the file contents, up to the offending URL's opening
  quote, to the outfile.  */
+  if (url_start - p < 0)
+   {
+ DEBUGP (("URLs are out of order!  Please investigate."));
+ break;
+   }
   fwrite (p, 1, url_start - p, fp);
   p = url_start;
   if (l->convert == CO_CONVERT_TO_RELATIVE)
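
For what it's worth, if the linked list *is* supposed to be in positional
order, then the real fix might be to sort it before the conversion loop. An
untested sketch, assuming the urlpos struct has the `pos' and `next' members
that convert_links() already uses:

static urlpos *
sort_urlpos (urlpos *list)
{
  /* Insertion sort by file position; the per-document lists are
     short, so O(n^2) is fine.  */
  urlpos *sorted = NULL;
  while (list)
    {
      urlpos *cur = list, **ins = &sorted;
      list = list->next;
      while (*ins && (*ins)->pos < cur->pos)
        ins = &(*ins)->next;
      cur->next = *ins;
      *ins = cur;
    }
  return sorted;
}

But as I said, I'm still wading through the code, so take that with a grain
of salt.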



Re: WGET & POST

2001-08-21 Thread Edward J. Sabol

> is there a way to use HTTP POST with wget ??

Unfortunately, no. Wget doesn't support POSTs yet (only GETs). It will
someday, I hope. If you search the wget mailing list archive, you'll find a
couple source code patches which have implemented the POST capability in
wget, but they have not been accepted, for whatever reason, by Hrvoje, the wget
source maintainer and lead developer. If you absolutely need POST
functionality in wget right away, I could forward some of these source code
patches to you. I have no idea if they work or not, and I cannot assist you
with them. You're welcome to try, however. Please e-mail me directly (not the
mailing list) and I will forward them to you.

Your other option would be to find another program which has this capability.
I believe "curl" does. Check out <http://curl.haxx.se/> for more information.

Hope this helps,
Ed



Re: Help compiling Wget on MachTen

2001-08-20 Thread Edward J. Sabol

> Compiling wget on a Mac G3, OS 8.6, MachTen 4.1.1.
>
> After converting the un-tarred files from Mac to UNIX, and running
> ./configure, typing make yields the following:

Did you use StuffIt Expander or some other Mac program to do your un-tar-ing?
If so, that's probably the root of your problem. I suggest you download the
tar file again and un-tar it from the MachTen command line using MachTen's
tar command (I presume it has one; if it doesn't, download GNU tar).
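
For example (adjust the filename to whatever you downloaded):

gunzip -c wget-1.7.tar.gz | tar xf -

That way the files never pass through a Mac-side converter at all.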

You'd probably get better help if you asked on the MachTen UseNet group or
some MachTen web forum...



Re: Range Request

2001-08-08 Thread Edward J. Sabol

Excerpts from mail: (07-Aug-01) Range Request by Stefan Saroiu
> I need to retrieve certain ranges of documents. For this, I'm using the
> following wget flags:
>
> wget --header='Range: bytes=100-' ip_address
>
> The problem is that wget mistakes this for a partial download and
> starts retrying it. After 20 times, it times out.
>
> I think I can fix this problem myself, by adding a new flag to wget, to
> ignore the fact that the document is partial. I'm willing to write the
> code myself, and send in a patch.

Instead of a flag to ignore the fact that the document is partial, I would
suggest instead a command line like this:

wget --range=100- URL

Just my two cents...



Re: Compiling with ssl

2001-08-01 Thread Edward J. Sabol

Sue Thielen wrote:
> I need wget to run with the ssl support.. and I can't get it to compile in.
> I've seen that some other people have had this problem, but I haven't found
> a solution. Is there a solution??

Two suggestions:

1. Try version 1.7.1-pre1 instead.
ftp://gnjilux.srk.fer.hr/pub/unix/util/wget/.betas/wget-1.7.1-pre1.tar.gz

2. Try the CVS version. Consult the wget home page for more info on how to
access the wget CVS archive.




Re: minor problem compiling wget-1.7 on Mac OS X / Darwin

2001-07-13 Thread Edward J. Sabol

Eugene Lee wrote:
> For some reason, Apple's GCC does not like the '"' in its assert().

The problem is not with Apple's gcc, nor with the Darwin definition of the
assert() macro, it's with cpp-precomp, Apple's C pre-processor that
implements support for pre-compiled headers.

> There are two different ways to fix this:
>
> 1) change '"' to '\"', which is also legal ANSI C.
>
> 2) CPPFLAGS="-traditional-cpp" ./configure

FYI, the preferred flag is now "-no-cpp-precomp". Both flags work at the
moment, but only "-no-cpp-precomp" will work correctly with gcc 3.1 whenever
that gets released.
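
In other words, swapping it into the configure invocation quoted above:

CPPFLAGS="-no-cpp-precomp" ./configure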



Re: Missing escape character

2001-06-27 Thread Edward J. Sabol

Ian> Also, both versions of the character constant '"' and '\"' are valid,
Ian> so if the compiler barfs on any of the above it must be faulty. I
Ian> suggest a bug report to the maintainers of this compiler is in order.

The problem's not with the compiler. Indeed, the Mac OS X compiler is gcc. The
problem, in this case, is almost assuredly with the Mac OS X C pre-processor,
cpp-precomp. Cpp-precomp is a little weird, but it handles pre-compiled
headers, which is a big win when you're doing daily kernel compilations, so
the Apple folks have learned to live with it. Bill, I bet if you compile wget
using the "-traditional-cpp" CFLAG (which causes gcc to use GNU cpp instead of
cpp-precomp), it will compile just fine.



Re: Testing wget with ssl?

2001-06-20 Thread Edward J. Sabol

Marc Stephenson asked:
> I've built wget 1.7.1 with ssl, but don't really know how to test it.
> Anybody know an easy way to test that combination?

Personally, I just tried to connect to https://www.apache-ssl.org/. I picked
that site because you don't need (as far as I know) a userid or password or
certificate to connect to it. But I'm by no means an expert here...

Anyway, the current SSL implementation will only work on systems with a
/dev/random. Additional code is needed to gather entropy from other sources.
(You'd think OpenSSL would be sophisticated enough to do that automatically,
but it's not.)
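
For the curious, OpenSSL does provide hooks for seeding its PRNG by hand.
A minimal, untested sketch of the sort of fallback wget would need (the
seed-file path would have to come from the user somehow):

#include <openssl/rand.h>

/* Seed the SSL PRNG from a file when /dev/random is absent.
   RAND_load_file() returns the number of bytes read.  */
static int
seed_prng_from_file (const char *path)
{
  return RAND_load_file (path, 1024) > 0;
}

Whether a given seed file contains enough real entropy is another question,
of course.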

Later,
Ed



1.7.1-pre1 on IRIX 6.5.2: success

2001-06-15 Thread Edward J. Sabol

I didn't have any problems with 1.7 (except that SSL wouldn't work due to
lack of /dev/random), but, for the record, 1.7.1-pre1 compiles fine and seems
to work fine as well.

By the way, you can add mips-sgi-irix6.5 to the MACHINES file.



Re: wget option for specifying http method "get" or "post" to use - is there one?...

2001-06-12 Thread Edward J. Sabol

Dominic Caffey asked:
> Is there a wget option to specify which http method "get" or "post" that
> wget should use?

Wget doesn't support POSTs yet (only GETs). It will soon, I hope. If you
search the wget mailing list archive, you'll find a couple patches which
implemented POSTs, but they have not been accepted for one reason or another.
If you absolutely need POST functionality right now, however, you could use
one of these source code patches.

> BTW, I'm getting my info from the version 1.5.3 docs.

FYI, the current version of wget is 1.7. You might want to upgrade if your
executable is as out of date as your documentation.



How do I get SSL support to work in 1.7?

2001-06-06 Thread Edward J. Sabol

Hmm. I've tried connecting to various sites using https without success.
I've tried this on both IRIX 6.5.2 and Digital Unix 4.0d.

When I installed OpenSSL 0.9.6a using the default configure options, it
didn't make any shared libraries, but I have libssl.a and libcrypto.a
installed, and wget's configure process does find them. (Do I need to install
the shared libraries?)

For example, I can connect to https://www.apache-ssl.org/ in Netscape just
fine, but here's what happens when I try with wget 1.7:

% ./src/wget --debug https://www.apache-ssl.org/
DEBUG output created by Wget 1.7 on osf4.0d.

parseurl ("https://www.apache-ssl.org/") -> host www.apache-ssl.org -> opath  -> dir  
-> file  -> ndir 
newpath: /
--14:01:54--  https://www.apache-ssl.org/
   => `index.html'
Connecting to www.apache-ssl.org:443... Caching www.apache-ssl.org <-> 193.123.86.250
Created fd 4.
connected!

Unable to establish SSL connection.
Closing fd 4

Unable to establish SSL connection.



Re: whether forget to free memory?

2001-05-18 Thread Edward J. Sabol

> fanlb <[EMAIL PROTECTED]> writes:
>> Would u1 be left in memory if allocation for u1 succeeded but u2 failed?
>> but u2 failed?
> If any memory allocation fails, Wget exits with an appropriate error
> message.

I think there's a valid point here. The memory allocation part of the
question was misleading though. There is a memory leak in url_equal() if the
second parseurl() returns a value other than URLOK. u1 is never freed in that
scenario. Here's a simplistic patch:

Index: url.c
===================================================================
RCS file: /pack/anoncvs/wget/src/url.c,v
retrieving revision 1.44
diff -u -r1.44 url.c
--- url.c   2001/04/26 10:11:49 1.44
+++ url.c   2001/05/18 20:06:52
@@ -792,6 +792,7 @@
   err = parseurl (url2, u2, 0);
   if (err != URLOK)
 {
+  freeurl (u1, 1);
   freeurl (u2, 1);
   return 0;
 }


But I guess this is a completely pedantic issue. I don't see any code in wget
that even uses url_equal(), or am I missing something?

While we're being pedantic though, it seems to me that the return values for
url_equal() were poorly chosen with both the "not equal" condition and the
error condition defined to be the same return value. But it doesn't matter
really, so there's little point in changing it.
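
If anyone ever did want to disentangle them, a tri-state return along these
lines would do the trick (purely hypothetical, of course):

enum url_equal_result
{
  UE_ERROR = -1,      /* parseurl() failed on one of the URLs */
  UE_DIFFERENT = 0,
  UE_EQUAL = 1
};

But with no callers, it's moot.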



Re: Wget timestamp problem in RH6.2

2001-05-01 Thread Edward J. Sabol

The difference in timestamps is due to a bug related to the handling of
daylight saving time. It's already been fixed in the current development
version of wget in the CVS archive. If you are interested, consult the wget
home page for more info.



Re: Bundling libtool

2001-03-27 Thread Edward J. Sabol

Drazen Kacar wrote:
> I'm concerned about something else. You're promoting libtool usage. I
> can prove that it doesn't work correctly on some platforms which it
> allegedly supports (according to libtool documentation). If many
> people say many times that libtool solves linking problems, that
> won't become true just because they are saying it so often. But the
> observable effect is that many other developers will never look into
> linking issues, just because they think that there is this magic tool
> which can do it for them.

Drazen, you've done an excellent job of explaining the problems that libtool
has on Solaris (and possibly other platforms). Personally, I would classify
most of these as bugs in libtool. One or two sound like feature requests to
me, but they do sound like reasonable requests and probably should be
implemented. In either case, IMHO, the preferred course of action would be to
get the bugs in libtool fixed instead of going around telling package
maintainers that they shouldn't use libtool. Have you contacted the libtool
maintainers and itemized the bugs you've found and your proposed solutions?
Or better yet, have you submitted patches? If you have and the libtool
maintainers are unresponsive to your concerns, then I'll agree you have a
legitimate beef here. Personally, I think using libtool is a big win for
package maintainers, who shouldn't have to keep re-inventing the same wheel
over and over again. That assumes any libtool deficiencies you've cited can
be fixed, of course.

Just my two cents,
Ed



Re: wget_new_percentage

2001-02-15 Thread Edward J. Sabol

Hrvoje asked:
> "Dan Harkless" <[EMAIL PROTECTED]> writes:
>> I think a bigger problem is how the current display can go over 80
>> columns,
>
> In which case does it go over 80?

I'm not sure, but there appears to be a problem whenever you resume a partially
transferred file. Note the large negative number for the rate. This appears
to be a bug to me.

>  [ skipping 250K ]
> 250K ,, .. .. .. .. 28% @-305780.35 B/s

For what it's worth, I vote in favor of adopting Vladi's format. The ETA is
very useful. Perhaps even more useful than the download rate. I also think
that it looks nicer than the 1.7 version.

As for confusing users with too many numbers, I think the easy solution is to
simply add a header line right above the numbers, like so:

   ETA   Rate
   0K .. .. .. .. .. 21%   0:00 7.31M
  50K .. .. .. .. .. 43%   0:00 8.53M

Later,
Ed