Re: How to use wget with option -p without writing files to disk?

2003-11-12 Thread Hrvoje Niksic
Jens Schleusener <[EMAIL PROTECTED]> writes:

> But that doesn't work since wget probably needs the downloaded pages
> to find the files necessary to properly display the complete HTML
> page.

Exactly.  Sorry about that; it will be fixed in a future release.

Currently the only workaround is to download to a RAM disk or tmpfs
file system, or anything that is clearly faster than your net
connection, so that the writing time does not enter into account.
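
A minimal sketch of that workaround, in Python for brevity. It assumes a tmpfs mount at /dev/shm (common on Linux, but not guaranteed) and a wget binary on PATH. The helper names build_wget_command and timed_fetch are made up for illustration, and the --proxy=off option from the original command is left out here.

```python
import shutil
import subprocess
import tempfile
import time


def build_wget_command(url, scratch_dir):
    """Return the wget argv that fetches a page and its requisites into scratch_dir."""
    return [
        "wget",
        "--page-requisites",
        "--timeout=30",
        "--tries=1",
        "--directory-prefix=" + scratch_dir,
        url,
    ]


def timed_fetch(url, tmpfs_root="/dev/shm"):
    """Run wget against a throwaway tmpfs directory and return elapsed seconds."""
    # Assumption: tmpfs_root is a RAM-backed file system on this machine.
    scratch_dir = tempfile.mkdtemp(prefix="wget-bench-", dir=tmpfs_root)
    try:
        start = time.monotonic()
        subprocess.run(build_wget_command(url, scratch_dir), check=False)
        return time.monotonic() - start
    finally:
        # Remove the fetched files so repeated runs start from a clean slate.
        shutil.rmtree(scratch_dir, ignore_errors=True)
```

Calling timed_fetch("http://www.foo.bar/") would then give a wall-clock figure in which disk writes should play only a minor part.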



How to use wget with option -p without writing files to disk?

2003-11-12 Thread Jens Schleusener
Hi,

I just want to use wget (v1.9.1-rc1) to do some simple access-time
benchmarking of some WWW pages. So I first started with

  wget --page-requisites --timeout=30 --proxy=off \
   --tries=1 \
   http://www.foo.bar/

(last output line, e.g.: Downloaded: 76,431 bytes in 27 files)

But then I noticed that this way I was also measuring the disk I/O spent
writing the fetched files to the local disk.

So the next idea was to let wget write the output to /dev/null
(option --tries=1 omitted since it's the default when using --output-document)

  wget --page-requisites --timeout=30 --proxy=off \
   --output-document=/dev/null \
   http://www.foo.bar/

(last output line, e.g.: Downloaded: 31,999 bytes in 2 files)

But that doesn't work since wget probably needs the downloaded pages to
find the files necessary to properly display the complete HTML page.

A workaround seems to be to call wget once in the standard way so the files
are locally available, but that probably wouldn't work correctly if the
benchmarked page were changed.

Any ideas how to do that correctly with wget?  Or any pointers to more
appropriate tools?

Greetings

Jens

-- 
Dr. Jens Schleusener    T-Systems Solutions for Research GmbH
Tel: +49 551 709-2493   Bunsenstr.10
Fax: +49 551 709-2169   D-37073 Goettingen
[EMAIL PROTECTED]  http://www.t-systems.com/


Re: AI_ADDRCONFIG

2003-11-12 Thread Hrvoje Niksic
Mauro Tortonesi <[EMAIL PROTECTED]> writes:

> i wouldn't do it at configure time because compilation would then be
> prone to some problems which may be difficult to track down. for
> example, what if you have compiled wget without loading the ipv6
> module, but your system supports PF_INET6 sockets and you want wget
> to have ipv6 support? or what if you have compiled wget with the
> ipv6 module loaded but normally your system has ipv6 support turned
> off?

The nice thing about Wget's --inet4-only switch and the corresponding
.wgetrc setting is that they can be reverted with --no-inet4-only.  So
in theory, the user could use --no-inet4-only to undo the problem.

However, I agree that this is still suboptimal.  So let's add the
socket creation check in main().  The check will only occur on systems
*with* IPv6 in libc, but *without* AI_ADDRCONFIG.  The number of those
will dwindle as IPv6 gets more widely implemented, so even that
minimal inefficiency is not here to stay.
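
The check described above could look roughly like the following. This is a sketch in Python rather than Wget's C, and the function names are hypothetical; it only illustrates the decision, not the code that will actually go into main().

```python
import socket


def can_create_inet6_socket():
    """Return True if the system lets us create an IPv6 TCP socket."""
    if not socket.has_ipv6:
        return False
    try:
        probe = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        return False
    probe.close()
    return True


def default_inet4_only(have_ai_addrconfig, inet6_socket_ok):
    """Decide whether IPv4-only should be the startup default.

    The probe result only matters when AI_ADDRCONFIG is unavailable.
    """
    if have_ai_addrconfig:
        return False
    return not inet6_socket_ok


def startup_default():
    """Combine flag detection and the socket probe into one startup decision."""
    have_flag = hasattr(socket, "AI_ADDRCONFIG")
    # Skip the socket probe entirely when the resolver flag is available.
    inet6_ok = True if have_flag else can_create_inet6_socket()
    return default_inet4_only(have_flag, inet6_ok)
```

The probe socket is created at most once per run and closed immediately, which is the "minimal inefficiency" mentioned above.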

Interestingly enough, the glibc I installed from Rawhide (while
Rawhide still existed) does support AI_ADDRCONFIG, AI_V4MAPPED, and
AI_ALL, at least according to Wget's configure.  The version of glibc
is "2.3.2-82", but I don't know if the IPv6 stuff is native to that
version or if it was added by Red Hat's packagers.

>> > the problem with rfc3484 may arise if we don't use the sockaddr
>> > addresses returned by getaddrinfo in order, but this is another
>> > problem.
>>
>> From what I can tell, we'll always use them in order, so we should
>> be safe.
>
> yes, let's keep using this policy.

I'll explicitly document this in the docstring of lookup_host, so that
it's clear that the preserved ordering is not an artifact of the
current implementation.

NB, I believe Ari's IPv6 patch posted to wget-patches contained code
that sorted the address list.  I didn't apply that patch because the
CVS Wget already had support for dual-family systems, but it would
indicate that there is a certain temptation to reorder the results
returned by getaddrinfo, and *that* can lead to conflicts with
rfc3484.
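
To make the ordering policy concrete, here is a small sketch (Python, not the actual lookup_host code) that tries each address strictly in the order the resolver returned it. The names connect_in_order and _tcp_connect are hypothetical, and the resolver and connector parameters exist only so the loop can be exercised without a network.

```python
import socket


def _tcp_connect(family, socktype, proto, sockaddr, timeout):
    """Open one TCP connection, closing the socket again if connect fails."""
    sock = socket.socket(family, socktype, proto)
    try:
        sock.settimeout(timeout)
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def connect_in_order(host, port, timeout=30.0,
                     resolver=socket.getaddrinfo, connector=_tcp_connect):
    """Try every resolved address in resolver order; return the first success."""
    last_error = None
    # No sorting or filtering: the resolver's order is used as-is.
    for family, socktype, proto, _canon, sockaddr in resolver(
            host, port, type=socket.SOCK_STREAM):
        try:
            return connector(family, socktype, proto, sockaddr, timeout)
        except OSError as err:
            last_error = err
    raise last_error or OSError("no addresses returned for %r" % (host,))
```

Because the list is never reordered, whatever preference the resolver applied (for example under rfc3484) is carried through to the connect attempts.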


Re: AI_ADDRCONFIG

2003-11-12 Thread Mauro Tortonesi
On Wed, 12 Nov 2003, Hrvoje Niksic wrote:

> Mauro Tortonesi <[EMAIL PROTECTED]> writes:
>
> > perhaps we can perform a check like this in main: if AI_ADDRCONFIG
> > is not supported AND ipv6 is not supported (e.g. creation of
> > PF_INET6 sockets fails or we don't have a global ipv6 address
> > configured on one of the interfaces), then enable --inet4-only.
>
> That is exactly what I was proposing (see the "Better yet..."
> sentence).
>
> Could we push that check to configure time, so that every call to
> main() doesn't needlessly create a socket?  But then the binary built
> on an IPv6-less system would have a strange default when transferred
> to a system with working IPv6.

i wouldn't do it at configure time because compilation would then be prone
to some problems which may be difficult to track down. for example, what if
you have compiled wget without loading the ipv6 module, but your system
supports PF_INET6 sockets and you want wget to have ipv6 support? or what
if you have compiled wget with the ipv6 module loaded but normally
your system has ipv6 support turned off?


> > the problem with rfc3484 may arise if we don't use the sockaddr
> > addresses returned by getaddrinfo in order, but this is another
> > problem.
>
> From what I can tell, we'll always use them in order, so we should be
> safe.

yes, let's keep using this policy.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Deep Space 6 - IPv6 with Linux  http://www.deepspace6.net
Ferrara Linux User Group        http://www.ferrara.linux.it




Re: keep alive connections

2003-11-12 Thread Hrvoje Niksic
Alain Bench <[EMAIL PROTECTED]> writes:

> OK, wasn't aware of the spurious HEAD bodies problem. But Wget also
> closes the connection between a GET (with body) and the HEAD for the
> next file.

Could you post a URL for which this happens?  I wasn't aware of this
problem and would like to fix it.

>> But maybe it would actually be a better idea to read (and discard)
>> the body than to close the connection and reopen it.
>
> Hum... Would it be possible to close/reopen only if, and as soon as,
> first byte of spurious body comes?

This is harder than it seems.  How exactly do you propose to detect
the unwanted body?  If you wait for an arbitrary time for the body
data to start arriving, you slow down all downloads and defeat the
purpose of the persistent connections (speed).  If you don't wait, the
detection doesn't work because the body data can start arriving a bit
later (which is frequently the case with CGIs).  Either way, you
lose.

What Wget does only sacrifices persistent connections at times, but
does the right thing with all kinds of responses and doesn't introduce
artificial delays.

>>>| Keep-Alive: timeout=15, max=5
>>> Without --timestamping Wget keeps "Reusing fd 3." and closing it only
>>> once every 6 files (first + 5 more).
>> This might be due to redirections.
>
> No redirections involved: That closure is normal, due to the "max=5"
> the server sends in response to the first request. At the second GET it's
> "max=4", and it gets decremented each time. Finally, at the 6th request
> there are no more "Connection:" or "Keep-Alive:" fields.

Oh, I see, it's a server setting.  Why do they use such a limit?


Re: AI_ADDRCONFIG

2003-11-12 Thread Hrvoje Niksic
Mauro Tortonesi <[EMAIL PROTECTED]> writes:

> perhaps we can perform a check like this in main: if AI_ADDRCONFIG
> is not supported AND ipv6 is not supported (e.g. creation of
> PF_INET6 sockets fails or we don't have a global ipv6 address
> configured on one of the interfaces), then enable --inet4-only.

That is exactly what I was proposing (see the "Better yet..." 
sentence).

Could we push that check to configure time, so that every call to
main() doesn't needlessly create a socket?  But then the binary built
on an IPv6-less system would have a strange default when transferred
to a system with working IPv6.

> the problem with rfc3484 may arise if we don't use the sockaddr
> addresses returned by getaddrinfo in order, but this is another
> problem.

From what I can tell, we'll always use them in order, so we should be
safe.


Re: keep alive connections

2003-11-12 Thread Alain Bench
 On Tuesday, November 11, 2003 at 2:41:31 PM +0100, Hrvoje Niksic wrote:

> Alain Bench <[EMAIL PROTECTED]> writes:
>> with --timestamping: Each HEAD and each possible GET uses a new
>> connection.
> I think the difference is that Wget closes the connection when it
> decides not to read the response body.

OK, wasn't aware of the spurious HEAD bodies problem. But Wget also
closes the connection between a GET (with body) and the HEAD for the
next file.


> But maybe it would actually be a better idea to read (and discard) the
> body than to close the connection and reopen it.

Hum... Would it be possible to close/reopen only if, and as soon as, the
first byte of a spurious body comes? I guess it could be difficult to deal
cleanly with the next file in edge cases...


>>| Keep-Alive: timeout=15, max=5
>> Without --timestamping Wget keeps "Reusing fd 3." and closing it only
>> once every 6 files (first + 5 more).
> This might be due to redirections.

No redirections involved: That closure is normal, due to the "max=5"
the server sends in response to the first request. At the second GET it's
"max=4", and it gets decremented each time. Finally, at the 6th request
there are no more "Connection:" or "Keep-Alive:" fields. The
/etc/apache/httpd.conf says:

| # KeepAlive: The number of Keep-Alive persistent requests to accept
| # per connection. Set to 0 to deactivate Keep-Alive support
| KeepAlive 5
|
| # KeepAliveTimeout: Number of seconds to wait for the next request
| KeepAliveTimeout 15
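
For illustration, the countdown can be read straight off the response headers. The sketch below is Python and not part of Wget; parse_keep_alive and requests_left are hypothetical helpers, and the header format is assumed to be the "timeout=15, max=5" form shown above.

```python
def parse_keep_alive(value):
    """Parse a Keep-Alive header value such as 'timeout=15, max=5'."""
    params = {}
    for part in value.split(","):
        name, sep, raw = part.strip().partition("=")
        # Ignore anything that is not a simple name=number pair.
        if sep and raw.strip().isdigit():
            params[name.strip().lower()] = int(raw.strip())
    return params


def requests_left(headers):
    """Return the advertised 'max' value, or None if the server sent none."""
    value = headers.get("Keep-Alive")
    if value is None:
        return None
    return parse_keep_alive(value).get("max")
```

With the server above, requests_left would report 5 after the first response and 4 after the second, and None once the "Keep-Alive:" field disappears.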


Bye!    Alain.
-- 
When you want to reply to a mailing list, please avoid doing so from a
digest. This often builds incorrect references and breaks threads.


Re: AI_ADDRCONFIG

2003-11-12 Thread Mauro Tortonesi
On Wed, 12 Nov 2003, Hrvoje Niksic wrote:

> Mauro Tortonesi <[EMAIL PROTECTED]> writes:
>
> >> I suppose I can work around the problem by specifying `inet4_only=yes'
> >> in .wgetrc...
> >>
> >> Better yet, maybe we should make -4 the default on machines that don't
> >> support AI_ADDRCONFIG and on which creating an AF_INET6 socket fails?
> >
> > IMHO, no. we should simply try in order each sockaddr address
> > returned by getaddrinfo (if we don't, there can be problems on
> > systems which support RFC3484) and print an error message only if the
> > verbosity option is turned on.
>
> Yes, but Wget is not nc -- the verbosity option is on by default.  :-)
>
> Also, the failed connect attempts potentially slow things down.  I
> don't want Wget to try to connect to random IPv6 addresses -- it will
> not work for me and it's just wrong.  Wget should be smarter about
> this.  Given the choice between suppressing error messages and doing
> the right thing in the first place, I'd always go for the latter.

perhaps we can perform a check like this in main: if AI_ADDRCONFIG is not
supported AND ipv6 is not supported (e.g. creation of PF_INET6 sockets
fails or we don't have a global ipv6 address configured on one of the
interfaces), then enable --inet4-only.


> Could you please explain how defaulting to --inet4-only on systems
> that cannot connect to IPv6 breaks systems that support rfc3484?  It's
> not obvious to me -- surely IPv6 addresses would fail to work on such
> systems anyway?

sorry, i explained myself badly. enabling --inet4-only by default on systems
that do not support AI_ADDRCONFIG but have ipv6 connectivity is just like
not having ipv6 support at all. i wouldn't recommend adopting this
behaviour.

the problem with rfc3484 may arise if we don't use the sockaddr addresses
returned by getaddrinfo in order, but this is another problem.


-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Deep Space 6 - IPv6 with Linux  http://www.deepspace6.net
Ferrara Linux User Group        http://www.ferrara.linux.it




Re: AI_ADDRCONFIG

2003-11-12 Thread Hrvoje Niksic
Mauro Tortonesi <[EMAIL PROTECTED]> writes:

>> I suppose I can work around the problem by specifying `inet4_only=yes'
>> in .wgetrc...
>>
>> Better yet, maybe we should make -4 the default on machines that don't
>> support AI_ADDRCONFIG and on which creating an AF_INET6 socket fails?
>
> IMHO, no. we should simply try in order each sockaddr address
> returned by getaddrinfo (if we don't, there can be problems on
> systems which support RFC3484) and print an error message only if the
> verbosity option is turned on.

Yes, but Wget is not nc -- the verbosity option is on by default.  :-)

Also, the failed connect attempts potentially slow things down.  I
don't want Wget to try to connect to random IPv6 addresses -- it will
not work for me and it's just wrong.  Wget should be smarter about
this.  Given the choice between suppressing error messages and doing
the right thing in the first place, I'd always go for the latter.

Could you please explain how defaulting to --inet4-only on systems
that cannot connect to IPv6 breaks systems that support rfc3484?  It's
not obvious to me -- surely IPv6 addresses would fail to work on such
systems anyway?


Re: AI_ADDRCONFIG

2003-11-12 Thread Hrvoje Niksic
Mauro Tortonesi <[EMAIL PROTECTED]> writes:

> On Wed, 12 Nov 2003, Hrvoje Niksic wrote:
>> "Mauro Tortonesi" <[EMAIL PROTECTED]> writes:
>>
>> >> Wget works well, but it looks ugly because my machine is not
>> >> configured for IPv6.
>> >>
>> >> According to OpenGroup's web site, AI_ADDRCONFIG flag should be of use
>> >> here.  Should I be worried that the getaddrinfo man page on my (RHL 9)
>> >> system doesn't mention AI_ADDRCONFIG?
>> >
>> > yes, that's why AI_ADDRCONFIG has been introduced. unfortunately,
>> > glibc does not support AI_ADDRCONFIG yet. you have to install
>> > libinet6 from the usagi kit:
>> >
>> > http://www.deepspace6.net/docs/best_ipv6_support.html
>>
>> OK.  Interestingly enough, nc6 doesn't seem to have this problem (or
>> it's not displaying the errors).
>
> that's because chris leisham and i have worked __A LOT__ in order to get
> nc6 to work and do the RIGHT THING (TM) in every circumstance ;-)
>
>
>> I suppose I can work around the problem by specifying `inet4_only=yes'
>> in .wgetrc...
>>
>> Better yet, maybe we should make -4 the default on machines that don't
>> support AI_ADDRCONFIG and on which creating an AF_INET6 socket fails?
>
> IMHO, no. we should simply try in order each sockaddr address returned by
> getaddrinfo (if we don't, there can be problems on systems which support
> RFC3484) and print an error message only if the verbosity option is
> turned on.
>
>
> BTW: i have moved the discussion to the list. sorry for not having done it
>  before, but i was in a hurry and i was answering from my
>  not-configured-at-all webmail account.
>
> -- 
> Aequam memento rebus in arduis servare mentem...
>
> Mauro Tortonesi [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> Deep Space 6 - IPv6 with Linux  http://www.deepspace6.net
> Ferrara Linux User Group        http://www.ferrara.linux.it


Re: AI_ADDRCONFIG

2003-11-12 Thread Mauro Tortonesi
On Wed, 12 Nov 2003, Dražen Kačar wrote:

> Hrvoje Niksic wrote:
>
> > According to OpenGroup's web site, AI_ADDRCONFIG flag should be of use
> > here.  Should I be worried that the getaddrinfo man page on my (RHL 9)
> > system doesn't mention AI_ADDRCONFIG?
>
> Yes. The end of OpenGroup's man page says:
>
>  IEEE Std 1003.1-2001/Cor 1-2002, item XSH/TC1/D6/20 is applied, making
>  changes for alignment with IPv6. These include the following:
>
>    * Adding AI_V4MAPPED, AI_ALL, and AI_ADDRCONFIG to the allowed
>      values for the ai_flags field
>
> "Cor 1-2002" is corrigendum 1 for POSIX/SUSv3 and it's probably too new
> an addition to be implemented, especially considering that no one implements
> the current POSIX without corrigendum yet. Even when some systems
> implement that flag for getaddrinfo, you'll want to run on systems which
> predate corrigendum 1.

IIRC, all *BSD systems support AI_ADDRCONFIG via the libinet6 library.
glibc does not support AI_ADDRCONFIG yet.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Deep Space 6 - IPv6 with Linux  http://www.deepspace6.net
Ferrara Linux User Group        http://www.ferrara.linux.it




Re: AI_ADDRCONFIG

2003-11-12 Thread Dražen Kačar
Hrvoje Niksic wrote:

> According to OpenGroup's web site, AI_ADDRCONFIG flag should be of use
> here.  Should I be worried that the getaddrinfo man page on my (RHL 9)
> system doesn't mention AI_ADDRCONFIG?

Yes. The end of OpenGroup's man page says:

 IEEE Std 1003.1-2001/Cor 1-2002, item XSH/TC1/D6/20 is applied, making
 changes for alignment with IPv6. These include the following:

   * Adding AI_V4MAPPED, AI_ALL, and AI_ADDRCONFIG to the allowed
     values for the ai_flags field

"Cor 1-2002" is corrigendum 1 for POSIX/SUSv3 and it's probably too new
an addition to be implemented, especially considering that no one implements
the current POSIX without corrigendum yet. Even when some systems
implement that flag for getaddrinfo, you'll want to run on systems which
predate corrigendum 1.
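
One way to cope with both kinds of system is to pass the flag only where it is defined. The following is a rough sketch in Python (in C this would be an #ifdef AI_ADDRCONFIG around the hints setup); the lookup helper and its resolver parameter are hypothetical and only there for illustration.

```python
import socket

# Falls back to 0 (no flag) when the platform does not define AI_ADDRCONFIG.
AI_ADDRCONFIG = getattr(socket, "AI_ADDRCONFIG", 0)


def lookup(host, port, resolver=socket.getaddrinfo):
    """Resolve host, requesting AI_ADDRCONFIG only where it is available."""
    return resolver(host, port, type=socket.SOCK_STREAM, flags=AI_ADDRCONFIG)
```

On systems that predate the corrigendum the call degrades to a plain lookup, so the caller still has to be prepared for addresses in a family that is not configured locally.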

-- 
 .-.   .-.Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
 |
 |[EMAIL PROTECTED]


Re: AI_ADDRCONFIG

2003-11-12 Thread Mauro Tortonesi
On Wed, 12 Nov 2003, Hrvoje Niksic wrote:

> "Mauro Tortonesi" <[EMAIL PROTECTED]> writes:
>
> >> Wget works well, but it looks ugly because my machine is not
> >> configured for IPv6.
> >>
> >> According to OpenGroup's web site, AI_ADDRCONFIG flag should be of use
> >> here.  Should I be worried that the getaddrinfo man page on my (RHL 9)
> >> system doesn't mention AI_ADDRCONFIG?
> >
> > yes, that's why AI_ADDRCONFIG has been introduced. unfortunately,
> > glibc does not support AI_ADDRCONFIG yet. you have to install
> > libinet6 from the usagi kit:
> >
> > http://www.deepspace6.net/docs/best_ipv6_support.html
>
> OK.  Interestingly enough, nc6 doesn't seem to have this problem (or
> it's not displaying the errors).

that's because chris leisham and i have worked __A LOT__ in order to get
nc6 to work and do the RIGHT THING (TM) in every circumstance ;-)


> I suppose I can work around the problem by specifying `inet4_only=yes'
> in .wgetrc...
>
> Better yet, maybe we should make -4 the default on machines that don't
> support AI_ADDRCONFIG and on which creating an AF_INET6 socket fails?

IMHO, no. we should simply try in order each sockaddr address returned by
getaddrinfo (if we don't, there can be problems on systems which support
RFC3484) and print an error message only if the verbosity option is
turned on.


BTW: i have moved the discussion to the list. sorry for not having done it
 before, but i was in a hurry and i was answering from my
 not-configured-at-all webmail account.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Deep Space 6 - IPv6 with Linux  http://www.deepspace6.net
Ferrara Linux User Group        http://www.ferrara.linux.it