Re: How to use wget with option -p without writing files to disk?
Jens Schleusener <[EMAIL PROTECTED]> writes:

> But that doesn't work since wget probably needs the downloaded pages
> to find the files necessary to properly display the complete HTML
> page.

Exactly. Sorry about that; it will be fixed in a future release.

Currently the only workaround is to download to a RAM disk or tmpfs file system, or anything that is clearly faster than your net connection, so that the writing time does not enter into account.
How to use wget with option -p without writing files to disk?
Hi,

I just want to use wget (v1.9.1-rc1) to do some simple access-time benchmarking of some WWW pages. So I first started with

    wget --page-requisites --timeout=30 --proxy=off \
         --tries=1 \
         http://www.foo.bar/

(last output line, e.g.: Downloaded: 76,431 bytes in 27 files)

But then I noticed that this way I was also measuring the disk I/O spent writing the fetched files to the local disk. So the next idea was to let wget write the output to /dev/null (option --tries=1 omitted since it's the default when using --output-document):

    wget --page-requisites --timeout=30 --proxy=off \
         --output-document=/dev/null \
         http://www.foo.bar/

(last output line, e.g.: Downloaded: 31,999 bytes in 2 files)

But that doesn't work since wget probably needs the downloaded pages to find the files necessary to properly display the complete HTML page.

A workaround seems to be to call wget once in the standard way so that the files are locally available, but that probably wouldn't work correctly if the benchmarked page were changed.

Any ideas how to do that correctly with wget? Or any pointers to more appropriate tools?

Greetings
Jens

--
Dr. Jens Schleusener
T-Systems Solutions for Research GmbH
Bunsenstr.10, D-37073 Goettingen
Tel: +49 551 709-2493
Fax: +49 551 709-2169
[EMAIL PROTECTED]
http://www.t-systems.com/
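For readers looking for a way to time a page plus its requisites without touching the disk, here is a rough Python sketch that keeps everything in memory. It is not how wget works internally; the names `RequisiteParser`, `find_requisites` and `timed_fetch` are made up for this illustration, and the set of tags treated as "requisites" (img, script, stylesheet links) is an assumption that is much narrower than what wget's --page-requisites follows.

```python
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class RequisiteParser(HTMLParser):
    """Collect URLs of images, scripts and stylesheets referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.urls.append(urljoin(self.base_url, attrs["href"]))


def find_requisites(html_text, base_url):
    """Return absolute URLs of the page requisites, in document order."""
    parser = RequisiteParser(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.urls


def timed_fetch(url, timeout=30):
    """Fetch a URL into memory; return (body_bytes, elapsed_seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read()
    return body, time.monotonic() - start
```

A benchmark run would call `timed_fetch` on the page, decode the body, pass it to `find_requisites`, and then call `timed_fetch` on each returned URL, summing the elapsed times. Nothing is written to disk, so only network and parsing time are measured.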
Re: AI_ADDRCONFIG
Mauro Tortonesi <[EMAIL PROTECTED]> writes:

> i wouldn't do it at configure time because compilation would then be
> prone to some problems which may be difficult to find out. for
> example, what if you have compiled wget without loading the ipv6
> module, but your system supports PF_INET6 sockets and you want wget
> to have ipv6 support? or what if you have compiled wget with the
> ipv6 module loaded but normally your system has ipv6 support turned
> off?

The nice thing about Wget's --inet4-only switch and the corresponding .wgetrc setting is that they can be reverted with --no-inet4-only. So in theory, the user could use --no-inet4 to undo the problem.

However, I agree that this is still suboptimal. So let's add the socket creation check in main(). The check will only occur on systems *with* IPv6 in libc, but *without* AI_ADDRCONFIG. The number of those will dwindle as IPv6 gets more widely implemented, so even that minimal inefficiency is not here to stay.

Interestingly enough, the glibc I installed from Rawhide (while Rawhide still existed) does support AI_ADDRCONFIG, AI_V4MAPPED, and AI_ALL, at least according to Wget's configure. The version of glibc is "2.3.2-82", but I don't know if the IPv6 stuff is native to that version or if it was added by Red Hat's packagers.

>> > the problem with rfc3484 may arise if we don't use the sockaddr
>> > addresses returned by getaddrinfo in order, but this is another
>> > problem.
>>
>> From what I can tell, we'll always use them in order, so we should
>> be safe.
>
> yes, let's keep using this policy.

I'll explicitly document this in the docstring of lookup_host, so that it's clear that the preserved ordering is not an artifact of the current implementation.

NB, I believe Ari's IPv6 patch posted to wget-patches contained code that sorted the address list. I didn't apply that patch because the CVS Wget already had support for dual-family systems, but it would indicate that there is a certain temptation to reorder the results returned by getaddrinfo, and *that* can lead to conflicts with rfc3484.
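The startup check proposed in this message can be sketched roughly as follows. This is a Python illustration of the idea only, not Wget's C code; `ipv6_sockets_usable` and `default_inet4_only` are hypothetical names, and the sketch assumes that "IPv6 not supported" can be approximated by a failure to create an AF_INET6 socket (it does not test for a configured global IPv6 address).

```python
import socket


def ipv6_sockets_usable():
    """Return True if this host can create an IPv6 stream socket at all."""
    if not socket.has_ipv6:
        return False
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        return False
    sock.close()
    return True


def default_inet4_only(have_addrconfig=hasattr(socket, "AI_ADDRCONFIG"),
                       probe=ipv6_sockets_usable):
    """Decide whether IPv4-only should be the default.

    The socket probe runs only when AI_ADDRCONFIG is unavailable, which
    mirrors the point that the extra socket creation is needed only on
    systems with IPv6 in libc but without AI_ADDRCONFIG.
    """
    if have_addrconfig:
        return False
    return not probe()
```

Because the result is only a default, an explicit user option (the equivalent of --no-inet4-only) would still override it.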
Re: AI_ADDRCONFIG
On Wed, 12 Nov 2003, Hrvoje Niksic wrote:

> Mauro Tortonesi <[EMAIL PROTECTED]> writes:
>
> > perhaps we can perform a check like this in main: if AI_ADDRCONFIG
> > is not supported AND ipv6 is not supported (e.g. creation of
> > PF_INET6 sockets fails or we don't have a global ipv6 address
> > configured on one of the interfaces), then enable --inet4-only.
>
> That is exactly what I was proposing (see the "Better yet..."
> sentence).
>
> Could we push that check to configure time, so that every call to
> main() doesn't needlessly create a socket? But then the binary built
> on an IPv6-less system would have a strange default when transferred
> to a system with working IPv6.

i wouldn't do it at configure time because compilation would then be prone to some problems which may be difficult to find out. for example, what if you have compiled wget without loading the ipv6 module, but your system supports PF_INET6 sockets and you want wget to have ipv6 support? or what if you have compiled wget with the ipv6 module loaded but normally your system has ipv6 support turned off?

> > the problem with rfc3484 may arise if we don't use the sockaddr
> > addresses returned by getaddrinfo in order, but this is another
> > problem.
>
> From what I can tell, we'll always use them in order, so we should be
> safe.

yes, let's keep using this policy.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
                [EMAIL PROTECTED]
                [EMAIL PROTECTED]
Deep Space 6 - IPv6 with Linux http://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it
Re: keep alive connections
Alain Bench <[EMAIL PROTECTED]> writes:

> OK, wasn't aware of the spurious HEAD bodies problem. But Wget also
> closes the connection between a GET (with body) and the HEAD for the
> next file.

Could you post a URL for which this happens? I wasn't aware of this problem and would like to fix it.

>> But maybe it would actually be a better idea to read (and discard)
>> the body than to close the connection and reopen it.
>
> Hum... Would it be possible to close/reopen only if, and as soon as,
> first byte of spurious body comes?

This is harder than it seems. How exactly do you propose to detect the unwanted body? If you wait for an arbitrary time for the body data to start arriving, you slow down all downloads and defeat the purpose of the persistent connections (speed). If you don't wait, the detection doesn't work because the body data can start arriving a bit later (which is frequently the case with CGI's). Either case, you lose.

What Wget does only sacrifices persistent connections at times, but does the right thing with all kinds of responses and doesn't introduce artificial delays.

>>>| Keep-Alive: timeout=15, max=5
>>> Without --timestamping Wget keeps "Reusing fd 3." and closing it only
>>> once every 6 files (first + 5 more).
>> This might be due to redirections.
>
> No redirections involved: That closure is normal, due to the "max=5"
> the server responds to the first request. At second GET it's "max=4" and
> gets decremented each time. Finally at the 6th request there is no more
> "Connection:" nor "Keep-Alive:" fields.

Oh, I see, it's a server setting. Why do they use such a limit?
Re: AI_ADDRCONFIG
Mauro Tortonesi <[EMAIL PROTECTED]> writes:

> perhaps we can perform a check like this in main: if AI_ADDRCONFIG
> is not supported AND ipv6 is not supported (e.g. creation of
> PF_INET6 sockets fails or we don't have a global ipv6 address
> configured on one of the interfaces), then enable --inet4-only.

That is exactly what I was proposing (see the "Better yet..." sentence).

Could we push that check to configure time, so that every call to main() doesn't needlessly create a socket? But then the binary built on an IPv6-less system would have a strange default when transferred to a system with working IPv6.

> the problem with rfc3484 may arise if we don't use the sockaddr
> addresses returned by getaddrinfo in order, but this is another
> problem.

From what I can tell, we'll always use them in order, so we should be safe.
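The "use them in order" policy can be illustrated with a short Python sketch. It is an analogy, not Wget's connect code: `connect_in_order` and its `connect` parameter are invented for this example, and the input is assumed to be a list of 5-tuples in the shape returned by Python's `socket.getaddrinfo`.

```python
import socket


def _tcp_connect(family, socktype, proto, sockaddr):
    """Create a socket for one address entry and connect it."""
    sock = socket.socket(family, socktype, proto)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def connect_in_order(addrinfos, connect=_tcp_connect):
    """Try each entry in the order given; return the first success.

    The list is never sorted or filtered here, so whatever preference
    the resolver expressed (for example via RFC 3484 rules) is kept.
    """
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in addrinfos:
        try:
            return connect(family, socktype, proto, sockaddr)
        except OSError as exc:
            last_error = exc
    if last_error is None:
        raise OSError("no addresses to try")
    raise last_error
```

The `connect` parameter exists only so that the ordering can be exercised without a network; a real caller would pass the result of `socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)` and leave `connect` at its default.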
Re: keep alive connections
On Tuesday, November 11, 2003 at 2:41:31 PM +0100, Hrvoje Niksic wrote:

> Alain Bench <[EMAIL PROTECTED]> writes:
>> with --timestamping: Each HEAD and each possible GET uses a new
>> connection.
> I think the difference is that Wget closes the connection when it
> decides not to read the request body.

OK, wasn't aware of the spurious HEAD bodies problem. But Wget also closes the connection between a GET (with body) and the HEAD for the next file.

> But maybe it would actually be a better idea to read (and discard) the
> body than to close the connection and reopen it.

Hum... Would it be possible to close/reopen only if, and as soon as, first byte of spurious body comes? I guess it could be difficult to deal cleanly with next file in limit cases...

>>| Keep-Alive: timeout=15, max=5
>> Without --timestamping Wget keeps "Reusing fd 3." and closing it only
>> once every 6 files (first + 5 more).
> This might be due to redirections.

No redirections involved: That closure is normal, due to the "max=5" the server responds to the first request. At second GET it's "max=4" and gets decremented each time. Finally at the 6th request there is no more "Connection:" nor "Keep-Alive:" fields. The /etc/apache/httpd.conf says:

| # KeepAlive: The number of Keep-Alive persistent requests to accept
| # per connection. Set to 0 to deactivate Keep-Alive support
| KeepAlive 5
|
| # KeepAliveTimeout: Number of seconds to wait for the next request
| KeepAliveTimeout 15

Bye! Alain.

--
When you want to reply to a mailing list, please avoid doing so from a digest. This often builds incorrect references and breaks threads.
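To make the "max=5" countdown concrete, here is a small Python sketch that parses a Keep-Alive header value such as the one quoted above. The function name `parse_keep_alive` is made up for this illustration, and the sketch only handles the simple `name=integer` parameters seen in this thread; anything else in the header is ignored.

```python
def parse_keep_alive(value):
    """Parse 'timeout=15, max=5' into {'timeout': 15, 'max': 5}.

    Parameters without '=' or with a non-numeric value are skipped.
    """
    params = {}
    for part in value.split(","):
        name, sep, raw = part.strip().partition("=")
        raw = raw.strip()
        if sep and raw.isdigit():
            params[name.strip().lower()] = int(raw)
    return params
```

With the server described here, the first response would parse to `{'timeout': 15, 'max': 5}`, the second to a `max` of 4, and so on, which matches the observed pattern of one connection serving the first request plus five more.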
Re: AI_ADDRCONFIG
On Wed, 12 Nov 2003, Hrvoje Niksic wrote:

> Mauro Tortonesi <[EMAIL PROTECTED]> writes:
>
> >> I suppose I can work around the problem by specifying `inet4_only=yes'
> >> in .wgetrc...
> >>
> >> Better yet, maybe we should make -4 the default on machines that don't
> >> support AI_ADDRCONFIG and on which creating an AF_INET6 socket fails?
> >
> > IMHO, no. we should simply try in order each sockaddr address
> > returned by getaddrinfo (if we don't, there can be problems in
> > system which support RFC3484) and print an error message only if the
> > verbosity option is turned on.
>
> Yes, but Wget is not nc -- the verbosity option is on by default. :-)
>
> Also, the failed connect attempts potentially slow things down. I
> don't want Wget to try to connect to random IPv6 addresses -- it will
> not work for me and it's just wrong. Wget should be smarter about
> this. Given the choice between suppressing error messages and doing
> the right thing in the first place, I'd always go for the latter.

perhaps we can perform a check like this in main: if AI_ADDRCONFIG is not supported AND ipv6 is not supported (e.g. creation of PF_INET6 sockets fails or we don't have a global ipv6 address configured on one of the interfaces), then enable --inet4-only.

> Could you please explain how defaulting to --inet4-only on systems
> that cannot connect to IPv6 breaks systems that support rfc3484? It's
> not obvious to me -- surely IPv6 addresses would fail to work on such
> systems anyway?

sorry, i misexplained myself. enabling --inet4-only by default on systems that do not support AI_ADDRCONFIG but have ipv6 connectivity is just like not having ipv6 support at all. i wouldn't recommend adopting this behaviour.

the problem with rfc3484 may arise if we don't use the sockaddr addresses returned by getaddrinfo in order, but this is another problem.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
                [EMAIL PROTECTED]
                [EMAIL PROTECTED]
Deep Space 6 - IPv6 with Linux http://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it
Re: AI_ADDRCONFIG
Mauro Tortonesi <[EMAIL PROTECTED]> writes:

>> I suppose I can work around the problem by specifying `inet4_only=yes'
>> in .wgetrc...
>>
>> Better yet, maybe we should make -4 the default on machines that don't
>> support AI_ADDRCONFIG and on which creating an AF_INET6 socket fails?
>
> IMHO, no. we should simply try in order each sockaddr address
> returned by getaddrinfo (if we don't, there can be problems in
> system which support RFC3484) and print an error message only if the
> verbosity option is turned on.

Yes, but Wget is not nc -- the verbosity option is on by default. :-)

Also, the failed connect attempts potentially slow things down. I don't want Wget to try to connect to random IPv6 addresses -- it will not work for me and it's just wrong. Wget should be smarter about this. Given the choice between suppressing error messages and doing the right thing in the first place, I'd always go for the latter.

Could you please explain how defaulting to --inet4-only on systems that cannot connect to IPv6 breaks systems that support rfc3484? It's not obvious to me -- surely IPv6 addresses would fail to work on such systems anyway?
Re: AI_ADDRCONFIG
On Wed, 12 Nov 2003, Dražen Kačar wrote:

> Hrvoje Niksic wrote:
>
> > According to OpenGroup's web site, AI_ADDRCONFIG flag should be of use
> > here. Should I be worried that the getaddrinfo man page on my (RHL 9)
> > system doesn't mention AI_ADDRCONFIG?
>
> Yes. The end of OpenGroup's man page says:
>
> IEEE Std 1003.1-2001/Cor 1-2002, item XSH/TC1/D6/20 is applied, making
> changes for alignment with IPv6. These include the following:
>
>    * Adding AI_V4MAPPED, AI_ALL, and AI_ADDRCONFIG to the allowed
>      values for the ai_flags field
>
> "Cor 1-2002" is corrigendum 1 for POSIX/SUSv3 and it's probably too new
> addition to be implemented, especially considering that no one implements
> the current POSIX without corrigendum yet. Even when some systems
> implement that flag for getaddrinfo, you'll want to run on systems which
> predate corrigendum 1.

IIRC, all *BSD systems support AI_ADDRCONFIG via the libinet6 library. glibc does not support AI_ADDRCONFIG yet.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
                [EMAIL PROTECTED]
                [EMAIL PROTECTED]
Deep Space 6 - IPv6 with Linux http://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it
Re: AI_ADDRCONFIG
Hrvoje Niksic wrote:

> According to OpenGroup's web site, AI_ADDRCONFIG flag should be of use
> here. Should I be worried that the getaddrinfo man page on my (RHL 9)
> system doesn't mention AI_ADDRCONFIG?

Yes. The end of OpenGroup's man page says:

IEEE Std 1003.1-2001/Cor 1-2002, item XSH/TC1/D6/20 is applied, making changes for alignment with IPv6. These include the following:

   * Adding AI_V4MAPPED, AI_ALL, and AI_ADDRCONFIG to the allowed
     values for the ai_flags field

"Cor 1-2002" is corrigendum 1 for POSIX/SUSv3 and it's probably too new addition to be implemented, especially considering that no one implements the current POSIX without corrigendum yet. Even when some systems implement that flag for getaddrinfo, you'll want to run on systems which predate corrigendum 1.

--
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        [EMAIL PROTECTED]
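Since the flag may or may not exist on a given system, code that wants AI_ADDRCONFIG has to detect it rather than assume it. The sketch below shows that pattern in Python, where the constant is simply absent from the `socket` module on platforms that lack it. It is an illustration of the feature-detection idea, not Wget's configure test; `lookup_flags` and `resolve` are hypothetical names.

```python
import socket


def lookup_flags():
    """Return AI_ADDRCONFIG when the platform defines it, else 0."""
    return getattr(socket, "AI_ADDRCONFIG", 0)


def resolve(host, port, inet4_only=False):
    """Resolve host with AI_ADDRCONFIG when available.

    With inet4_only set, only IPv4 results are requested, which is the
    fallback discussed in this thread for systems lacking the flag.
    """
    family = socket.AF_INET if inet4_only else socket.AF_UNSPEC
    return socket.getaddrinfo(host, port, family, socket.SOCK_STREAM,
                              0, lookup_flags())
```

On a platform without the flag, `lookup_flags()` returns 0 and `resolve` behaves like a plain lookup, so the same code runs on systems that predate the corrigendum.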
Re: AI_ADDRCONFIG
On Wed, 12 Nov 2003, Hrvoje Niksic wrote:

> "Mauro Tortonesi" <[EMAIL PROTECTED]> writes:
>
> >> Wget works well, but it looks ugly because my machine is not
> >> configured for IPv6.
> >>
> >> According to OpenGroup's web site, AI_ADDRCONFIG flag should be of use
> >> here. Should I be worried that the getaddrinfo man page on my (RHL 9)
> >> system doesn't mention AI_ADDRCONFIG?
> >
> > yes, that's why AI_ADDRCONFIG has been introduced. unfortunately,
> > glibc does not support AI_ADDRCONFIG yet. you have to install
> > libinet6 from the usagi kit:
> >
> > http://www.deepspace6.net/docs/best_ipv6_support.html
>
> OK. Interestingly enough, nc6 doesn't seem to have this problem (or
> it's not displaying the errors).

that's because chris leisham and i have worked __A LOT__ in order to get nc6 work and do the RIGHT THING (TM) in every circumstance ;-)

> I suppose I can work around the problem by specifying `inet4_only=yes'
> in .wgetrc...
>
> Better yet, maybe we should make -4 the default on machines that don't
> support AI_ADDRCONFIG and on which creating an AF_INET6 socket fails?

IMHO, no. we should simply try in order each sockaddr address returned by getaddrinfo (if we don't, there can be problems in system which support RFC3484) and print an error message only if the verbosity option is turned on.

BTW: i have moved the discussion on the list. sorry for not having done it before, but i was in a hurry and i was answering from my not-configured-at-all webmail account.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
                [EMAIL PROTECTED]
                [EMAIL PROTECTED]
Deep Space 6 - IPv6 with Linux http://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it