Re: wget through proxy is slow (Connection:Keep-Alive ignored?)

2008-01-10 Thread Yazeed Hamid
On Jan 8, 2008 12:14 AM, L Walsh <[EMAIL PROTECTED]> wrote:

> Yazeed Hamid wrote:
> > I'm using wget version 1.10.2 on cygwin running on Windows Vista (v
> > 6.0.6000 Build 6000).
> > When there is a proxy to go through, the corresponding proxy
> > address:port is exported to the environment (export http_proxy="$proxy")
> >
> > The problem is, when wget is working through a proxy, it doesn't seem to
> > reuse the existing
> > http connection like it does when --proxy=off is set. When there is no
> > proxy, all objects
> > referenced in an html file are fetched over the same connection. On the
> > other hand, when I am
> > going through a proxy, each object is fetched in a new connection
> > although the Connection: Keep-Alive header
> > is both in the http request and response messages.
> >
> > As a result, the measured response time through a proxy is very
> > much greater than that through a direct connection.
> 
> I just tried both of your examples to mcafee.com (taking 3.56 and 20.42
> seconds).  My proxy server is on a linux machine, so I had to run my tests
> from linux.  I couldn't replicate your problem.  My output:
>
> > time wget -pdEk --delete-after --proxy=off -o log-no-proxy-w-debug www.mcafee.com
> Setting --html-extension (htmlextension) to 1;
> Setting --convert-links (convertlinks) to 1
> Setting --delete-after (deleteafter) to 1
> Setting --proxy (useproxy) to off
> Setting --output-file (logfile) to log-no-proxy-w-debug
> 2.62sec 0.01usr 0.02sys (1.44% cpu)
> > time wget -pdEk --delete-after --proxy=on -o log-no-proxy-w-debug www.mcafee.com
> Setting --html-extension (htmlextension) to 1
> Setting --convert-links (convertlinks) to 1
> Setting --delete-after (deleteafter) to 1
> Setting --proxy (useproxy) to on
> Setting --output-file (logfile) to log-no-proxy-w-debug
> 2.56sec 0.02usr 0.03sys (2.14% cpu)
>
> You are running on Windows.  MS networking isn't known for its
> speed -- especially on open/close operations, but I wouldn't think it
> would be that bad.  They deliberately put in slowdowns on non-server
> editions of Windows starting in XP-SP2 on opening some types
> of network connections -- that could be part of the cause -- but
> again, a 17 second delay seems unreasonable.
>
> It also depends on what your proxy server does.  I know
> squid has parameters (persistent_request_timeout, client_lifetime,
> pconn_timeout) to set the timeout for re-usable connections.
> The defaults in a standard 'squid' setup are reasonable, but
> you didn't specify what proxy you were using, nor do we know how
> it is configured.
>
> For what it's worth -- I tried the wget statement on
> my Windows-xp box.  I only tested the 'with-proxy' case, since
> my windows box isn't on the external net (has to go through
> the linux proxy).  It came out with times similar to those
> run on the proxy machine: 2.68sec 0.04usr 0.10sys (5.72% cpu)
>
> I'd check the proxy.  My Linux wget is 1.10.1, and my Windows
> wget (under cygwin) is 1.10.2.
>
> Good luck.
>

Thank you for your concern. I have attempted the same test on a Linux box
(Red Hat Enterprise Linux 5) on the same network and the result was similar
(even a little slower). I must check our proxy configuration. Thank you once
again.


Re: wget through proxy is slow (Connection:Keep-Alive ignored?)

2008-01-07 Thread L Walsh

Yazeed Hamid wrote:
> I'm using wget version 1.10.2 on cygwin running on Windows Vista (v
> 6.0.6000 Build 6000).
> When there is a proxy to go through, the corresponding proxy
> address:port is exported to the environment (export http_proxy="$proxy")
>
> The problem is, when wget is working through a proxy, it doesn't seem to
> reuse the existing http connection like it does when --proxy=off is set.
> When there is no proxy, all objects referenced in an html file are fetched
> over the same connection. On the other hand, when I am going through a
> proxy, each object is fetched in a new connection although the
> Connection: Keep-Alive header is both in the http request and response
> messages.
>
> As a result, the measured response time through a proxy is very
> much greater than that through a direct connection.


I just tried both of your examples to mcafee.com (taking 3.56 and 20.42
seconds).  My proxy server is on a linux machine, so I had to run my tests
from linux.  I couldn't replicate your problem.  My output:

time wget -pdEk --delete-after --proxy=off -o log-no-proxy-w-debug www.mcafee.com
Setting --html-extension (htmlextension) to 1;
Setting --convert-links (convertlinks) to 1
Setting --delete-after (deleteafter) to 1
Setting --proxy (useproxy) to off
Setting --output-file (logfile) to log-no-proxy-w-debug
2.62sec 0.01usr 0.02sys (1.44% cpu)

time wget -pdEk --delete-after --proxy=on -o log-no-proxy-w-debug www.mcafee.com
Setting --html-extension (htmlextension) to 1
Setting --convert-links (convertlinks) to 1
Setting --delete-after (deleteafter) to 1
Setting --proxy (useproxy) to on
Setting --output-file (logfile) to log-no-proxy-w-debug
2.56sec 0.02usr 0.03sys (2.14% cpu)

You are running on Windows.  MS networking isn't known for its
speed -- especially on open/close operations, but I wouldn't think it
would be that bad.  They deliberately put in slowdowns on non-server
editions of Windows starting in XP-SP2 on opening some types
of network connections -- that could be part of the cause -- but
again, a 17 second delay seems unreasonable.

It also depends on what your proxy server does.  I know
squid has parameters (persistent_request_timeout, client_lifetime,
pconn_timeout) to set the timeout for re-usable connections.
The defaults in a standard 'squid' setup are reasonable, but
you didn't specify what proxy you were using, nor do we know how
it is configured.
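
If the proxy does turn out to be squid, a quick way to see whether any of
those settings have been changed is to grep the config for them. This is only
a rough sketch: the function name show_pconn_settings is made up for
illustration, and the default path /etc/squid/squid.conf is an assumption
that may not match your installation. Besides the three timeouts named above,
it also looks for client_persistent_connections and
server_persistent_connections, which as far as I know are the squid
directives that switch persistent connections on or off.

```shell
#!/bin/sh
# Print the persistent-connection directives that are explicitly set in a
# Squid config file. Commented-out lines are skipped because the pattern
# only matches lines that begin with the directive name.
# NOTE: the default path below is an assumption; pass your own as $1.
show_pconn_settings() {
    conf="${1:-/etc/squid/squid.conf}"
    grep -E '^[[:space:]]*(client_persistent_connections|server_persistent_connections|persistent_request_timeout|pconn_timeout|client_lifetime)[[:space:]]' "$conf"
}
```

Run it as `show_pconn_settings /path/to/squid.conf`. If it prints nothing,
none of these directives are set explicitly and the proxy is running with its
built-in defaults for them.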

For what it's worth -- I tried the wget statement on
my Windows-xp box.  I only tested the 'with-proxy' case, since
my windows box isn't on the external net (has to go through
the linux proxy).  It came out with times similar to those
run on the proxy machine: 2.68sec 0.04usr 0.10sys (5.72% cpu)

I'd check the proxy.  My Linux wget is 1.10.1, and my Windows
wget (under cygwin) is 1.10.2.

Good luck.


Re: wget through proxy is slow (Connection:Keep-Alive ignored?)

2008-01-07 Thread Micah Cowan

Yazeed Hamid wrote:
> Hi to all.
> Thank you very much for your efforts and a happy new year to all.
>  
> I have been using wget in a bash script to measure website response
> times over different proxy configurations.
>  
> /usr/bin/time -f "%e Seconds" /usr/bin/wget -pEkq
> --delete-after --proxy=$switch ${URL[$index]}
>  
> I'm using wget version 1.10.2 on cygwin running on Windows Vista (v
> 6.0.6000 Build 6000).
> When there is a proxy to go through, the corresponding proxy
> address:port is exported to the environment (export http_proxy="$proxy")
>  
> The problem is, when wget is working through a proxy, it doesn't seem to
> reuse the existing
> http connection like it does when --proxy=off is set. When there is no
> proxy, all objects
> referenced in an html file are fetched over the same connection. On the
> other hand, when I am
> going through a proxy, each object is fetched in a new connection
> although the Connection: Keep-Alive header
> is both in the http request and response messages.
>  
> As a result, the measured response time through a proxy is very
> much greater than that through a direct connection.
>  
> $ /usr/bin/time -f "%e Seconds" wget -pdEk --delete-after --proxy=off -o
> log-no-proxy-w-debug www.mcafee.com 
> 3.56 Seconds
>  
> $ /usr/bin/time -f "%e Seconds" wget -pdEk --delete-after --proxy=on -o
> log-thru-proxy-w-debug www.mcafee.com 
> 20.42 Seconds
>  
> How can I make wget make sense of the Connection: Keep-Alive message
> when going through
> a proxy and thus reuse the existing connection like it does when
> directly connecting to the web server?
> In an early post, I read something about modifying the source of wget,
> is this the only solution?

Probably.

It does look like you're right that Wget is dropping the connection;
it'd be more certain if I could see tcpdump output indicating which side
closed the connection first; but IIRC the debug messages are different
when the remote side closes the connection first (I haven't time to
check the source to see for myself just now).
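
For anyone wanting to capture that, here is a rough sketch of a tcpdump
filter that shows only the FIN and RST segments exchanged with the proxy. The
helper name close_filter, the host proxy.example.com, port 3128 and the
interface eth0 are all placeholders of mine, not details from this thread.

```shell
#!/bin/sh
# Build a tcpdump filter expression that matches only connection-teardown
# segments (FIN or RST) to or from the given proxy host and port.
# Arguments: $1 = proxy host (placeholder), $2 = proxy port (placeholder).
close_filter() {
    proxy_host="$1"
    proxy_port="$2"
    printf 'host %s and tcp port %s and (tcp[tcpflags] & (tcp-fin|tcp-rst) != 0)\n' \
        "$proxy_host" "$proxy_port"
}

# Example invocation (usually needs root; eth0 is an assumed interface):
#   tcpdump -n -i eth0 "$(close_filter proxy.example.com 3128)"
```

For each connection, the source address on the first FIN or RST line tells
you which side closed first: the client's address means Wget dropped the
connection, the proxy's address means the proxy did.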

I'll file an issue for this, but it's not likely to be addressed anytime
soon.

You mentioned another message that talked about modifying the source; do
you have a reference to it? If someone has already found an appropriate
patch for this, it can go in much more quickly than if you wait for me to get
around to it. ;)

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/