Hi!

Problem: a wget linked against SSL spans hosts when it encounters an
https:// link, even though host spanning is not enabled:
-----------------------------------------------------------------------------
Deciding whether to enqueue "http://www.egalwashierstehterversuchtesnichtzuladen.de/".
This is not the same hostname as the parent's
(www.egalwashierstehterversuchtesnichtzuladen.de and www.tigress.com).
Decided NOT to load it.
Deciding whether to enqueue "https://www.egalwashierstehterversuchteszuladen.de/".
Decided to load it.
Enqueuing https://www.egalwashierstehterversuchteszuladen.de/ at depth 1
Queue count 1, maxcount 1.
-----------------------------------------------------------------------------
A wget built without SSL rejects the URL one stage earlier, with the message
'www.tigress.com/mow/bla/index.html: merged link
"https://www.egalwashierstehterversuchteszuladen.de/" doesn't parse.'
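
To make the expected behaviour concrete, here is a minimal Python sketch of
the enqueue decision I would expect. It is not Wget's actual code: the
function name should_enqueue and its parameters span_hosts and no_parent are
made up for illustration, and the -np check is simplified. The point is only
that the hostname comparison should apply to every scheme, https included.

```python
from urllib.parse import urlsplit


def should_enqueue(parent_url, child_url, span_hosts=False, no_parent=True):
    """Hypothetical enqueue decision; an illustration, not Wget's code."""
    parent = urlsplit(parent_url)
    child = urlsplit(child_url)

    # Host check, done regardless of scheme: an https:// link to another
    # host is rejected exactly like an http:// link to another host.
    parent_host = (parent.hostname or "").lower()
    child_host = (child.hostname or "").lower()
    if not span_hosts and child_host != parent_host:
        return False

    # Simplified -np check: the child path must stay below the directory
    # of the parent URL.
    if no_parent:
        parent_dir = (parent.path or "/").rsplit("/", 1)[0] + "/"
        if not (child.path or "/").startswith(parent_dir):
            return False

    return True
```

With the parent "http://www.tigress.com/mow/bla/", this sketch refuses both
the http:// and the https:// link to the foreign host, which is what I would
expect from wget when -H is not given.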

Proof:
First, wget with SSL, which has the problem:
-----------------------------------------------------------------------------
good:/samba-shares/mow/Bilder/Furry/Silber/test# ldd `which wget`
        libssl.so.0 => /usr/lib/libssl.so.0 (0x4001d000)
        libcrypto.so.0 => /usr/lib/libcrypto.so.0 (0x400d3000)
        libdl.so.2 => /lib/libdl.so.2 (0x40193000)
        libc.so.6 => /lib/libc.so.6 (0x40197000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

uname -a
Linux good 2.2.20 #1 SMP Fri Feb 8 04:43:25 CET 2002 i686 unknown

System: Slackware 8.0

good:/samba-shares/mow/Bilder/Furry/Silber/test# wget -r -l 3 -np -d "http://www.tigress.com/mow/bla/"
DEBUG output created by Wget 1.8.2 on linux-gnu.

Enqueuing http://www.tigress.com/mow/bla/ at depth 0
Queue count 1, maxcount 1.
Dequeuing http://www.tigress.com/mow/bla/ at depth 0
Queue count 0, maxcount 1.
--23:07:32--  http://www.tigress.com/mow/bla/
           => `www.tigress.com/mow/bla/index.html'
Resolving www-proxy.hh1.srv.t-online.de... done.
Caching www-proxy.hh1.srv.t-online.de => 212.185.253.70
Connecting to www-proxy.hh1.srv.t-online.de[212.185.253.70]:80... connected.
Created socket 3.
Releasing 0x807e168 (new refcount 1).
---request begin---
GET http://www.tigress.com/mow/bla/ HTTP/1.0
User-Agent: Wget/1.8.2
Host: www.tigress.com
Accept: */*

---request end---
Proxy request sent, awaiting response... HTTP/1.1 200 OK
Via: HTTP/1.1 spcss45. (IBM-PROXY-WTE-US), HTTP/1.1 spcss43. (IBM-PROXY-WTE-US)
Date: Fri, 24 Jan 2003 22:07:24 GMT
Server: Apache/1.3.26 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.5a mod_jk PHP/4.1.2
Content-Type: text/html
Content-Length: 2242


Length: 2,242 [text/html]

    0K ..                                                    100%  199.04 KB/s

Closing fd 3
23:07:32 (199.04 KB/s) - `www.tigress.com/mow/bla/index.html' saved [2242/2242]

Loaded www.tigress.com/mow/bla/index.html (size 2242).
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/html/tigress.css") -> http://www.tigress.com/html/tigress.css
appending "http://www.tigress.com/html/tigress.css" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/images/navigator") -> http://www.tigress.com/images/navigator
appending "http://www.tigress.com/images/navigator" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/images/bow") -> http://www.tigress.com/images/bow
appending "http://www.tigress.com/images/bow" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/", "/html/")
-> http://www.tigress.com/html/
appending "http://www.tigress.com/html/" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/html/furryorgs") -> http://www.tigress.com/html/furryorgs
appending "http://www.tigress.com/html/furryorgs" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/html/furryservers") -> http://www.tigress.com/html/furryservers
appending "http://www.tigress.com/html/furryservers" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/html/about") -> http://www.tigress.com/html/about
appending "http://www.tigress.com/html/about" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/stats/") -> http://www.tigress.com/stats/
appending "http://www.tigress.com/stats/" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/", "/") ->
http://www.tigress.com/
appending "http://www.tigress.com/" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/", "/") ->
http://www.tigress.com/
appending "http://www.tigress.com/" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"http://www.egalwashierstehterversuchtesnichtzuladen.de/") ->
http://www.egalwashierstehterversuchtesnichtzuladen.de/
appending "http://www.egalwashierstehterversuchtesnichtzuladen.de/" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"https://www.egalwashierstehterversuchteszuladen.de/") ->
https://www.egalwashierstehterversuchteszuladen.de/
appending "https://www.egalwashierstehterversuchteszuladen.de/" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/banner/banner.cgi?ACTION=2&NAME=tigress&LINK=0") ->
http://www.tigress.com/banner/banner.cgi?ACTION=2&NAME=tigress&LINK=0
appending "http://www.tigress.com/banner/banner.cgi?ACTION=2&NAME=tigress&LINK=0" to
urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"http://www.yatho.com/banner/images/b-yatho.gif") ->
http://www.yatho.com/banner/images/b-yatho.gif
appending "http://www.yatho.com/banner/images/b-yatho.gif" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/html/setlocation?dummy") -> http://www.tigress.com/html/setlocation?dummy
appending "http://www.tigress.com/html/setlocation?dummy" to urlpos.
no-follow in www.tigress.com/mow/bla/index.html: 0
Deciding whether to enqueue "http://www.tigress.com/html/tigress.css".
Going to "html" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/images/navigator".
Going to "images" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/images/bow".
Going to "images" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/html/".
Going to "html" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/html/furryorgs".
Going to "html" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/html/furryservers".
Going to "html" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/html/about".
Going to "html" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/stats/".
Going to "stats" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/".
Going to "" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/".
Going to "" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.egalwashierstehterversuchtesnichtzuladen.de/".
This is not the same hostname as the parent's
(www.egalwashierstehterversuchtesnichtzuladen.de and www.tigress.com).
Decided NOT to load it.
Deciding whether to enqueue "https://www.egalwashierstehterversuchteszuladen.de/".
Decided to load it.
Enqueuing https://www.egalwashierstehterversuchteszuladen.de/ at depth 1
Queue count 1, maxcount 1.
Deciding whether to enqueue
"http://www.tigress.com/banner/banner.cgi?ACTION=2&NAME=tigress&LINK=0".
Going to "banner" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.yatho.com/banner/images/b-yatho.gif".
This is not the same hostname as the parent's (www.yatho.com and www.tigress.com).
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/html/setlocation?dummy".
Going to "html" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Dequeuing https://www.egalwashierstehterversuchteszuladen.de/ at depth 1
Queue count 0, maxcount 1.
--23:07:32--  https://www.egalwashierstehterversuchteszuladen.de/
           => `www.egalwashierstehterversuchteszuladen.de/index.html'
Resolving www.egalwashierstehterversuchteszuladen.de... failed: Host not found.

FINISHED --23:07:32--
Downloaded: 2,242 bytes in 1 files

-----------------------------------------------------------------------------

Second, wget without SSL, which does not have the problem:
-----------------------------------------------------------------------------
mail[mow]:~/laber> ldd `which wget`
        libc.so.6 => /lib/libc.so.6 (0x40018000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
mail[mow]:~/laber> wget -r -l 3 -np -d "http://www.tigress.com/mow/bla/"
DEBUG output created by Wget 1.8.2 on linux-gnu.

Enqueuing http://www.tigress.com/mow/bla/ at depth 0
Queue count 1, maxcount 1.
Dequeuing http://www.tigress.com/mow/bla/ at depth 0
Queue count 0, maxcount 1.
--23:10:35--  http://www.tigress.com/mow/bla/
           => `www.tigress.com/mow/bla/index.html'
Resolving www.tigress.com... done.
Caching www.tigress.com => 213.200.97.76
Connecting to www.tigress.com[213.200.97.76]:80... connected.
Created socket 3.
Releasing 0x8077148 (new refcount 1).
---request begin---
GET /mow/bla/ HTTP/1.0
User-Agent: Wget/1.8.2
Host: www.tigress.com
Accept: */*
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... HTTP/1.1 200 OK
Date: Fri, 24 Jan 2003 22:10:35 GMT
Server: Apache/1.3.26 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.5a mod_jk PHP/4.1.2
Connection: close
Content-Type: text/html


Length: unspecified [text/html]

    [ <=>                                                         ] 2,241          
2.14M/s

Closing fd 3
23:10:36 (2.14 MB/s) - `www.tigress.com/mow/bla/index.html' saved [2241]

Loaded www.tigress.com/mow/bla/index.html (size 2241).
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/html/tigress.css") -> http://www.tigress.com/html/tigress.css
appending "http://www.tigress.com/html/tigress.css" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/images/navigator") -> http://www.tigress.com/images/navigator
appending "http://www.tigress.com/images/navigator" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/images/bow") -> http://www.tigress.com/images/bow
appending "http://www.tigress.com/images/bow" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/", "/html/")
-> http://www.tigress.com/html/
appending "http://www.tigress.com/html/" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/html/furryorgs") -> http://www.tigress.com/html/furryorgs
appending "http://www.tigress.com/html/furryorgs" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/html/furryservers") -> http://www.tigress.com/html/furryservers
appending "http://www.tigress.com/html/furryservers" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/html/about") -> http://www.tigress.com/html/about
appending "http://www.tigress.com/html/about" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/stats/") -> http://www.tigress.com/stats/
appending "http://www.tigress.com/stats/" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/", "/") ->
http://www.tigress.com/
appending "http://www.tigress.com/" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/", "/") ->
http://www.tigress.com/
appending "http://www.tigress.com/" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"http://www.egalwashierstehterversuchtesnichtzuladen.de/") ->
http://www.egalwashierstehterversuchtesnichtzuladen.de/
appending "http://www.egalwashierstehterversuchtesnichtzuladen.de/" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"https://www.egalwashierstehterversuchteszuladen.de/") ->
https://www.egalwashierstehterversuchteszuladen.de/
www.tigress.com/mow/bla/index.html: merged link
"https://www.egalwashierstehterversuchteszuladen.de/" doesn't parse.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/banner/banner.cgi?ACTION=2&NAME=tigress&LINK=1") ->
http://www.tigress.com/banner/banner.cgi?ACTION=2&NAME=tigress&LINK=1
appending "http://www.tigress.com/banner/banner.cgi?ACTION=2&NAME=tigress&LINK=1" to
urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"http://www.yatho.com/banner/images/b-lion.gif") ->
http://www.yatho.com/banner/images/b-lion.gif
appending "http://www.yatho.com/banner/images/b-lion.gif" to urlpos.
www.tigress.com/mow/bla/index.html: merge("http://www.tigress.com/mow/bla/",
"/html/setlocation?dummy") -> http://www.tigress.com/html/setlocation?dummy
appending "http://www.tigress.com/html/setlocation?dummy" to urlpos.
no-follow in www.tigress.com/mow/bla/index.html: 0
Deciding whether to enqueue "http://www.tigress.com/html/tigress.css".
Going to "html" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/images/navigator".
Going to "images" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/images/bow".
Going to "images" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/html/".
Going to "html" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/html/furryorgs".
Going to "html" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/html/furryservers".
Going to "html" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/html/about".
Going to "html" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/stats/".
Going to "stats" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/".
Going to "" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/".
Going to "" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.egalwashierstehterversuchtesnichtzuladen.de/".
This is not the same hostname as the parent's
(www.egalwashierstehterversuchtesnichtzuladen.de and www.tigress.com).
Decided NOT to load it.
Deciding whether to enqueue
"http://www.tigress.com/banner/banner.cgi?ACTION=2&NAME=tigress&LINK=1".
Going to "banner" would escape "mow/bla" with no_parent on.
Decided NOT to load it.
Deciding whether to enqueue "http://www.yatho.com/banner/images/b-lion.gif".
This is not the same hostname as the parent's (www.yatho.com and www.tigress.com).
Decided NOT to load it.
Deciding whether to enqueue "http://www.tigress.com/html/setlocation?dummy".
Going to "html" would escape "mow/bla" with no_parent on.
Decided NOT to load it.

FINISHED --23:10:36--
Downloaded: 2,241 bytes in 1 files
-----------------------------------------------------------------------------
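
In case someone wants to compare the two runs without reading the whole
output, here is a small Python sketch that pulls the enqueue decisions out of
"wget -d" output. The function name enqueue_decisions is my own, and the
parsing only assumes the "Deciding whether to enqueue" and "Decided (NOT) to
load it." lines look like the ones in the logs above; other Wget versions may
word them differently.

```python
import re

# Matches either the start of a decision (with the quoted URL, which may be
# wrapped onto the next line) or the verdict that follows it.
TOKEN_RE = re.compile(
    r'Deciding whether to enqueue\s+"(?P<url>[^"]+)"'
    r'|Decided (?P<neg>NOT )?to load it\.'
)


def enqueue_decisions(log_text):
    """Return a list of (url, loaded) pairs found in wget debug output."""
    decisions = []
    pending = None
    for match in TOKEN_RE.finditer(log_text):
        if match.group("url") is not None:
            # Remember the URL until its verdict line shows up.
            pending = match.group("url")
        elif pending is not None:
            # No "NOT " in the verdict means wget decided to load the URL.
            decisions.append((pending, match.group("neg") is None))
            pending = None
    return decisions
```

Feeding both logs to it and keeping only the pairs where loaded is True
should leave exactly one URL for the SSL build (the https:// link to the
foreign host) and none for the non-SSL build.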

Regards, MOW []-)
