Serious bug in recursive retrieval behaviour occurred in v. 1.8

2002-04-04 Thread Robert Mücke

Dear wget team,

I recently found a bug in version 1.8 of the wget program (recursive
retrieval) that did not occur in earlier versions (at least as far as
I can see, 1.7 is definitely not affected).

The new wget version treats single ?xxx hrefs the same way as hrefs to
anchors (#xxx). So, e.g., a misplaced <a href=>xx</a> reference
leads to an HTTP request for http://www.xxx.xxx/currentfile.html
(unlike earlier versions, which treated the reference as a single
file name). Now, while this is not a bad thing, wget 1.8 then starts
to send requests of the form
http://www.xxx.xxx/currentfile.html/anotherfile.html, although
anotherfile.html is, e.g., also in the root dir, or at least
http://www.xxx.xxx//currentfile.html, which can cause wget to send a
retrieval request for the file a second time and, if the time stamp is
missing, to download it twice.
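
To illustrate the two request forms described above, here is a minimal
Python sketch of standard relative-reference resolution. It uses Python's
urllib.parse.urljoin as a stand-in resolver (this is not wget's code), and
the trailing-slash base is only an assumption about how a nested URL of
that form could arise.

```python
from urllib.parse import urljoin

# Placeholder host and file names taken from the report above.
DOCUMENT = "http://www.xxx.xxx/currentfile.html"


def resolve(base, href):
    """Resolve href against base with urllib's RFC 3986 rules (not wget)."""
    return urljoin(base, href)


# Expected behaviour: the link replaces the last path segment of the
# document URL, so the file is looked up in the root directory.
correct = resolve(DOCUMENT, "anotherfile.html")

# Assumption for illustration: the document URL is mistaken for a
# directory (trailing slash), so the link lands below the document name.
nested = resolve(DOCUMENT + "/", "anotherfile.html")

print(correct)  # http://www.xxx.xxx/anotherfile.html
print(nested)   # http://www.xxx.xxx/currentfile.html/anotherfile.html
```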

I had experience with a server that did not answer 404 (file not found)
to such erroneous recursive requests, but instead sent the contents of
currentfile.html again, now as another URL at another directory level.
This ended up in an infinite request loop, diving deeper and deeper into
directories that do not actually exist on the server. (I actually got
into serious trouble: unfortunately, the person affected is considering
legal steps, because the uncontrolled wget downloaded the site about
20 times over, until it was shut down.)

So it seems to be important to correct this behaviour. I think you only need
to set up a test site (maybe with some subdirs) containing one file with
an erroneous href= tag to reproduce this (maybe only in parts
depending on your server configuration).

Sincerely,
Robert Muecke




WGET Offline proxy question

2002-04-04 Thread Jonathan A Ruxton



Hi All

Sorry for the off-topic and 'newbie' question, but is it possible, having
downloaded a site to a local directory using 'wget', to run just the
'proxy' part of the process at a later stage, reading the site contents
from the local directory and not from the site, to enable re-loading of
a 'Squid' cache? That would effectively be running 'wget' in an offline
mode with the contents already downloaded. Apologies if this has already
been answered; if so, could someone please point me to the solution?

Thanks in advance

 jar...


Please could I be cc'd ([EMAIL PROTECTED]) on any relevant answers as I am 
not currently subscribed to this list.




Re: URI-parsing bug

2002-04-04 Thread Ian Abbott

On 4 Apr 2002 at 5:51, Tristan Horn wrote:

> Just wanted to point out that as of version 1.8.1, wget doesn't correctly
> recognize <A HREF="//foo/bar">-style links.
>
> tris.net/index.html: merge("http://tris.net/", "//www.arrl.org/") ->
> http://tris.net//www.arrl.org/
>
> (it should return http://www.arrl.org/)

There haven't been any releases since 1.8.1, but this bug is fixed
in the current CVS version.
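
For comparison, the expected result for such a link can be checked with
any resolver that follows the generic URI rules. The sketch below uses
Python's urllib.parse.urljoin, which is an assumption of convenience and
not wget's merge routine, with the two URLs from the report above.

```python
from urllib.parse import urljoin

base = "http://tris.net/"
link = "//www.arrl.org/"

# A reference that starts with "//" keeps the scheme of the base URL
# and replaces the host (and everything after it).
merged = urljoin(base, link)

print(merged)  # http://www.arrl.org/
```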




Re: Serious bug in recursive retrieval behaviour occurred in v. 1.8

2002-04-04 Thread Ian Abbott

On 4 Apr 2002 at 13:21, Robert Mücke wrote:

> So it seems to be important to correct this behaviour. I think you only need
> to set up a test site (maybe with some subdirs) containing one file with
> an erroneous href= tag to reproduce this (maybe only in parts
> depending on your server configuration).

I couldn't reproduce this with wget 1.8 and a local Apache server
(but I didn't attempt to reconfigure Apache in an attempt to
reproduce it).

A few recursive retrieval bugs were fixed in wget 1.8.1. Is it
possible for you to test that version? (You may want to limit the
recursion depth and the maximum amount to download if repeating the
test!)
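
As a rough illustration of why those two limits matter, here is a minimal
Python sketch of a recursive fetch loop with a depth limit and a page
budget. Everything in it is hypothetical (fake_fetch, crawl and the URLs
are made up for the example, and it is not how wget is implemented); the
fake server answers every URL with the same page, so each relative link
resolves one directory deeper, as in the report.

```python
from collections import deque
from urllib.parse import urljoin


def fake_fetch(url):
    """Hypothetical server: every URL returns the same page, whose single
    relative link therefore resolves one directory level deeper."""
    return ["sub/index.html"]


def crawl(start, fetch, max_depth, max_pages):
    """Breadth-first retrieval bounded by depth and by number of pages."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not follow links from here
        for href in fetch(url):
            target = urljoin(url, href)
            if target not in seen:  # never request the same URL twice
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


pages = crawl("http://www.xxx.xxx/index.html", fake_fetch,
              max_depth=3, max_pages=50)
print(len(pages))  # 4
print(pages[-1])   # http://www.xxx.xxx/sub/sub/sub/index.html
```

Without the depth limit or the page budget, this loop would never
terminate against such a server, because every response yields a URL
that has not been seen before.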



Amiga release

2002-04-04 Thread GumBoy

Hi! 
 
What is the version number of the latest Amiga WGet release? 

Greetz!
-- 
Marcin GumBoy Graziowski
Private Amiga Portal redaction member
http://www.amiga.prv.pl http://www.ppa.ltd.pl

TranceLetters member   HTTP://www.tranceletters.w.pl/
Emulki - Emulation World of Amiga   HTTP://www.emulki.ltd.pl/
[EMAIL PROTECTED]







forcing file overwrite

2002-04-04 Thread Matthew Boedicker

Hello,

I'd like to force wget to overwrite files it retrieves, even if a file
exists with the same name.  I have tried the -nc option, but that causes it
to do nothing if the file is already there.

I am trying to wget Apache log files (via ftp) and since the new file will
always contain at least the old, I want it to overwrite the file each time.

Is there any way to do this?  If there isn't, may I suggest it as a new
option?
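
To make the behaviour I am after concrete, here is a minimal Python sketch
of "always replace the old copy". It is only an illustration: the function
name save_overwriting and the sample data are hypothetical, the FTP
transfer itself is left out, and it says nothing about which wget option,
if any, does this.

```python
import os
import tempfile


def save_overwriting(path, data):
    """Write data to path, replacing any existing file of the same name."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first, so a failed
    # write never leaves a truncated log file behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)  # overwrites an existing file
    except BaseException:
        os.unlink(tmp_path)
        raise


# Demonstration with made-up log contents in a scratch directory.
with tempfile.TemporaryDirectory() as workdir:
    log = os.path.join(workdir, "access_log")
    save_overwriting(log, b"line 1\n")
    save_overwriting(log, b"line 1\nline 2\n")  # newer copy replaces older
    with open(log, "rb") as handle:
        result = handle.read()

print(result)  # b'line 1\nline 2\n'
```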

Please cc any replies to me, as I am not subscribed to this list.

Thanks.

-- 
36 .oO( 26 )
Matthew Boedicker http://mboedick.org