Serious bug in recursive retrieval behaviour occurred in v. 1.8
Dear wget team,

I recently found a bug in version 1.8 of wget (recursive retrieval) that did not occur in earlier versions (at least as far as I can see, 1.7 is definitely not affected). The new version treats hrefs consisting of a bare query string ("?xxx") the same way as hrefs to anchors ("#xxx"). So e.g. a misplaced <a href="?xxx">...</a> reference leads to an HTTP request for http://www.xxx.xxx/currentfile.html (in contrast to earlier versions, which treated it as a single file name). Now, while this is not a bad thing in itself, wget 1.8 then starts to send requests of the form http://www.xxx.xxx/currentfile.html/anotherfile.html, although anotherfile.html is e.g. also in the root dir, or at least http://www.xxx.xxx//currentfile.html, which can cause wget to send a retrieval request for the file a second time and, if the timestamp is missing, to download it twice.

I encountered a server that did not answer such erroneous recursive requests with 404 (file not found) but instead sent the contents of currentfile.html again, now under another URL at another directory level. This ended up in an infinite request loop, diving deeper and deeper into directories that do not actually exist on the server. (I actually got into serious trouble over this; unfortunately, the person affected is considering legal steps, because the uncontrolled wget downloaded the site about 20 times over before it was shut down.)

So it seems important to correct this behaviour. I think you only need to set up a test site (maybe with some subdirs) containing one file with an erroneous href= tag to reproduce this (perhaps only in part, depending on your server configuration).

Sincerely,
Robert Muecke
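P.S. A minimal test setup along the lines I mean would look roughly like this (the document root, host, and file names here are placeholders, not the real site):

    # sketch of a reproduction page; adjust the path to your server's docroot
    cat > /var/www/html/currentfile.html <<'EOF'
    <html><body>
    <a href="?xxx">misplaced link</a>
    <a href="anotherfile.html">normal link</a>
    </body></html>
    EOF

A recursive wget 1.8 run against currentfile.html should then show the spurious http://.../currentfile.html/anotherfile.html requests.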
WGET Offline proxy question
Hi all,

Sorry for the off-topic 'newbie' question, but having downloaded a site using wget to a local directory, is it possible at a later stage to run just the 'proxy' part of the process, reading the site contents from the local directory and not from the site, to enable re-loading of a Squid cache? In effect this would be running wget in an offline mode with the contents already downloaded. Apologies if this has already been answered; could someone please point me to the solution if that is the case?

Thanks in advance,
jar...

Please could I be cc'd ([EMAIL PROTECTED]) on any relevant answers, as I am not currently subscribed to this list.
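What I have in mind is something roughly like this (the paths, port, and options are guesses on my part, not a tested recipe):

    # serve the already-downloaded mirror from a local web server, then
    # re-crawl it through Squid so the cache gets repopulated
    export http_proxy=http://localhost:3128/
    wget -r -nd --delete-after http://localhost/mirror/index.html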
Re: URI-parsing bug
On 4 Apr 2002 at 5:51, Tristan Horn wrote: Just wanted to point out that as of version 1.8.1, wget doesn't correctly recognize <a href="//foo/bar">-style links. tris.net/index.html: merge("http://tris.net/", "//www.arrl.org/") -> http://tris.net//www.arrl.org/ (it should return http://www.arrl.org/)

There haven't been any releases since 1.8.1, but this bug is fixed in the current CVS version.
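A quick way to check the behaviour of a given build (the page and the grep pattern are only illustrative):

    # -d prints the URLs wget computes while parsing each page;
    # a correct build turns a "//www.arrl.org/" link on an
    # http://tris.net/ page into http://www.arrl.org/, not
    # http://tris.net//www.arrl.org/
    wget -d -r -l1 http://tris.net/index.html 2>&1 | grep arrl.org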
Re: Serious bug in recursive retrieval behaviour occurred in v. 1.8
On 4 Apr 2002 at 13:21, Robert Mücke wrote: So it seems important to correct this behaviour. I think you only need to set up a test site (maybe with some subdirs) containing one file with an erroneous href= tag to reproduce this (perhaps only in part, depending on your server configuration).

I couldn't reproduce this with wget 1.8 and a local Apache server (though I didn't try reconfiguring Apache to provoke it). A few recursive retrieval bugs were fixed in wget 1.8.1. Is it possible for you to test that version? (You may want to limit the recursion depth and the maximum amount to download when repeating the test!)
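For example (the limits and the host name here are arbitrary placeholders; pick whatever keeps a runaway test harmless):

    # cap recursion at 3 levels and abort after roughly 5 MB
    wget -r -l 3 -Q 5m http://your.test.site/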
Amiga release
Hi! What is the version number of the latest Amiga WGet release? Greetz!
--
Marcin "GumBoy" Graziowski
Private Amiga Portal redaction member
http://www.amiga.prv.pl  http://www.ppa.ltd.pl
TranceLetters member  http://www.tranceletters.w.pl/
Emulki - Emulation World of Amiga  http://www.emulki.ltd.pl/
[EMAIL PROTECTED]
forcing file overwrite
Hello,

I'd like to force wget to overwrite files it retrieves, even if a file with the same name already exists. I have tried the -nc option, but that causes it to do nothing if the file is already there. I am trying to wget Apache log files (via FTP), and since the new file will always contain at least the old one, I want it to overwrite the file each time. Is there any way to do this? If there isn't, may I suggest it as a new option? Please cc any replies to me, as I am not subscribed to this list. Thanks.
--
36 .oO( 26 ) Matthew Boedicker http://mboedick.org
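One workaround I'm considering (untested, and the URL here is just an example) is to name the output file explicitly, since -O truncates and rewrites the named file on every run:

    # -O always (re)creates the named local file, clobbering any old copy
    wget -O access.log ftp://example.com/logs/access.log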