Re: ** Nigerian Scam variation (Re: Co-operation Needed!)
On 16/07/2002 16:36:15 Fernando Cassia wrote:
>FYI, and if someone has been living in a bottle: this is a variation of
>the Nigerian scam.
>
>http://www.secretservice.gov/alert419.shtml
>http://www.fdic.gov/consumers/consumer/news/cnwin0102/TooGood.html
>
>Don't even bother contacting them.
>
>Regards
>Fernando
>
>"Jesse Ndoro." wrote:
>
>> Dear Sir,
[snip "Nigerian" scam quoted in its entirety]

Please don't do that.

1) This is a mailing list whose subscribers can actually think.
2) FYI, you should not top-post and quote the original *in its entirety*.
   This is a mailing list, where discussions are frequent. Replying at the
   top makes it difficult to follow who said what, and in reply to whom.
3) We already received two copies of the scam. There was no need to send a
   third, unabridged copy.

--
Csaba Ráduly, Software Engineer, Sophos Anti-Virus
email: [EMAIL PROTECTED]
http://www.sophos.com
US Support: +1 888 SOPHOS 9
UK Support: +44 1235 559933
Re: user-agent string for IE
On 20/06/2002 10:03:13 jgrosman wrote:
>Hi all.
>
[snip question about emulating IE in the User-agent string]

Virtually all browsers start their User-Agent with "Mozilla". For IE 6,
try something like

  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Re: speed units
On 10/06/2002 23:07:47 Joonas Kortesalmi wrote:
>Wget seems to report speeds with the wrong units. It uses for example
>"KB/s" rather than "kB/s", which would be correct. Any possibility to
>fix that? :)
>
>K = Kelvin
>k = kilo
>
>Probably you want to use a small k with download speeds, right?

Let's not go there again, lest wget end up having to report downloads in
kibibytes (ISTR wget using 1024 as the divisor). k = kilo is reserved for
dividing by 1000.
Re: Can't get remote files - what am I doing wrong?
On 05/06/2002 13:08:05 drt - lists wrote:
>Thanks for no help.
>
>If this is typical of how you reply to your customers

I do *not* reply to customers. I am a developer, and post here as a
private individual. Perhaps I should unsubscribe altogether.

[snip]
>>> The Mac machine I am using for testing is behind our firewall, but
>>> there is a "hole" opened to allow my internal IP to reach
>>> the specific remote IP.
>> [snip]
>>
>> Because you didn't include the output with the -d switch, I'm guessing.
>> Do you use a proxy to go through the firewall ? A lot of proxies issue
>> HTTP requests even for FTP. HTTP cannot glob.
>
>Yes we do,

So there is a proxy after all.

>and no, it doesn't issue an ftp request as I have an opening for
>this specific request - which if you had bothered to read my message
>instead of trying to attack you would know that.
>
>Here is the part that you ignored which addresses the accusation above.

Huh ? I described a scenario which could have caused the failure you
described. I did not *accuse* you of using a proxy !

>---
>The Mac machine I am using for testing is behind our firewall, but there
>is a "hole" opened to allow my internal IP to reach the specific remote IP.
>And using the first example above it does connect so I know I am getting
>through the firewall.
>---

Note that if wget is set up to use the proxy by default (environment
variable, wgetrc), then it will use the proxy even if it could connect
directly through the hole in the firewall.

The first example (which I snipped) did not use globbing. That would
succeed regardless of whether wget connected directly or through an
HTML-ized proxy.

We're not getting any closer to a solution. Please post the output of the
failed request in debugging mode (be careful to obscure any passwords).

>[ad hominem attack snipped]

I apologise. Although I consider what I've written to be valid, the tone
was not. I claim temporary loss of diplomatic abilities.
Re: Can't get remote files - what am I doing wrong?
On 03/06/2002 14:56:47 dale wrote:
[snip]
>wget ftp://user:[EMAIL PROTECTED]/folder1/folder2/*s.csv
>
>I get an error message of "no match" and if I use:
>
>wget --glob=on ftp://user:[EMAIL PROTECTED]/folder1/folder2/*s.csv
>
>I also get "no match"

In the future, please post the output with the -d switch added. (Did you
read the instructions ?)

[snip]
>The Mac machine I am using for testing is behind our firewall, but there
>is a "hole" opened to allow my internal IP to reach the specific remote
>IP.
[snip]

Because you didn't include the output with the -d switch, I'm guessing.
Do you use a proxy to go through the firewall ? A lot of proxies issue
HTTP requests even for FTP. HTTP cannot glob.

>p.s. The reply-to address has been anti-spammed (I hope anyway), please
>post any replies to the list.

Somebody at Ultimate Search (the owner of nospam.net) will be mightily
surprised. What you did can be interpreted as email address forgery.
Please use addresses ending in .invalid in the future (this top-level
domain is guaranteed to always be, err, invalid), e.g. [EMAIL PROTECTED]
Re: ? gets translated to @
On 24/05/2002 13:39:29 ladislav.gaspar wrote:
>Hi
>
>I do the following:
>wget http://killefiz.de/zaurus/showdetail.php?app=221
>
>but the file is saved as http://killefiz.de/zaurus/showdetail.php@app=221
>
>(*.php?app gets translated to *.php@app)
>
>Why is that and is there a workaround?

That *is* the workaround :-) '?' is an invalid character for filenames on
FAT, FAT32 and NTFS. Instead of giving an error message like

  "Cannot open killefiz.de/zaurus/showdetail.php?app=221"

wget actually tries to do what you want (i.e. download the file).

You can run wget on another platform (Linux, some Unix, etc.). The
filesystems there usually don't have this restriction.
JavaScript
Links, the (formerly) text-mode browser, has recently acquired the
ability to parse JavaScript. Look at

  http://atrey.karlin.mff.cuni.cz/~clock/twibright/links/

It seems to use around four source files (two of them generated by lex
and yacc, respectively). This might be usable to teach wget JavaScript.
Alas, all the comments are in Czech :-(
Re: crawling servlet based urls
On 16/05/2002 17:06:31 "Steve Mestdagh" wrote:
>Hi,
>I'm trying to crawl intranet urls of the form:
[snip, wget will try to save to a filename like this:]
> `WKCCommand?command=getLesson&LessonId=137'
[snip]

The filename above is invalid on many filesystems used by Micros~1 (it's
the '?' causing the problem). This is corrected for sure in a newer
version, either 1.8.1 or the current CVS. Heiko Herold provides a new
CVS binary for Windows at http://space.tin.it/computer/hherold
Re: cookie pb: download one file on member area
On 15/05/2002 13:34:29 "[EMAIL PROTECTED]" wrote:
[snip problem possibly related to cookies]
>
>Although i use wget with the option for using the mozilla's
>cookies file, i am not able to download that file. Could
>someone help me ??? If you want further information, just ask.

Without the output of "wget -d", our guess is actually worse than yours.
Please run wget with the -d option in addition to the existing ones, then
post the results (if it's big, it might be a better idea to put it on a
website and send just the link to the list). If you send it to the list,
I'd prefer it pasted into the mail (in-line, rather than as an
attachment).
Re:
On 30/04/2002 16:31:17 "Tony Lewis" wrote:
>[EMAIL PROTECTED] wrote:
>
>> I want to get page http://www.boards.spb.ru/?3~sell with _all_ contents
>> as in browser. But i get only part of web page.
>> Page contains an include tag that outputs data into the page.

Which suggests that the include didn't work, probably because the # is
missing before the include. See below.

>When I view the source of that page in my browser, I also see the
>include tag. For what it's worth, this almost looks like an Apache
>server-side include command, but if it were, it would be spelled with a
>leading # before the include.

It's the server's job to replace the include directive with the output of
temp.pl. If you can still see the directive (an HTML comment) in the page
you receive, it means the server didn't do its job (maybe because of the
typo).
Re: wget wild card
On 25/04/2002 21:55:26 "Zhao, David [PRDUS Non J&J]" wrote:
>When I do:
>ftp://ftp.something.com/*, I get "wget: No match".
>Any clue?
>Thanks in advance

It is likely that there are no files in the FTP root, only directories.
Run

  wget -d -nr ftp://ftp.something.com/*

for more information.
Re: Worm Klez.E immunity
On 25/04/2002 06:43:10 "Tony Lewis" wrote:
>Admin wrote:
>
>> Klez.E is the most common world-wide spreading worm.
>
>It's definitely a nasty little piece of code. Even though I had Outlook
>Express configured to disallow practically everything, the e-mail
>messages opened themselves and let the rogue code loose. If you've
>managed to avoid Klez so far and you're running Windows, I strongly
>recommend you read more about this at Microsoft and then install the
>recommended security patch.

http://www.pmail.com/ :-)
Re: apache irritations
On 22/04/2002 16:38:15 "Maciej W. Rozycki" wrote:
>On Mon, 22 Apr 2002, Hrvoje Niksic wrote:
>
>> > How about using the "-R" option of wget? A brief test proves "-R
>> > '*\?[A-Z]=[A-Z]'" works as it should.
>>
>> Or maybe the default system wgetrc should ship with something like:
>>
>> reject = *?[A-Z]=[A-Z]
>
>Note the difference between strings! -- the backslash before the
>question mark is essential, as otherwise it's a glob character.

[A-Z] is a bit extreme, IMHO. How about

  reject = *\?[NMSD]=[AD]

(the backslash is needed so the '?' is matched literally)?

>Well, I don't think it's sane, but adding a *commented-out* reject line
>with an appropriate annotation to the default system wgetrc looks like
>a good idea to me.

A good idea.
Re: HTTP 1.1
On 12/04/2002 21:37:31 hniksic wrote:
>"Tony Lewis" <[EMAIL PROTECTED]> writes:
>
>> Hrvoje Niksic wrote:
>>
>>> > Is there any way to make Wget use HTTP/1.1 ?
>>>
>>> Unfortunately, no.
>>
>> In looking at the debug output, it appears to me that wget is really
>> sending HTTP/1.1 headers, but claiming that they are HTTP/1.0
>> headers. For example, the Host header was not defined in RFC 1945,
>> but wget is sending it.
>
>Yes. That is by design -- HTTP was meant to be extended in that way.
>Wget is also requesting and accepting `Keep-Alive', using `Range', and
>so on.
>
>Csaba Raduly's patch would break Wget because it doesn't support the
>"chunked" transfer-encoding. Also, its understanding of persistent
>connections might not be compliant with HTTP/1.1.

IT WAS A JOKE ! Serves me right. I need to put in bigger smilies :-(
Re: Goodbye and good riddance
On 12/04/2002 19:21:41 "James C. McMaster (Jim)" wrote:
>My patience has reached an end. Perhaps, now that you have (for the
>first time) indicated you will do something to fix the problem, the
>possible light at the end of the tunnel will convince others to stay.

The light at the end of the tunnel is just the explosion around the
Pu-239 :-)
Re: HTTP 1.1
On 11/04/2002 18:26:15 hniksic wrote:
>"Boaz Yahav" <[EMAIL PROTECTED]> writes:
>
>> Is there any way to make Wget use HTTP/1.1 ?
>
>Unfortunately, no.

Sure it can be made to use HTTP 1.1:

--- http.c.orig	Wed Jan 30 14:10:42 2002
+++ http.c	Fri Apr 12 11:56:22 2002
@@ -838,7 +838,7 @@
 		  + 64);
   /* Construct the request.  */
   sprintf (request, "\
-%s %s HTTP/1.0\r\n\
+%s %s HTTP/1.1\r\n\
 User-Agent: %s\r\n\
 Host: %s%s%s%s\r\n\
 Accept: %s\r\n\

:-)
Re: qestio
On 05/04/2002 12:44:22 Varga Gabor wrote:
>Hi
>
>I am Gabor from Hungary and I have a question.
>I have an URL ending like this: */show.php?id=843
>I know how it works (correct me if I am wrong): the *.php
>(gets or posts) the arg. ID, and the server returns the page 843.
>But why can't wget mirror these pages ?

Because it will try to save with the filename "show.php?id=843", and '?'
is invalid in a filename on DOS/Windows/OS2.

What version of wget are you using ? What platform (operating system) ?
What does the debug log say ? (Run wget with the -d switch added.)

CC'd to wget, not bug-wget.
wget parsing JavaScript
wget stumbled upon the following HTML file:

--- >8
foo
var sitems=new Array()
var sitemlinks=new Array()
///Edit below/
//extend or shorten this list
sitems[0]="15.html"
sitems[1]="16.html"
sitems[2]="17.html"
sitems[3]="18.html"
sitems[4]="19.html"
sitems[5]="20.html"
sitems[6]="21.html"
sitems[7]="22.html"
sitems[8]="23.html"
sitems[9]="24.html"
sitems[10]="25.html"
sitems[11]="26.html"
sitems[12]="27.html"
//These are the links pertaining to the above text.
sitemlinks[0]="31.html"
sitemlinks[1]="32.html"
sitemlinks[2]="33.html"
sitemlinks[3]="34.html"
sitemlinks[4]="35.html"
sitemlinks[5]="36.html"
sitemlinks[6]="37.html"
sitemlinks[7]="38.html"
sitemlinks[8]="39.html"
sitemlinks[9]="40.html"
sitemlinks[10]="41.html"
sitemlinks[11]="42.html"
sitemlinks[12]="43.html"
//If you want the links to load in another frame/window, specify name of
//target (ie: target="_new")
var target=""
for (i=0;i<=sitems.length-1;i++)
document.write(''+sitems[i]+'
')
Congratulations, you have turned off JavaScript.
--- >8

I see that wget handles
Re: OK, time to moderate this list
On 22/03/2002 07:06:13 Daniel Stenberg wrote:
>On Fri, 22 Mar 2002, Hrvoje Niksic wrote:
[snip]
>> I think I agree with this. The amount of spam is staggering. I have no
>> explanation as to why this happens on this list, and not on other
>> lists which are *also* open to non-subscribers.
>
>Spammers work in mysterious ways. ;-)

No, they work in fairly predictable ways. The wget mailing list address
is advertised on the wget homepage. According to empirical observations,
if you publish a brand new email address on a web page, it will receive
spam within eight *hours* of being published.
Re: KB or kB
On 08/02/2002 13:58:55 Andre Majorel wrote:
>On 2002-02-08 08:54 +0100, Hrvoje Niksic wrote:
>
>> Wget currently uses "KB" as the abbreviation for "kilobyte". In a
>> Debian bug report someone suggested that "kB" should be used because
>> it is "more correct". The reporter however failed to cite a reference
>> for this, and a search of the web has proven inconclusive.
>>
>> Does someone understand the spelling issues involved well enough to
>> point out the "correct" spelling and back it up with arguments?
>
>The applicable standard is the SI (Système International).
[snip SI prefixes]
>Capital K is not a prefix; it's the SI abbreviation for the temperature
>unit, the kelvin (note: the name is spelled with a lower-case k), named
>after Lord Kelvin.
>
>So it's definitely kB for kilobyte.

As long as it means 1000 and NOT 1024.

>Whether that means 1000 bytes or 1024 bytes is another issue.

Not while claiming to conform to SI.

Csaba
Re: wget not working
On 08/02/2002 15:34:53 Martin Schöneberger wrote:
>At 14:37 08.02.2002 +, Henderson, Daniel wrote:
>>#wget www.sophos.com/downloads/ide/ides.zip
>>--14:32:57-- http://www.sophos.com/downloads/ide/ides.zip
>>           => `ides.zip'
>>Connecting to www.sophos.com:80...
>>www.sophos.com: Host not found.
>>
>>Is there something else I should configure in Solaris to allow this to
>>work?
>
>First of all you should find out why you can't connect to "sophos.com".
>1) sophos is down -> try again later
>   Solution: get the file from another server
>2) dns lookup failed -> try if you can connect to other hosts like
>   "google.com" or anything else, or if you can only connect to IP
>   addresses.
>   Solution 1: try another DNS server
>   Solution 2: reconfigure your DNS settings or even your DNS server
>   (if you are running one)

Try

  nslookup www.sophos.com
  ping www.sophos.com
  telnet www.sophos.com 80

If these work, it's wget's fault. If they don't, it's a connectivity
problem.

>4) user root not allowed to connect to the internet (standard on BSD if
>   I remember correctly) -> try if you can download the file as another
>   user.
>   Solution: change the user database or the firewall settings, or just
>   don't connect to the internet as root :-)

Good point. Look at the prompt...

[snip]
>
>Last but not least: try the "-d" switch with wget and have a look at
>the debug output. Perhaps you'll find further information on why you
>can't connect. If you don't, send it to this list; perhaps "we" will
>find something :-)

Very good advice indeed.

HTH,
Re: KB or kB
On 08/02/2002 08:30:59 Henrik van Ginhoven wrote:
>On Fri, Feb 08, 2002 at 08:54:06AM +0100, Hrvoje Niksic wrote:
>> Wget currently uses "KB" as abbreviation for "kilobyte". In a Debian
>> bug report someone suggested that "kB" should be used because it is
>> "more correct".

This is the kind of stuff that leads to month-long flamewars :-)

>"kB" rather than "KB"? I think whoever filed that bug report got it
>wrong; as far as I know "kB" would always mean 1000 (bytes), since
>"k" = thousand, and never ever 1024. If he'd said "KiB" I'd agree with
>him to a certain degree, but "kB" simply can't be right.

Note that we could claim the distinction that k=1000 and K=1024. That
won't work for 1E6 vs 2**20, though, because SI uses uppercase M for 1E6.

>Rather than me trying to sum it up and risk typing something wrong,
>this page seems to address the issue well:
>
>http://www.romulus2.com/articles/guides/misc/bitsbytes.shtml

Please, no kibibytes :-)

Maybe wget should just count 512-byte "blocks", a la df. That would
improve the understandability of the display ... NOT. But it would keep
the terminally anal-retentive at bay :-)

Seriously, just ignore it. I can certainly live with a 5% "experimental
error" ( 2**20 = 1.0486E6 ) at the megabyte level.
Re: BUG https + index.html
On 01/02/2002 12:10:59 "Mr.Fritz" wrote:
>After the https/robots.txt bug, doing a recursive wget to an https-only
>server gives me this error: it searches for http://servername/index.html
>but there is no server on port 80, so wget receives a Connection refused
>error and quits. It should search for https://servername/index.html

Are you sure this was an SSL-enabled wget ? Please provide a debug log
by running wget with the -d parameter.
Re: mirroring vs -m
On 29/01/2002 15:54:17 Andre Majorel wrote:
[snip debate about following links in HTML retrieved by FTP]
>
>I'm inclined to think that recursive retrieval without parsing
>is a feature. HTML content is normally served over HTTP. If you
>want to retrieve HTML through FTP, it's likely because you do
>*not* want to follow the links.

I (the client) don't get the choice. If the document at
http://foo.bar/index.html has all its links like this:

  <a href="ftp://foo.bar/welcome.html">welcome</a>

then the client has no choice but to retrieve them via FTP. It would be
nice if wget were able to follow all those links.

>If Wget always parsed HTML, even over FTP, it would be
>impossible to make a complete mirror of a tree that has broken href
>links or hidden files.

Perhaps: "If wget started with FTP, it should mirror FTP-like (.listing
and all that). If it started via HTTP, it should follow links, regardless
of future retrieval modes."

[snip]
RE: Bug report: 1) Small error 2) Improvement to Manual
On 17/01/2002 07:34:05 Herold Heiko wrote:
[proper order restored]
>> -----Original Message-----
>> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
>> Sent: Thursday, January 17, 2002 2:15 AM
>> To: Michael Jennings
>> Cc: [EMAIL PROTECTED]
>> Subject: Re: Bug report: 1) Small error 2) Improvement to Manual
>>
>> Michael Jennings <[EMAIL PROTECTED]> writes:
>>
>> > 1) There is a very small bug in WGet version 1.8.1. The bug occurs
>> > when a .wgetrc file is edited using an MS-DOS text editor:
>> >
>> > WGet returns an error message when the .wgetrc file is terminated
>> > with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
>> > command-line language for all versions of Windows, so ignoring the
>> > end-of-file mark would make sense.
>>
>> Ouch, I never thought of that. Wget opens files in binary mode and
>> handles the line termination manually -- but I never thought to handle
>> ^Z.
>>
>> As much as I'd like to be helpful, I must admit I'm loath to encumber
>> the code with support for this particular thing. I have never seen it
>> before; is it only an artifact of DOS editors, or is it used on
>> Windows too?
>>
[snip "copy con file.txt"]
>
>However in this case (at least when I just tried) the file won't contain
>the ^Z. OTOH some DOS programs will still work on NT4, 2k and XP, and
>could be used, and would create files ending with ^Z. But do they really
>belong here, and should wget be bothered ?
>
>What we really need to know is:
>
>Is ^Z still a valid, recognized character indicating end-of-file (for
>text-mode files) for command shell programs on Windows NT4/2k/XP ?
>Could somebody with access to the *Windows standards* shed more light
>on this question ?
>
>My personal idea is:
>As a matter of fact no *Windows* text editor I know of, even the
>supplied ones (Notepad, WordPad), will AFAIK add the ^Z at the end of
>file.txt. Wget is a *Windows* program (although running in console
>mode), not a *DOS* program (except for the real DOS port, which I know
>exists but never tried out).

I don't think there's a distinction between DOS and Windows programs in
this regard. The C runtime library is most likely to play the
significant role here. For a file fopen-ed in "rt" mode, the RTL would
convert \r\n -> \n and silently eat the _first_ ^Z, returning EOF at
that point. When writing, it goes the other way round WRT \n -> \r\n.
I'm unsure about whether it writes a ^Z at the end, though.

>So personally I'd say it would not really be necessary to add support
>for the ^Z, even in the win32 port; except possibly for the DOS port,
>if the porter of that beast thinks it would be useful.

The problem could be solved by opening .wgetrc in "rt" mode; however,
the "t" is a non-standard extension.

Different editors may behave differently, but this is not wget's
problem, IMO. Example: on OS/2 (which isn't a DOS shell, but can run DOS
programs), the system editor (e.exe) *does* append a ^Z at the end of
every file it saves. People have patched the binary to remove this
feature :-) AFAIK no other OS/2 editor does this.
Re: A strange bit of HTML
On 16/01/2002 19:31:26 "Ian Abbott" wrote:
>I came across this extract from a table on a website:
>
>href="66B27885.htm" "msover1('Pic1','thumbnails/MO66B27885.jpg');"
>onMouseOut="msout1('Pic1','thumbnails/66B27885.jpg');">
>SRC="thumbnails/66B27885.jpg" NAME="Pic1" BORDER=0
>
>Note the string beginning "msover1(", which seems to be an
>attribute value without a name, so that makes it illegal HTML.

That sounds like they wanted onMouseOver="msover1(...)". It's also
likely that msover1 is a JavaScript function :-(

>I haven't traced what Wget is actually doing when it encounters
>this, but it doesn't treat "66B27885.htm" as a URL to be
>downloaded.

In map_html_tags():

  /* Establish bounds of attribute name.  */
  attr_name_begin = p;
  while (NAME_CHAR_P (*p))
    ADVANCE (p);
  attr_name_end = p;
  if (attr_name_begin == attr_name_end)
    goto backout_tag;

When it sees "msover1(..." it doesn't ADVANCE (because NAME_CHAR_P('"')
is false). Hence attr_name_begin == attr_name_end, and it backs out:

backout_tag:
#ifdef STANDALONE
  ++tag_backout_count;
#endif
  /* The tag wasn't really a tag.  Treat its contents as ordinary
     data characters.  */

>I can't call this a bug, but is Wget doing the right thing by
>ignoring the href altogether?

Until there's an ESP package that can guess what the author intended, I
doubt wget has any choice but to ignore the defective tag. In addition,
wget should send an email to webmaster@, complaining about the invalid
HTML :-)
Re: Is "wget --timestamping URL" working on Windows 2000?
From main.c:

  /* Open the output filename if necessary.  */
  if (opt.output_document)
    {
      if (HYPHENP (opt.output_document))
        opt.dfp = stdout;
      else
        {
          struct stat st;
          opt.dfp = fopen (opt.output_document,
                           opt.always_rest ? "ab" : "wb");
          if (opt.dfp == NULL)
            {
              perror (opt.output_document);
              exit (1);
            }
          if (fstat (fileno (opt.dfp), &st) == 0 && S_ISREG (st.st_mode))
            opt.od_known_regular = 1;
        }
    }

It seems to me that if an output_document is specified, it is clobbered
at the very beginning (unless always_rest is true). Later, in http_loop,
stat() comes up with zero length. Hence there's always a size mismatch
when --output-document is specified. That doesn't sound good to me...

Csaba
Re: log errors
On 11/12/2001 15:09:25 hniksic wrote:
>Summer Breeze <[EMAIL PROTECTED]> writes:
>
>> I want to know if Wget is a program similar to Mozilla, and if so is
>> there any way to make my pages available to Wget? I use Netscape to
>> create my web pages.
>
>Wget is a command-line downloading utility; it allows you to download
>a page or a part of a site without further user interaction.
>
>> Here is a sample entry:
>>
>> 66.28.29.44 - - [08/Dec/2001:18:21:20 -0500] "GET /index4.html%0A
>> HTTP/1.0" 403 280 "-" "Wget/1.6"
>
>"/index4.html%0A" looks like a page is trying to link to /index4.html,
>but the link contains a trailing newline.

That IP address is assigned to Road Runner (a big cable ISP, I think).

Is /index4.html%0A the *first* error line in the log from 66...44 ?

Wget will try to download a URL in two cases: either because it was told
to explicitly, or because it was doing a recursive download and found
the link in a page downloaded earlier. /index4.html%0A looks like
something, somewhere, was misparsed. It might conceivably be wget
(unlikely, as this sort of problem would've surfaced long ago). If
/index4.html%0A *is* the first URL requested by that IP address, then
the blame is clearly elsewhere (unless -i was used). If not, can you
search your site for a link to /index4.html that might be badly
formatted HTML? (Although wget should be able to defend itself against
bad HTML.)

(Please don't CC me; I'm on the list.)
Re: Is "wget --timestamping URL" working on Windows 2000?
On 11/12/2001 14:03:54 Adrian Aichner wrote:
>Hi Wgeteers!
>
>Is
>  -N, --timestamping  don't retrieve files if older than local.
>supposed to work on Windows 2000?
>
[snip]
>
>cd c:\Hacking\SunSITE.dk\xemacsweb\Download\win32\
>%TEMP%\wget.wip\src\wget.exe --debug --timestamping
>  --output-document=setup.exe http://ftp.xemacs.org/windows/setup.exe
>Compilation started at Tue Dec 11 14:53:07 2001 +0100 (W. Europe
>Standard Time)
>DEBUG output created by Wget 1.8 on Windows.
>
>--14:53:07-- http://ftp.xemacs.org/windows/setup.exe
>           => `setup.exe'
>Resolving ftp.xemacs.org... done.
>Caching ftp.xemacs.org => 207.96.122.9
>Connecting to ftp.xemacs.org[207.96.122.9]:80... connected.
>Created socket 420.
>Releasing 007D1C00 (new refcount 1).
>---request begin---
[snip HEAD request and response]
>
>Found ftp.xemacs.org in host_name_addresses_map (007D1C00)
>Registered fd 420 for persistent reuse.
>Length: 181,760 [application/octet-stream]
>Closing fd 420
>Releasing 007D1C00 (new refcount 1).
>Invalidating fd 420 from further reuse.
>The sizes do not match (local 0) -- retrieving.

Something is wrong there ("local 0"). Try it without --output-document;
it should put the file in the current directory anyway.

>--14:53:08-- http://ftp.xemacs.org/windows/setup.exe
>           => `setup.exe'
>Found ftp.xemacs.org in host_name_addresses_map (007D1C00)
>Connecting to ftp.xemacs.org[207.96.122.9]:80... connected.
>Created socket 420.
>Releasing 007D1C00 (new refcount 1).
>---request begin---
>GET /windows/setup.exe HTTP/1.0
[snip]
>
>14:53:47 (6.14 KB/s) - `setup.exe' saved [181760/181760]
>
>Compilation finished at Tue Dec 11 14:53:47
Re: Uncoupling translations from source
On 10/12/2001 08:10:12 "Martin v. Loewis" wrote:
>> Maybe you wanted to say that many Europeans speak English so well
>> that they do not need translations?
>
>It is my observation as well: some users are hostile towards the
>notion of translated software. Those are typically not native English
>speakers, but people who found, at one time or another, reason to
>complain about translations. They do so for all operating systems,
>making fun of erroneous translations (such as the infamous "Pfeife
>zerbrochen" of SINIX, or translations that an MS employee came up
>with).

From an ancient DR-DOS (version 3.something):

  Nicht breit __reading__ laufwerk A:

This was clearly an oversight (the message was probably pasted together
from various places).

My native language is Hungarian, and I don't remember using ANY software
in Hungarian (with the possible exception of Recognita, which is written
by Hungarians). For the few I tried, I found the Hungarian translation
incredibly awkward (this is exacerbated by the fact that Hungarian is
neither a Germanic nor a Romance language), even if not at the level of
"all your base are belong to us" :-) It was easier to use the English
version (this was all commercial software).

Complaining about the *presence* of a translation is silly, IMO.
Presumably gettext has a way to decide which language to use (the LANG
environment variable, or suchlike; LANG=en_gb should do).

Decoupling the translations is a good idea, if the logistics can be
sorted out.

Csaba
Re: Wget 1.8-beta1 now available
On 01/12/2001 19:44:44 John Poltorak wrote: >On Sat, Dec 01, 2001 at 04:30:47PM +0100, Hrvoje Niksic wrote: >> John Poltorak <[EMAIL PROTECTED]> writes: >> >> > Is it possible to include OBJEXT in Makefile.in to make this more >> > cross-platform? >> >> I suppose so. I mean, o is already defined to .@U@o, but I'm not >> exactly sure what the U is supposed to stand for. > > >It looks to me as though @U@ is set up for some variable substitution, >but I can't work out what for... Maybe it's getting replaced by NULL. > > I know next to nothing about how Auto* is (supposed to be) working, but I've seen lots of sed commands in configure. If @U@ is doing a variable substitution, then it'll expand to something _before_ o (if @U@ -> bar, then this will result in a dependency involving .baro). (looking through configure) Wget's configure contains this towards the end: s%@U@%$U%g U seems to be related to ansi2knr: if(can use prototypes) U= ANSI2KNR= else U=_ ANSI2KNR=./ansi2knr endif This will result in dependencies written as ._o if ansi2knr was run over the sources. This forces me to conclude that using @U@ _CAN_NOT_ and _WILL_NOT_ change .o to .obj. I think .@U@o might need to be replaced with .@U@@objext@ (if there is such a beast, by analogy with @exeext@). Csaba -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
Re: wget1.7.1: Compilation Error (please Cc'ed to me :-)
On 28/11/2001 10:28:44 Daniel Stenberg wrote: >On Wed, 28 Nov 2001, zefiro wrote: > >> ld: Undefined symbol >>_memmove >> >> Do you have any suggestion ? > >SunOS 4 is known to not have memmove. > Isn't configure supposed to notice that ? -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED]http://www.sophos.com US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
Re: minor memory leak risk
On 20/11/2001 10:12:05 Daniel Stenberg wrote: >This subject says it all. The leak is minor, the fix could be made >something like this: > >diff -u -r1.21 utils.c >--- utils.c 2001/05/27 19:35:12 1.21 >+++ utils.c 2001/11/20 10:10:17 >@@ -903,7 +903,12 @@ > while (fgets (line + length, bufsize - length, fp)) > { > length += strlen (line + length); >- assert (length > 0); >+ if (0 == length) >+{ >+ /* bad input file */ >+ xfree(line); >+ return NULL; >+} > if (line[length - 1] == '\n') >break; > /* fgets() guarantees to read the whole line, or to use up the > It's not just a memory leak. Length <= 0 is declared as a "can't happen". If length is zero, wget will suddenly end due to the assert. If a bad input file can lead to length being zero, then using assert is bad on principle. One should never assert external input. -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED]http://www.sophos.com US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
Re: wget mirroring busted
On 14/11/2001 16:27:34 jwz wrote: >[EMAIL PROTECTED] wrote: >> >> Can you post the entire debug log (on a web/ftp site, of course, not the >> list). > >Done -- http://www.jwz.org/wget-log.gz > >Does this mean you can't reproduce this when you run wget the same >way I did? > No, I just wanted to take a look at the surrounding lines in the log. >wget -nv -m -nH -np \ > http://www.dnalounge.com/flyers/ > http://www.dnalounge.com/gallery/ > I may try that myself. P.S. Please *don't* CC in the future, I'm on the list. -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED]http://www.sophos.com US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
Re: A tricky download
On 12/10/2001 16:49:07 "Edward J. Sabol" wrote: [snip question about downloading a site with Javascript-only links] > >Probably not. If the only links to the other chapters are in JavaScript >commands, then there's no way wget can do it. Wget does not interpret >JavaScript and most likely never will. Implementing it is left as an exercise for the reader. ;-) -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED]http://www.sophos.com US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
Re: Recursive retrieval of page-requisites
On 09/10/2001 14:25:57 Andre Pang wrote: >On Tue, Oct 09, 2001 at 03:46:52PM +0300, Mikko Kurki-Suonio wrote: > >> > To me that sounds like a logical combination of -r -np -p? >> > Any correction appreciated. >> >> Doesn't work, apparently because -np overrides -p. >> >> I.e. with -np set, no document outside the selected subtree will be >> loaded, whether it is referred to through regular link-traversal or as a >> page-requisite element. >> >> My guess is that -p adds those links to the list of documents to load, but >> -np later rejects them because they're not within the selected subtree. >> >> What I'd basically like is a setting that loads page-requisites REGARDLESS >> OF ALL OTHER SETTINGS. I.e. you use the myriad of settings to fine tune >> the exact set of pages requested, and then request "all requisites for the >> selected set of pages". > >Try this patch. It should make -p _always_ get pre-requisites, >even if you have -np on (which was the reason why i wrote the >patch). [snip] Actually, a case can be made for both ways. Sometimes you might want -p to only get "images" conforming to -np, perhaps to skip (advertising) banners (those are usually served by another server, and thus ignored anyway unless --span-hosts). Perhaps make -p override -np, but have an "alternative" -p (e.g. -pnp ) which obeys -np. I didn't see Andre's patch so I cannot comment on it (stripped by my mail system)-: It modifies existing (admittedly confusing) behaviour; my suggestion would permit getting the old behaviour back. Another possibility would be to keep the existing behaviour (i.e. -np overrides -p) and have a "stronger" -p (e.g. -pp ) which ignores -np. Csaba -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
Re: GNU Wget 1.5.3
On 03/09/2001 23:18:14 Tomas Dalebjörk wrote: >Hi, > >I like wget a lot. >But I have found a bug in the program. > >If I want to download cgi-bin data (the output from a program on a >server), it does not work. > >I issued the following command: > >daleto@modesty:~/slask > wget -d -O slask >'http://www.atg.se/StartlistServlet?action=20&race=5&datum=&bankod=S >ä&lopptyp=100&betType=V75' [snip debug output from wget] >Length: unspecified [text/html] > >0K -> . > >Closing fd 4 >00:14:43 (55.31 KB/s) - `slask' saved [9571] > Huh ? Sounds like a success to me. You asked wget to save the page in a file called slask, which it did. What did you expect to happen ? [snip] > >Is there a bug in the software, or just limited. > All software has bugs, but some have more bugs than others ;-) Wget is quite high-quality software. Note that wget 1.5.3 is quite old; the newest version is 1.7. But try to sort out your problem with 1.5.3 first. -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933 Life is complex, with real and imaginary parts. +++ATH0 +++ATZ +++ARGGH
Re: Bus errors and recursion
[about alloca vs malloc] If you allocate with malloc and then accidentally overwrite it, you get a corrupted heap. If you allocate with alloca and then accidentally overwrite it, you get a corrupted stack. Guess which is easier to notice :-) Besides, alloca is a GCC builtin (IIRC), so you don't have to worry about its implementation (the GCC folks do :). As long as you have the stack to allocate from, it's as transparent as declaring automatic arrays with variable length. e.g. p = alloca( strlen(s) ); is almost the same as char a[ strlen(s) ]; char *p = a; /* VLAs are a legal GCC extension */ -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
Re: --spider downloads files
Are you doing --spider recursively ? In that case, wget HAS to download HTML files, otherwise it can't find the links to recurse... (Please DO NOT email me, I'm on the list) -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933 Tom Gordon <[EMAIL PROTECTED]> To: [EMAIL PROTECTED], [EMAIL PROTECTED] Subject: --spider downloads files 17/05/01 16:53 When using the --spider option (GNU Wget 1.6), the URL is downloaded. The doc says "it will not download the pages, just check that they are there." Please help, I need this functionality. Thank you, Tom Gordon [EMAIL PROTECTED]
Yet another Makefile.watcom :-)
"An hour of careful debugging can save you five minutes of reading the documentation" (See attached file: Makefile.watcom) This version gets rid of the ugly double list of object files (one for the linker, one for the dependencies ). WLINK expects the object files to be specified like this: wlink FILE 1.obj,2.obj,etc_etc,n.obj NAME program.exe ... ^^ This is the format auto-generated by their IDE, BTW. However, wlink also accepts an alternate way: wlink FILE { 1.obj 2.obj etc_etc n.obj } NAME program.exe ... What's more, this is actually present in the documentation (gasp)! -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933 =?iso-8859-1?Q?Makefile.watcom?=
Re: WGET for OS/2 and Proxy-Server
To: Hrvoje Niksic, Thomas Bohn <[EMAIL PROTECTED]> Subject: Re: WGET for OS/2 and Proxy-Server 15/05/01 13:00 > Thomas Bohn <[EMAIL PROTECTED]> writes: > > > Hello, > > > > I tried to use WGET for OS/2 (tested V 1.5.3 and 1.6) with a proxy > > server. Without proxy server all works fine. But with... > > > > In a OS/2 commandline session I type the following commands: > > > > SET HTTP_PROXY=62.52.17.1:80 > > Your proxy setting gets ignored. Try using lower-case `http_proxy'. > It seems to me that getenv has some "issues" on OS/2. Workaround: use .wgetrc commands instead. All environment variable names (i.e. the part before the '=') are uppercase on OS/2. wget uses getenv("http_proxy"); the implementation of getenv seems to be scanning _environ and doing a strncmp (i.e. case-sensitive comparison). If getproxy in url.c is changed to getenv("HTTP_PROXY") then it does pick up the environment setting. Could we postulate that *ALL* environment vars influencing WGET be uppercase ? These are the places where getenv is used (excluding getopt.c): init.c:237: tmp = getenv ("no_proxy"); init.c:259: char *home = getenv ("HOME"); init.c:292: env = getenv ("WGETRC"); url.c:1292:proxy = opt.http_proxy ? opt.http_proxy : getenv ("http_proxy"); url.c:1294:proxy = opt.ftp_proxy ? opt.ftp_proxy : getenv ("ftp_proxy"); url.c:1297:proxy = opt.https_proxy ? opt.https_proxy : getenv ("https_proxy"); -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
RE: New and improved Makefile.watcom
Herold Heiko , Wget List evinet.it> <[EMAIL PROTECTED]> cc: 14/05/01 12:05 Subject: RE: New and improved Makefile.watcom > >-Original Message- > >From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]] > >Sent: Monday, May 14, 2001 11:23 AM > >To: Wget List > >Subject: Re: New and improved Makefile.watcom > > > > > >[EMAIL PROTECTED] writes: > > > >> This is a rewrite of Makefile.watcom > > > >Thanks; I've put it in the repository. > > > >> # Copy this file to the ..\src directory (maybe rename to > >Makefile). Also: > >> # copy config.h.ms ..\src\config.h > > > >Maybe we should provide a "win-build" script (or something) that does > >this automatically? > > How about this ? config.h : ..\windows\config.h.ms copy $[@ $^@ (this would be "copy $< $@" for GNU make) Yup, it works (for me ! :-) > > Isn't this what configure.bat is for ? In theory, but... > Default to VC (or use VC if --msvc is given), otherwise if env var > BORPATH is present (or --borland is given) use borland, otherwise error. > I see no Watcom here :-) configure.bat doesn't know about Watcom C Hrvoje also wrote: > > #disabled for faster compiler > > LFLAGS=sys nt op st=32767 op vers=1.7 op map op q op de 'GNU wget 1.7dev' de all > > CFLAGS=/zp4 /d1 /w4 /fpd /5s /fp5 /bm /mf /os /bt=nt [snip] > > # /zp4= pack structure members with this alignment > > # /d1 = line number debug info > > # /w4 = warning level > > # /fpd= ??? no such switch ! > > # /5s = Pentium stack-based calling > > # /fp5= Pentium floating point > > # /bm = build multi-threaded > > # /mf = flat memory model > > # /os = optimize for size > ^^^ > > # /bt = "build target" (nt) > > One thing I don't understand: why do you optimize for size? Doesn't > it almost always make sense to optimize for speed instead?> Because I like small and sleek executables :-) Are there any processor-intensive bits in wget ? Most of the time it'll wait for the "Internet" anyway. 
BTW, compiling with DEBUG_MALLOC reveals three memory leaks : 0x13830432: mswindows.c:72<- *exec_name = xstrdup (*exec_name); in windows_main_junk 0x13830496: mswindows.c:168 <- wspathsave = (char*) xmalloc (strlen (buffer) + 1); in ws_mypath 0x13830848: utils.c:1525 <- (struct wget_timer *)xmalloc (sizeof (struct wget_timer)); Here's another edition of Makefile.watcom (See attached file: Makefile.watcom) -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933 =?iso-8859-1?Q?Makefile.watcom?=
New and improved Makefile.watcom
This is a rewrite of Makefile.watcom. It does away with the two separate OBJ file lists (one for dependencies, the other for the linker command) which needed to be kept in sync. The explicit dependency list is also gone (Watcom C can pass dependencies to Watcom Make when using .AUTODEPEND) wget/windows/(See attached file: Makefile.watcom) -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
Re: windows, continue bug
:-( Apologies for the top-posting. Please don't quote this message )-: Yes, this patch produced correct behaviour (as far as I can tell :-) Downloading a file (with -c) for the first time, regardless of whether the server supported resume, succeeded. Downloading a file (with -c) for the second time: * If the server supports resume, then "File is fully downloaded, nothing to do" * If the server doesn't support resume, then "Refusing to truncate file" Downloading a file (with -c) again, after manually truncating it: * If the server supports resume, then it skips past the downloaded part correctly, and gets the rest. * If the server doesn't support resume, then "Refusing to truncate file" It is up to somebody else to dream up more scenarios. -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933 Hrvoje Niksic wrote (Re: windows, continue bug, 09/05/01 19:26): [EMAIL PROTECTED] writes: > At least the CVS version I downloaded on 9th of May still has the problem: > > wget -c http://some.random.com/ results in > "The file is already fully retrieved, nothing to do." and nothing is > downloaded :-( Ah, I see. This is a different bug from the one Herold was seeing. Thanks for the explanation. Does this patch fix the problem? 2001-05-09 Hrvoje Niksic <[EMAIL PROTECTED]> * http.c (gethttp): Before concluding that the file is already fully retrieved, make sure that the file existed and `Range' was actually requested. Index: src/http.c === RCS file: /pack/anoncvs/wget/src/http.c,v retrieving revision 1.58 diff -u -r1.58 http.c --- src/http.c 2001/05/08 11:47:05 1.58 +++ src/http.c 2001/05/09 18:25:41 @@ -1190,7 +1190,11 @@ if (opt.always_rest) { /* Check for condition #2. */ - if (hs->restval >= contlen) + if (hs->restval > 0 /* restart was requested. */ + && contlen != -1 /* we got content-length. */ + && hs->restval >= contlen /* file fully downloaded + or has shrunk. */ + ) { logputs (LOG_VERBOSE, _("\ \nThe file is already fully retrieved; nothing to do.\n\n"));
Re: windows, continue bug
Hrvoje Niksic wrote (Re: windows, continue bug, 08/05/01 12:52): > [EMAIL PROTECTED] writes: > > > I don't know about Heiko, but I got the sources from the CVS shortly after > > he posted his "windows, continue bug" message to the list. > > And yet the http.c code you showed looked different from what I > assumed was the latest version. > > > It seems to me that your "fix" doesn't work. > > It wasn't supposed to fix the problem you had; it was a minor > optimization. > > In the meantime I believe I found and fixed the real problem; updating > to the latest CVS sources should fix the problem Heiko was seeing. > At least the CVS version I downloaded on 9th of May still has the problem: wget -c http://some.random.com/ results in "The file is already fully retrieved, nothing to do." and nothing is downloaded :-( I ran it under the debugger; this is what I saw: In gethttp at http.c(1193), where the code is if( hs->restval >= contlen ) { //say fully retrieved and bail with RETRUNNEEDED } hs->restval is 0 and contlen is -1 contlen is -1 because the server didn't bother to send Content-Length :-( (we got here because contrange was -1 [line 1172] and opt.always_rest was 1 [line 1190] ) Regardless of what the comments say, wget didn't send any 'Range' request for the server to honor. It seems to me that: IF the server doesn't send content-range, AND opt.always_rest==1 ( wget -c ) AND the server doesn't send Content-Length (so contlen==-1) THEN hs->restval (at least 0) will always be >= contlen (-1), hence gethttp will abort with RETRUNNEEDED (why does it need a retrun ? :-) ENDIF In other words, wget -c ... on a "lazy" (one that doesn't send Content-Length) server will NOT download anything. This is not good; the logic around here is faulty or the values aren't set up correctly. -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
Re: windows, continue bug
Hrvoje Niksic wrote (Re: windows, continue bug, 07/05/01 20:06): [EMAIL PROTECTED] writes: > http_loop calls gethttp() at line 1539, but the following is only > at line 1554: > > if( opt.always_rest ) > hstat.no_truncate = file_exists_p(locf); > > Moving these two lines *above* the call to gethttp() on line 1554, > the file was downloaded correctly. How are you guys getting this? The latest source from the CVS should look different, and should in fact work. (I've just applied another fix, this time a small optimization.) end quoted I don't know about Heiko, but I got the sources from the CVS shortly after he posted his "windows, continue bug" message to the list. It seems to me that your "fix" doesn't work. Compiling and running (a CVS checkout around 9:30 BST, +0100) on both OS/2 (gcc) and Windows (Watcom) produced this: ->8 DEBUG output created by Wget 1.7-dev on os2-emx. parseurl ("http://some.random.com/") -> host some.random.com -> opath -> dir -> file -> ndir newpath: / --11:09:30-- http://some.random.com/ => `index.html' Connecting to some.random.com:80... Caching some.random.com <-> 10.1.1.9 Created fd 3. connected! ---request begin--- GET / HTTP/1.0 User-Agent: Wget/1.7-dev Host: some.random.com Accept: */* Connection: Keep-Alive ---request end--- HTTP request sent, awaiting response... HTTP/1.1 200 OK Date: Tue, 08 May 2001 10:09:28 GMT Server: Apache/1.3.14 (Unix) PHP/4.0.4pl1 X-Powered-By: PHP/4.0.4pl1 Connection: close Content-Type: text/html The file is already fully retrieved; nothing to do. Closing fd 3 ->8 Note: the file was *NOT* retrieved before. -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
Re: windows, continue bug
You mean this ? --->8--- DEBUG output created by Wget 1.7-dev on Windows. parseurl ("http://turtle.power.org/") -> host turtle.power.org -> opath -> dir -> file -> ndir newpath: / Checking for turtle.power.org in host_name_address_map. Checking for turtle.power.org in host_slave_master_map. First time I hear about turtle.power.org by that name; looking it up. Caching turtle.power.org <-> 10.1.1.9 Checking again for turtle.power.org in host_slave_master_map. --10:35:49-- http://turtle.power.org/ => `turtle.power.org/index.html' Connecting to turtle.power.org:80... Found turtle.power.org in host_name_address_map: 10.1.1.9 Created fd 88. connected! ---request begin--- GET / HTTP/1.0 User-Agent: Wget/1.7-dev Host: turtle.power.org Accept: */* Connection: Keep-Alive HTTP request sent, awaiting response... HTTP/1.1 200 OK Date: Fri, 04 May 2001 09:35:48 GMT Server: Apache/1.3.14 (Unix) PHP/4.0.4pl1 X-Powered-By: PHP/4.0.4pl1 Connection: close Content-Type: text/html The server does not support continued download; refusing to truncate `turtle.power.org/index.html'. FINISHED --10:35:49-- Downloaded: 0 bytes in 0 files --->8--- It's not just on Windows; happens on OS/2 ( compiled with GCC ) too. Debugging it suggests that hstat.no_truncate doesn't get initialized (dodgy random-looking value contained in no_truncate): http_loop calls gethttp() at line 1539, but the following is only at line 1554: if( opt.always_rest ) hstat.no_truncate = file_exists_p(locf); Moving these two lines from line 1554 to *above* the call to gethttp() at line 1539, the file was downloaded correctly. -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
RE: Fix safe-ctype detection
(Re: defining inline to nothing) That's strange... VC understands inline. Did you try to define it to __inline ? Try substituting ( #defining ) ftruncate to chsize. I had this problem when trying to compile FTE (fte.sourceforge.net) with VisualAge C++ (which also doesn't have ftruncate). -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933 Herold Heiko, cc: "List Wget (E-mail)" <[EMAIL PROTECTED]>, Subject: RE: Fix safe-ctype detection, 27/04/01 10:13 It does work (I suppose this means no inline optimizations). However then it stops later at linking stage. Either there is no ftruncate (used in http.c, ftp.c) function or my compiler is not yet set up correctly. [snip]
Re: Anon FTP password
> Following the example set by lftp, I'll change Wget to send "-wget@" > as anonymous FTP password, with the option of changing it. That way > we will have a decent default, and enable the users who know what > they're doing to change it to their email address, if they're > oldfashioned, or to something even more anonymizing, like "mozilla@". You mean "-wget@" with no host ? Won't some FTP sites consider that as invalid ? -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
Re: Bundling libtool
"Dan Harkless" To: Wget List <[EMAIL PROTECTED]> Subject: Re: Bundling libtool 27/03/01 12:55 [snipped] > Hrvoje Niksic <[EMAIL PROTECTED]> writes: > > Is it the standard configure caching mechanism, the "(cached)" thing? > > I think that can be turned off. > > Per-check, you mean? We wouldn't want to turn it off for the whole > configure run just for the benefit of this check. I looked at the autoconf > documentation but didn't see a way to turn it off for a particular > (predefined) check. Custom checks are not automatically cached, though, so > doing the AC_CHECK_LIB stuff manually would do the trick. One can always manually delete the corresponding line in config.cache. -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
Re: wget bug - after closing control connection
Which version of wget do you use ? Are you aware that wget 1.6 has been released and 1.7 is in development (and they contain a workaround for the "Lying FTP server syndrome" you are seeing) ? -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
Re: Wget
I'm confused. I thought 1.5.3 *did* display the dots, but I could be wrong. Please send queries like this to the list ( [EMAIL PROTECTED] ), not to me personally. -- Csaba Ráduly, Software Engineer Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933 :-( sorry for the top-posting )-: [EMAIL PROTECTED] (Timo Maier) To: [EMAIL PROTECTED] Subject: Re: Wget 06/03/01 10:58 Hi! >The newest wget is 1.6 release and 1.7 developer. I have GNU Wget 1.5.3 which doesn't display the dots, it looks like this: >--- Connecting to www.telekom.de:80... connected! HTTP request sent, awaiting response... 206 Partial content Length: 4,509,742 (4,267,794 to go) [application/octet-stream] 3.05Mb (236.28kb) done at 5.19 KB/s. time: 0:09:16 (0:04:05 left) >--- Is it possible to implement this in new versions, too? TAM -- OS/2 Warp4, Ducati 750SS '92 You still have the freedom to learn and say what you wanna say http://tam.belchenstuermer.de
Re: Windows ssl enabled binary
>At http://www.geocities.com/heiko_herold you can find a ssl enabled >windows binary, which however does still need thorough testing (please >feedback on the list). > Me too, except s/windows/os2/g at http://www.geocities.com/csaba_22/ It needs the EMX runtime. "configure --with-ssl" produced a Makefile with LIBS= -lcrypto -lssl -lsocket This caused lots of link errors. Changing it to LIBS= -lssl -lcrypto -lsocket then produced a wget.exe which successfully connected to and downloaded a few files from my Apache+mod_ssl server via https. -- Csaba Ráduly, Programmer - OS/2 Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933
RE: SUGGESTION: rollback like GetRight
On 10/01/2001 08:50:18 ZIGLIO Frediano wrote: >I suggest two parameter: >- rollback-size >- rollback-check-size >where 0 <= rollback-check-size <= rollback-size >The first for calculate the beginning of range (filesize - rollback-size) >and the second for check (wget should check the range [filesize - >rollback-size,filesize - rollback-size + rollback-check-size) ) > I was thinking of making -c have an optional parameter specifying the rollback. If this was defaulted to 0, it can be given to lseek( , , SEEK_END ) (it would be nice if it could accept a 'k' suffix) The check size then could be specified separately. Csaba -- Csaba Ráduly, Programmer - OS/2 Sophos Anti-Virus email: [EMAIL PROTECTED] http://www.sophos.com US support: +1 888 SOPHOS 9 UK Support: +44 1235 559933