Re: [BUG:#20329] If-Modified-Since support
vinothkumar raman wrote:
> We need to give out the time stamp of the local file in the request
> header. For that, we need to pass the local file's time stamp from
> http_loop() to gethttp(). The only way to pass it on without altering
> the signature of the function is to add a field to struct url in
> url.h. Could we go for it?

That is acceptable.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
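The proposal might look something like the sketch below. This is not wget's actual struct url (the real one in src/url.h carries many more members), and the field name `local_mtime` is hypothetical:

```c
#include <time.h>

/* Hypothetical sketch only: wget's real struct url (src/url.h) has many
   more members.  The new field, here called `local_mtime', would carry
   the local file's mtime from http_loop() down into gethttp(), letting
   gethttp() emit an If-Modified-Since header without any change to the
   function signatures. */
struct url
{
  char *url;            /* the URL as given */
  char *host;           /* extracted hostname */
  int port;             /* port number */
  char *path;           /* path component */
  time_t local_mtime;   /* local file's mtime, or 0 when no local copy */
};
```

http_loop() would stat() the local file and store the result in this field before calling gethttp().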
Re: [bug #20329] Make HTTP timestamping use If-Modified-Since
Yes, that's what it means.

I'm not yet committed to doing this. I'd like to see first how many
mainstream servers will respect If-Modified-Since when given as part of
an HTTP/1.0 request (in comparison to how they respond when it's part
of an HTTP/1.1 request). If common servers ignore it in HTTP/1.0, but
not in HTTP/1.1, that'd be an excellent case for holding off until
we're doing HTTP/1.1 requests.

Also, I don't think removing the previous HEAD request code is entirely
accurate: we would probably want to detect when a server is feeding us
non-new content in response to If-Modified-Since, and adjust to use the
current HEAD method instead as a fallback.

-Micah

vinothkumar raman wrote:
> Does this mean we should remove the previous HEAD request code, use
> If-Modified-Since by default, have it handle all the requests, and
> store pages when the response is not a 304? Is that so?
>
> On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan [EMAIL PROTECTED] wrote:
>> Follow-up Comment #4, bug #20329 (project wget):
>>
>> verbatim-mode's not all that readable. The gist is, we should go
>> ahead and use If-Modified-Since, perhaps even now before there's true
>> HTTP/1.1 support (provided it works in a reasonable percentage of
>> cases); and just ensure that any Last-Modified header is sane.
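For reference, the header under discussion is just an HTTP date. A minimal sketch of producing it (not wget's actual code; the function name is made up) might look like:

```c
#include <stdio.h>
#include <time.h>

/* Illustrative sketch, not wget's implementation: format a local file's
   mtime as an RFC 1123 date in an If-Modified-Since request header.  A
   304 Not Modified reply then means the local copy is current; a 200
   reply carries the new body.  Servers that ignore the header (the
   HTTP/1.0 concern raised above) simply answer 200 every time. */
static void
format_if_modified_since (time_t mtime, char *buf, size_t bufsize)
{
  struct tm *gmt = gmtime (&mtime);
  /* e.g. "If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT" */
  strftime (buf, bufsize,
            "If-Modified-Since: %a, %d %b %Y %H:%M:%S GMT", gmt);
}
```

The fallback Micah describes would then compare the 200 response body's Last-Modified against the mtime sent, and revert to the HEAD method for servers that never answer 304.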
Re: [bug #20329] Make HTTP timestamping use If-Modified-Since
Does this mean we should remove the previous HEAD request code, use
If-Modified-Since by default, have it handle all the requests, and
store pages when the response is not a 304? Is that so?

On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan [EMAIL PROTECTED] wrote:
> Follow-up Comment #4, bug #20329 (project wget):
>
> verbatim-mode's not all that readable. The gist is, we should go ahead
> and use If-Modified-Since, perhaps even now before there's true
> HTTP/1.1 support (provided it works in a reasonable percentage of
> cases); and just ensure that any Last-Modified header is sane.
>
> Reply to this item at:
> http://savannah.gnu.org/bugs/?20329
>
> Message sent via/by Savannah
> http://savannah.gnu.org/
Re: bug in wget
Sir Vision wrote:
> Hello,
>
> entering the following command results in an error:
>
> --- command start ---
> c:\Downloads\wget_v1.11.3b> wget "ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8-l10n/" -P c:\Downloads\
> --- command end ---
>
> wget can't convert the .listing file into an HTML file

As this seems to work fine on Unix, for me, I'll have to leave it to
the Windows porting guy (hi Chris!) to find out what might be going
wrong.

...however, it would really help if you would supply the full output
you got from wget that leads you to believe Wget couldn't do this
conversion. In fact, it wouldn't hurt to supply the -d flag as well,
for maximum debugging messages.

--
Cheers,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer, and GNU Wget
Project Maintainer.
http://micah.cowan.name/
Re: Bug
Ok, thanks for your reply. We have a work-around in place now, but it
doesn't scale very well. Anyway, I'll start looking for another
solution.

Thanks!
Mark

On Sat, Mar 1, 2008 at 10:15 PM, Micah Cowan [EMAIL PROTECTED] wrote:
> Mark Pors wrote:
>> Hi,
>> I posted this bug over two years ago:
>> http://marc.info/?l=wget&m=113252747105716&w=4
>> From the release notes I see that this is still not resolved. Are
>> there any plans to fix this any time soon?
>
> I'm not sure that's a bug. It's more of an architectural choice. Wget
> currently works by downloading a file, then, if it needs to look for
> links in that file, it will open it and scan through it. Obviously,
> it can't do that when you use -O -.
>
> There are plans to move Wget to a more stream-like process, where it
> scans links during download. At such time, it's very possible that -p
> will work the way you want it to. In the meantime, though, it
> doesn't.
>
> --
> Micah J. Cowan
> Programmer, musician, typesetting enthusiast, gamer...
> http://micah.cowan.name/
Re: bug on wget
Micah Cowan [EMAIL PROTECTED] writes:
>> The new Wget flags empty Set-Cookie as a syntax error (but only
>> displays it in -d mode; possibly a bug).
>
> I'm not clear on exactly what's possibly a bug: do you mean the fact
> that Wget only calls attention to it in -d mode?

That's what I meant.

> I probably agree with that behavior... most people probably aren't
> interested in being informed that a server breaks RFC 2616 mildly;

Generally, if Wget considers a header to be in error (and hence ignores
it), the user probably needs to know about that. After all, it could be
the symptom of a Wget bug, or of an unimplemented extension the server
generates. In both cases I as a user would want to know. Of course,
Wget should continue to be lenient towards syntax violations widely
recognized by popular browsers.

Note that I'm not arguing that Wget should warn in this particular
case. It is perfectly fine to not consider an empty `Set-Cookie' to be
a syntax error and to simply ignore it (and maybe only print a warning
in debug mode).
Re: bug on wget
Hrvoje Niksic wrote:
> Generally, if Wget considers a header to be in error (and hence
> ignores it), the user probably needs to know about that. After all,
> it could be the symptom of a Wget bug, or of an unimplemented
> extension the server generates. In both cases I as a user would want
> to know. Of course, Wget should continue to be lenient towards syntax
> violations widely recognized by popular browsers.
>
> Note that I'm not arguing that Wget should warn in this particular
> case. It is perfectly fine to not consider an empty `Set-Cookie' to
> be a syntax error and to simply ignore it (and maybe only print a
> warning in debug mode).

That was my thought. I agree with both of your points above: if Wget's
not handling something properly, I want to know about it; but at the
same time, silently ignoring (erroneous) empty headers doesn't seem
like a problem.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: bug on wget
Diego Campo wrote:
> Hi,
> I got a bug on wget when executing:
>
> wget -a log -x -O search/search-1.html --verbose --wait 3
> --limit-rate=20K --tries=3
> http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1
>
> Segmentation fault (core dumped)

Hi Diego,

I was able to reproduce the problem above in the release version of
Wget; however, it appears to be working fine in the current development
version of Wget, which is expected to release soon as version 1.11.*

* Unfortunately, it has been "expected to release soon" for a few
months now; we got hung up with some legal/licensing issues that are
yet to be resolved. It will almost certainly be released in the next
few weeks, though.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: bug on wget
Micah Cowan [EMAIL PROTECTED] writes:
> I was able to reproduce the problem above in the release version of
> Wget; however, it appears to be working fine in the current
> development version of Wget, which is expected to release soon as
> version 1.11.*

I think the old Wget crashed on empty Set-Cookie headers. That got
fixed when I converted the Set-Cookie parser to use extract_param. The
new Wget flags empty Set-Cookie as a syntax error (but only displays it
in -d mode; possibly a bug).
Re: bug on wget
Hrvoje Niksic wrote:
> Micah Cowan [EMAIL PROTECTED] writes:
>> I was able to reproduce the problem above in the release version of
>> Wget; however, it appears to be working fine in the current
>> development version of Wget, which is expected to release soon as
>> version 1.11.*
>
> I think the old Wget crashed on empty Set-Cookie headers. That got
> fixed when I converted the Set-Cookie parser to use extract_param.
> The new Wget flags empty Set-Cookie as a syntax error (but only
> displays it in -d mode; possibly a bug).

I'm not clear on exactly what's possibly a bug: do you mean the fact
that Wget only calls attention to it in -d mode?

I probably agree with that behavior... most people probably aren't
interested in being informed that a server breaks RFC 2616 mildly;
especially if it's not apt to affect the results. Unless of course the
user was expecting that the server send a real cookie, but I'm guessing
this only happens when the server doesn't have one to send (or
something). But a user in that situation should be using -d (or at
least -S) to find out what the server is sending.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: bug in escaped filename calculation?
On 10/4/07, Brian Keck [EMAIL PROTECTED] wrote:
> I would have sent a fix too, but after finding my way through http.c
> and retr.c I got lost in url.c.

You and me both. A lot of the code needs to be rewritten... there's a
lot of spaghetti code in there. I hope Micah chooses to do a complete
re-write for version 2 so I can get my hands dirty and understand the
code better.
Re: bug in escaped filename calculation?
Josh Williams wrote:
> On 10/4/07, Brian Keck [EMAIL PROTECTED] wrote:
>> I would have sent a fix too, but after finding my way through http.c
>> and retr.c I got lost in url.c.
>
> You and me both. A lot of the code needs to be rewritten... there's a
> lot of spaghetti code in there. I hope Micah chooses to do a complete
> re-write for version 2 so I can get my hands dirty and understand the
> code better.

Currently, I'm planning on refactoring what exists, as needed, rather
than going for a complete rewrite. This will be driven by unit tests,
to try to ensure that we do not lose functionality along the way. This
involves more work overall, but IMO has these key advantages:

 * as mentioned, it's easier to prevent functionality loss,
 * we will be able to use the work as it's written, instead of waiting
   many months for everything to be finished (especially with the
   current number of developers), and
 * AIUI, the wording of employer copyright assignment releases may not
   apply to new works that are not _preexisting_ as GPL works. This
   means that, if a rewrite ended up using no code whatsoever from the
   original work (not likely, but...), there could be legal issues.

After 1.11 is released (or possibly before), one of my top priorities
is to clean up the gethttp and http_loop functions to a degree where
they can be much more readily read and understood (and modified!). This
is important to me because so far (in my probably-not-statistically-
significant 3 months as maintainer) a majority of the trickier fixes
have been in those two functions. Some of these fixes seem to
frequently introduce bugs of their own, and I spend more time than
seems right in trying to understand the code there, which is why these
particular functions are prime targets for refactoring. :)

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: bug in escaped filename calculation?
Brian Keck wrote:
> Hello,
>
> I'm wondering if I've found a bug in the excellent wget. I'm not
> asking for help, because it turned out not to be the reason one of my
> scripts was failing. The possible bug is in the derivation of the
> filename from a URL which contains UTF-8. The case is:
>
>   wget http://en.wikipedia.org/wiki/%C3%87atalh%C3%B6y%C3%BCk
>
> Of course these are all ASCII characters, but underlying them are 3
> non-ASCII characters, whose UTF-8 encodings are:
>
>   hex    octal      name
>   ----   -------    ---------
>   C387   303 207    C-cedilla
>   C3B6   303 266    o-umlaut
>   C3BC   303 274    u-umlaut
>
> The file created has a name that's almost, but not quite, a valid
> UTF-8 bytestring:
>
>   ls *y*k | od -tc
>   0000000 303   %   8   7   a   t   a   l   h 303 266   y 303 274   k  \n
>
> Ie the o-umlaut and u-umlaut UTF-8 encodings occur in the bytestring,
> but the UTF-8 encoding of C-cedilla has its 2nd byte replaced by the
> 3-byte string %87.
>
> I'm guessing this is not intended.

Using --restrict-file-names=nocontrol will do what you want, in this
instance.

Actually, it is intended (more or less). Realize that Wget really has
no idea how to tell whether you're trying to give it UTF-8, or one of
the ISO Latin charsets. It tends to assume the latter. It also, by
default, will not create filenames with control characters in them. In
ISO Latin, characters in the range 0x80-0x9f are control characters,
which is why Wget left %87 escaped (it falls into that range), but not
the others, which don't.

It is actually illegal to specify byte values outside the range of
ASCII characters in a URL, but it has long been historical practice to
do so anyway. In most cases, the intended meaning was one of the Latin
character sets (usually Latin-1), so Wget was right to do as it does,
at that time. There is now a standard for representing Unicode values
in URLs; the result is then called IRIs (Internationalized Resource
Identifiers).

Conforming correctly to this standard would require that Wget be
sensitive to the context and encoding of documents in which it finds
URLs; in the case of filenames and command arguments, it would probably
also require sensitivity to the current locale as determined by
environment variables. Wget is simply not equipped to handle IRIs or
encoding issues at the moment, so until it is, a proper fix will not be
in place. Addressing these is considered a Wget 2.0 (next-generation
Wget functionality) priority, and probably won't be done for a year or
two, given that the number of developers involved with Wget, if you add
up all the part-time helpers (including me), is probably still less
than one full-time dev. :)

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
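The escaping policy Micah describes can be sketched as a predicate (illustrative only, not wget's actual code; the function name is made up):

```c
#include <stdbool.h>

/* Sketch of the default file-name quoting policy described above, not
   wget's exact implementation.  By default, bytes in the ASCII control
   range and in 0x80-0x9f (the Latin-1 C1 control range) are escaped as
   %XX in created file names; --restrict-file-names=nocontrol lifts the
   0x80-0x9f rule, which is what lets UTF-8 continuation bytes like
   0x87 through unescaped. */
static bool
is_escaped_as_control (unsigned char c, bool nocontrol)
{
  if (c < 0x20 || c == 0x7f)
    return true;                /* ASCII control chars: always escaped */
  if (!nocontrol && c >= 0x80 && c <= 0x9f)
    return true;                /* C1 control range: escaped by default */
  return false;
}
```

This matches the behavior Brian observed: 0x87 (C-cedilla's second byte) falls in 0x80-0x9f and stays escaped, while 0xb6 and 0xbc do not.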
Re: bug in escaped filename calculation?
Micah Cowan [EMAIL PROTECTED] writes:
> It is actually illegal to specify byte values outside the range of
> ASCII characters in a URL, but it has long been historical practice
> to do so anyway. In most cases, the intended meaning was one of the
> Latin character sets (usually Latin-1), so Wget was right to do as it
> does, at that time.

Your explanation is spot-on. I would only add that Wget's
interpretation of what is a control character is not so much geared
toward Latin-1 as it is geared toward maximum safety. Originally I
planned to simply encode *all* file name characters outside the 32-127
range, but in practice it was very annoying (not to mention US-centric)
to encode perfectly valid Latin 1/2/3/... characters as %xx. Since the
codes 128-159 *are* control characters (in those charsets) that can
mess up your screen and that you wouldn't want seen by default, I
decided to encode them by default, but allow for a way to turn it off,
in case someone used a different charset.

In the long run, supporting something like IRIs is surely the right
thing to go for, but I have a feeling that we'll be stuck with the
current messy URLs for quite some time to come. So Wget simply needs to
adapt to the current circumstances. If the locale includes UTF-8 in any
shape or form, it is perfectly safe to assume that it's valid to create
UTF-8 file names. Of course, we don't know if a particular URL path
sequence is really meant to be UTF-8, but there should be no harm in
allowing valid UTF-8 sequences to pass through. In other words, the
default "quote control characters" policy could simply be smarter about
what "control" means. One consequence would be that Wget creates
differently-named files in different locales, but that's probably a
reasonable price to pay for not breaking an important expectation.
Another consequence would be opening users to IDN homograph attacks,
but I don't know if that's a problem in the context of creating file
names (a homograph attack is normally defined as a misrepresentation of
who you communicate with).

For those who want to hack on this, the place to look at is
url.c:append_uri_pathel; that strangely-named function takes a path
element (a directory name or file name component of the URL) and
appends it to the file name. It takes care not to ever use ".." as a
path component and to respect the --restrict-file-names setting as
specified by the user. It could be made to recognize UTF-8 character
sequences in UTF-8 locales and exempt valid UTF-8 chars from being
treated as control characters. Invalid UTF-8 chars would still fail
the checks, and non-canonical UTF-8 sequences would be rejected (by
condemning their byte values to being escaped as %..). This is not much
work for someone who understands the basics of UTF-8.
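The recognition step Hrvoje sketches could start from a helper like the following (illustrative only, not the actual append_uri_pathel code; a full validator would also reject overlong E0/F0 sequences, which this simplified version does not):

```c
#include <stddef.h>

/* Sketch: return the length of a valid multibyte UTF-8 sequence
   starting at s (with `avail' bytes available), or 0 if the bytes do
   not form one.  Bytes inside such a sequence could be exempted from
   control-character escaping in UTF-8 locales; anything else keeps the
   current treatment.  Lead bytes 0xc0/0xc1 and 0xf5-0xff are invalid
   in UTF-8, so they fall through to the escaping path. */
static size_t
utf8_sequence_length (const unsigned char *s, size_t avail)
{
  size_t len, i;
  if (s[0] >= 0xc2 && s[0] <= 0xdf)      len = 2;  /* 2-byte sequence */
  else if (s[0] >= 0xe0 && s[0] <= 0xef) len = 3;  /* 3-byte sequence */
  else if (s[0] >= 0xf0 && s[0] <= 0xf4) len = 4;  /* 4-byte sequence */
  else return 0;               /* ASCII, stray continuation, or invalid */
  if (avail < len)
    return 0;                  /* truncated at end of path element */
  for (i = 1; i < len; i++)
    if ((s[i] & 0xc0) != 0x80) /* continuation bytes are 10xxxxxx */
      return 0;
  return len;
}
```

In Brian's example, the C-cedilla bytes 0xc3 0x87 form a valid 2-byte sequence, so 0x87 would no longer be escaped as a control character.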
Re: bug and patch: blank spaces in filenames causes looping
On Jul 13, 2007, at 12:29 PM, Micah Cowan wrote:
>> sprintf(filecopy, "\"%.2047s\"", file);
>
> This fix breaks the FTP protocol, making wget instantly stop working
> with many conforming servers, but apparently start working with
> yours; the RFCs are very clear that the file name argument starts
> right after the string "RETR "; the very next character is part of
> the file name, including if the next character is a space (or a
> quote). The file name is terminated by the CR LF sequence (which
> implies that the sequence CR LF may not occur in the filename).
> Therefore, if you ask for a file "file.txt", a conforming server
> will attempt to find and deliver a file whose name begins and ends
> with double-quotes. Therefore, this seems like a server problem.

I think you may well be correct. I am now unable to reproduce the
problem where the server does not recognize a filename unless I give it
quotes. In fact, as you say, the server ONLY recognizes filenames
WITHOUT quotes, and quoting breaks it. I had to revert to the
non-quoted code to get proper behavior. I am very confused now. I
apologize profusely for wasting your time. How embarrassing! I'll save
this email, and if I see the behavior again, I will provide you with
the details you requested below.

> Could you please provide the following:
>
> 1. The version of wget you are running (wget --version)
> 2. The exact command line you are using to invoke wget
> 3. The output of that same command line, run with --debug

--
Rich "wealthychef" Cook
925-784-3077
--
it takes many small steps to climb a mountain, but the view gets better
all the time.
Re: bug and patch: blank spaces in filenames causes looping
On 7/15/07, Rich Cook [EMAIL PROTECTED] wrote:
> I think you may well be correct. I am now unable to reproduce the
> problem where the server does not recognize a filename unless I give
> it quotes. In fact, as you say, the server ONLY recognizes filenames
> WITHOUT quotes, and quoting breaks it. I had to revert to the
> non-quoted code to get proper behavior. I am very confused now. I
> apologize profusely for wasting your time. How embarrassing! I'll
> save this email, and if I see the behavior again, I will provide you
> with the details you requested below.

I wouldn't say it was a waste of time. Actually, I think it's good for
us to know that this problem exists on some servers. We're considering
writing a patch to recognise servers that do not support spaces: if the
standard method fails, Wget would retry with the file name escaped.
Nothing has been written for this yet, but it has been discussed, and
may be implemented in the future.
Re: bug and patch: blank spaces in filenames causes looping
Rich Cook wrote:
> On Jul 13, 2007, at 12:29 PM, Micah Cowan wrote:
>>> sprintf(filecopy, "\"%.2047s\"", file);
>>
>> This fix breaks the FTP protocol, making wget instantly stop working
>> with many conforming servers, but apparently start working with
>> yours; the RFCs are very clear that the file name argument starts
>> right after the string "RETR "; the very next character is part of
>> the file name, including if the next character is a space (or a
>> quote). The file name is terminated by the CR LF sequence (which
>> implies that the sequence CR LF may not occur in the filename).
>> Therefore, if you ask for a file "file.txt", a conforming server
>> will attempt to find and deliver a file whose name begins and ends
>> with double-quotes. Therefore, this seems like a server problem.
>
> I think you may well be correct. I am now unable to reproduce the
> problem where the server does not recognize a filename unless I give
> it quotes. In fact, as you say, the server ONLY recognizes filenames
> WITHOUT quotes, and quoting breaks it. I had to revert to the
> non-quoted code to get proper behavior. I am very confused now. I
> apologize profusely for wasting your time. How embarrassing!

No worries, it happens! Sometimes the tests we run go other than we
think they did. :)

> I'll save this email, and if I see the behavior again, I will provide
> you with the details you requested below.

That would be terrific, thanks.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: bug and patch: blank spaces in filenames causes looping
Rich Cook wrote:
> On OS X, if a filename on the FTP server contains spaces, and the
> remote copy of the file is newer than the local, then wget gets
> thrown into a loop of "No such file or directory" endlessly. I have
> changed the following in ftp-simple.c, and this fixes the error.
> Sorry, I don't know how to use the proper patch formatting, but it
> should be clear.

I and another developer could not reproduce this problem, either in the
current trunk or in wget 1.10.2.

>   sprintf(filecopy, "\"%.2047s\"", file);

This fix breaks the FTP protocol, making wget instantly stop working
with many conforming servers, but apparently start working with yours;
the RFCs are very clear that the file name argument starts right after
the string "RETR "; the very next character is part of the file name,
including if the next character is a space (or a quote). The file name
is terminated by the CR LF sequence (which implies that the sequence CR
LF may not occur in the filename).

Therefore, if you ask for a file "file.txt", a conforming server will
attempt to find and deliver a file whose name begins and ends with
double-quotes. Therefore, this seems like a server problem.

Could you please provide the following:

1. The version of wget you are running (wget --version)
2. The exact command line you are using to invoke wget
3. The output of that same command line, run with --debug

Thank you very much.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
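The protocol point is easy to see in code. Here is a minimal sketch of building a RETR command per RFC 959 (not wget's actual ftp_request(); the function name is made up): the argument runs verbatim from right after "RETR " to the terminating CRLF, spaces included, so no quoting is needed, and any quotes sent become part of the requested name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch of RFC 959 RETR framing, not wget's code.
   Everything between "RETR " and CRLF is the file name, byte for
   byte; a space is just another file-name byte.  Caller frees. */
static char *
build_retr (const char *file)
{
  size_t n = strlen (file);
  char *req = malloc (5 + n + 3);   /* "RETR " + name + CRLF + NUL */
  if (!req)
    return NULL;
  sprintf (req, "RETR %s\r\n", file);
  return req;
}
```

A request built this way for "my file.txt" asks for exactly that name; wrapping it in quotes would ask a conforming server for `"my file.txt"`, quotes and all, which is the failure Micah describes.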
Re: [bug #20323] Wget issues HEAD before GET, even when the file doesn't exist locally.
Mauro Tortonesi wrote:
> Micah Cowan ha scritto:
>> Update of bug #20323 (project wget):
>>
>>   Status: Ready For Test => In Progress
>>
>> Follow-up Comment #3:
>>
>> Moving back to In Progress until some questions about the logic are
>> answered:
>>
>> http://addictivecode.org/pipermail/wget-notify/2007-July/75.html
>> http://addictivecode.org/pipermail/wget-notify/2007-July/77.html
>
> thanks micah. i have partly misunderstood the logic behind the
> preliminary HEAD request. in my code, HEAD is skipped if -O or
> --no-content-disposition are given, but if -N is given HEAD is always
> sent. this is wrong, as HEAD should be skipped even if -N and
> --no-content-disposition are given (no need to care about the
> deprecated -N -O combination). can't think of any other case in which
> HEAD should be skipped, though.

Cc'ing the wget ML, as it's probably important to open up discussion of
the current logic.

What about the case when nothing is given on the command line except
--no-content-disposition? What do we need HEAD for then?

Also: I don't believe HEAD should be sent if no options are given on
the command line. What purpose would that serve? If it's to find a
possible Content-Disposition header, we can get that (and more
reliably) at GET time (though I believe we may currently be requiring
the file name before we fetch, which, if true, should definitely be
changed, but not for 1.11, in which case the HEAD will be allowed for
the time being); and since we're not matching against potential
accept/reject lists, we don't really need it.

I think it really makes much more sense to enumerate those few cases
where we need to issue a HEAD, rather than try to determine all the
cases where we don't: if I have to choose a side to err on, I'd rather
not send HEAD in a case or two where we needed it, rather than send it
in a few where we didn't, as any request-response cycle eats up time.
I also believe that the cases where we want a HEAD are/should be fewer
than the cases where we don't want them.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: Bug update notifications
Matthew Woehlke wrote:
> Micah Cowan wrote:
>> The wget-notify mailing list
>> (http://addictivecode.org/mailman/listinfo/wget-notify) will now
>> also be receiving notifications of bug updates from GNU Savannah, in
>> addition to subversion commits.
>
> ...any reason not to CC bug updates here also/instead? That's how
> e.g. kwrite does things (also several other lists AFAIK), and it
> seems to make sense. This is 'bug-wget' after all :-).

It is; but it's also 'wget'. While I agree that it probably makes sense
to send them to a bugs discussion list, this list is a combination
bugs/development/support/general discussion list, and I'm not certain
it's appropriate to bump up the traffic level for this.

Still, if there are enough folks who would like to get these updates
(without also seeing commit notifications), perhaps we could craft a
second list for this (or, alternatively, split off the main
discussion/support list from the bugs list)?

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: Bug update notifications
Micah Cowan wrote:
> Matthew Woehlke wrote:
>> ...any reason not to CC bug updates here also/instead? That's how
>> e.g. kwrite does things (also several other lists AFAIK), and it
>> seems to make sense. This is 'bug-wget' after all :-).
>
> It is; but it's also 'wget'.

Hmm, so it is; my bad :-).

> While I agree that it probably makes sense to send them to a bugs
> discussion list, this list is a combination
> bugs/development/support/general discussion list, and I'm not certain
> it's appropriate to bump up the traffic level for this.
>
> Still, if there are enough folks who would like to get these updates
> (without also seeing commit notifications), perhaps we could craft a
> second list for this (or, alternatively, split off the main
> discussion/support list from the bugs list)?

I guess a common pattern is:

  foo-help
  foo-devel
  foo-commits

...but of course you're the maintainer, it's your call :-). (The above
aren't necessarily actual names of course, just the categories it seems
like I'm most used to seeing. e.g. the GNU convention is of course
bug-foo, not foo-devel.)

--
Matthew
This .sig is false
Re: bug and patch: blank spaces in filenames causes looping
From various:

> [...]
>   char filecopy[2048];
>   if (file[0] != '"') {
>     sprintf(filecopy, "\"%.2047s\"", file);
>   } else {
>     strncpy(filecopy, file, 2047);
>   }
> [...]

> It should be:
>
>   sprintf(filecopy, "\"%.2045s\"", file);
> [...]

I'll admit to being old and grumpy, but am I the only one who shudders
when one small code segment contains 2048, 2047, and 2045 as separate,
independent literal constants, instead of using a macro, or sizeof, or
something which would let the next fellow change one buffer size in one
place, instead of hunting all over the code looking for every 20xx
which might be related?

Just a thought.

Steven M. Schweda                   [EMAIL PROTECTED]
382 South Warwick Street            (+1) 651-699-9818
Saint Paul MN 55105-2547
Re: bug and patch: blank spaces in filenames causes looping
Steven M. Schweda wrote:
> I'll admit to being old and grumpy, but am I the only one who
> shudders when one small code segment contains 2048, 2047, and 2045 as
> separate, independent literal constants, instead of using a macro, or
> sizeof, or something which would let the next fellow change one
> buffer size in one place, instead of hunting all over the code
> looking for every 20xx which might be related?

Well, as already mentioned, aprintf() would be much more appropriate,
as it eliminates the need for constants like these. And yes, magic
numbers drive me crazy, too. Of course, when used with printf's 's'
specifier, a size constant needs special handling (crafting a STR()
macro or somesuch).

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
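One way to get rid of the literal-constant triplet entirely, sketched here (this is an illustration of Steven's point, not a patch against wget; the function name is made up): with snprintf bounding the write by the real buffer size, there is no "%.2047s"/"%.2045s" precision literal to keep in sync at all, and the added quote characters cannot overflow.

```c
#include <stdio.h>
#include <stddef.h>

#define FILECOPY_SIZE 2048   /* the one place the limit is stated */

/* Sketch: copy `file' into `filecopy', wrapping it in double quotes
   unless it already starts with one.  snprintf truncates to fit the
   buffer (quotes and NUL included), so no separate 2047/2045 constants
   are needed. */
static void
quote_filename (char *filecopy, size_t size, const char *file)
{
  if (file[0] != '"')
    snprintf (filecopy, size, "\"%s\"", file);  /* add surrounding quotes */
  else
    snprintf (filecopy, size, "%s", file);      /* already quoted: copy */
}
```

A caller would write `char filecopy[FILECOPY_SIZE]; quote_filename (filecopy, sizeof filecopy, file);`, so changing the one macro changes every limit. (As the thread notes, wget's aprintf(), which allocates to fit, avoids even the single constant.)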
RE: bug and patch: blank spaces in filenames causes looping
There is a buffer overflow in the following line of the proposed code: sprintf(filecopy, "\"%.2047s\"", file); It should be: sprintf(filecopy, "\"%.2045s\"", file); in order to leave room for the two quotes. Tony -Original Message- From: Rich Cook [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 04, 2007 10:18 AM To: [EMAIL PROTECTED] Subject: bug and patch: blank spaces in filenames causes looping On OS X, if a filename on the FTP server contains spaces, and the remote copy of the file is newer than the local, then wget gets thrown into a loop of "No such file or directory" endlessly. I have changed the following in ftp-simple.c, and this fixes the error. Sorry, I don't know how to use the proper patch formatting, but it should be clear. == the beginning of ftp_retr: == /* Sends RETR command to the FTP server. */ uerr_t ftp_retr (int csock, const char *file) { char *request, *respline; int nwritten; uerr_t err; /* Send RETR request. */ request = ftp_request ("RETR", file); == becomes: == /* Sends RETR command to the FTP server. */ uerr_t ftp_retr (int csock, const char *file) { char *request, *respline; int nwritten; uerr_t err; char filecopy[2048]; if (file[0] != '"') { sprintf(filecopy, "\"%.2047s\"", file); } else { strncpy(filecopy, file, 2047); } /* Send RETR request. */ request = ftp_request ("RETR", filecopy); -- Rich "wealthychef" Cook 925-784-3077 -- it takes many small steps to climb a mountain, but the view gets better all the time.
Re: bug and patch: blank spaces in filenames causes looping
Good point, although it's only a POTENTIAL buffer overflow, and it's limited to 2 bytes, so at least it's not exploitable. :-) On Jul 5, 2007, at 9:05 AM, Tony Lewis wrote: There is a buffer overflow in the following line of the proposed code: sprintf(filecopy, "\"%.2047s\"", file); It should be: sprintf(filecopy, "\"%.2045s\"", file); in order to leave room for the two quotes. Tony [...] -- Rich "wealthychef" Cook 925-784-3077 -- it takes many small steps to climb a mountain, but the view gets better all the time.
RE: bug and patch: blank spaces in filenames causes looping
-Original Message- From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] Tony Lewis [EMAIL PROTECTED] writes: Wget has an `aprintf' utility function that allocates the result on the heap. Avoids both buffer overruns and arbitrary limits on file name length. If it uses the heap, then doesn't that open a hole where a particularly long file name would overflow the heap? -- URL: http://wiki.tcl.tk/ Even if explicitly stated to the contrary, nothing in this posting should be construed as representing my employer's opinions. URL: mailto:[EMAIL PROTECTED] URL: http://www.purl.org/NET/lvirden/
Re: bug and patch: blank spaces in filenames causes looping
Tony Lewis [EMAIL PROTECTED] writes: There is a buffer overflow in the following line of the proposed code: sprintf(filecopy, "\"%.2047s\"", file); Wget has an `aprintf' utility function that allocates the result on the heap. Avoids both buffer overruns and arbitrary limits on file name length.
Re: bug and patch: blank spaces in filenames causes looping
Rich Cook [EMAIL PROTECTED] writes: Trouble is, it's undocumented as to how to free the resulting string. Do I call free on it? Yes. "Freshly allocated with malloc" in the function documentation was supposed to indicate how to free the string.
Re: bug and patch: blank spaces in filenames causes looping
Virden, Larry W. [EMAIL PROTECTED] writes: Tony Lewis [EMAIL PROTECTED] writes: Wget has an `aprintf' utility function that allocates the result on the heap. Avoids both buffer overruns and arbitrary limits on file name length. If it uses the heap, then doesn't that open a hole where a particularly long file name would overflow the heap? No, aprintf tries to allocate as much memory as necessary. If the memory is unavailable, malloc returns NULL and Wget exits.
Re: bug and patch: blank spaces in filenames causes looping
Trouble is, it's undocumented as to how to free the resulting string. Do I call free on it? I'd use asprintf, but I'm afraid to suggest that here as it may not be portable. On Jul 5, 2007, at 10:45 AM, Hrvoje Niksic wrote: Tony Lewis [EMAIL PROTECTED] writes: There is a buffer overflow in the following line of the proposed code: sprintf(filecopy, "\"%.2047s\"", file); Wget has an `aprintf' utility function that allocates the result on the heap. Avoids both buffer overruns and arbitrary limits on file name length. -- Rich "wealthychef" Cook 925-784-3077 -- it takes many small steps to climb a mountain, but the view gets better all the time.
Re: bug and patch: blank spaces in filenames causes looping
On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote: Rich Cook [EMAIL PROTECTED] writes: Trouble is, it's undocumented as to how to free the resulting string. Do I call free on it? Yes. Freshly allocated with malloc in the function documentation was supposed to indicate how to free the string. Oh, I looked in the source and there was this xmalloc thing that didn't show up in my man pages, so I punted. Sorry. -- ✐There's no time to stop for gas, we're already late-- Karin Donker -- Rich wealthychef Cook http://5pmharmony.com 925-784-3077 -- ✐
RE: bug and patch: blank spaces in filenames causes looping
Please remove me from this list. thanks, John Bruso From: Rich Cook [mailto:[EMAIL PROTECTED] Sent: Thu 7/5/2007 12:30 PM To: Hrvoje Niksic Cc: Tony Lewis; [EMAIL PROTECTED] Subject: Re: bug and patch: blank spaces in filenames causes looping On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote: Rich Cook [EMAIL PROTECTED] writes: Trouble is, it's undocumented as to how to free the resulting string. Do I call free on it? Yes. Freshly allocated with malloc in the function documentation was supposed to indicate how to free the string. Oh, I looked in the source and there was this xmalloc thing that didn't show up in my man pages, so I punted. Sorry. -- ?There's no time to stop for gas, we're already late-- Karin Donker -- Rich wealthychef Cook http://5pmharmony.com http://5pmharmony.com/ 925-784-3077 -- ?
Re: bug and patch: blank spaces in filenames causes looping
Rich Cook [EMAIL PROTECTED] writes: On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote: Rich Cook [EMAIL PROTECTED] writes: Trouble is, it's undocumented as to how to free the resulting string. Do I call free on it? Yes. "Freshly allocated with malloc" in the function documentation was supposed to indicate how to free the string. Oh, I looked in the source and there was this xmalloc thing that didn't show up in my man pages, so I punted. Sorry. No problem. Note that xmalloc isn't entirely specific to Wget; it's a fairly standard GNU name for a malloc-or-die function. Now I remember that Wget also has xfree, so the above advice is not entirely correct -- you should call xfree instead. However, in the normal case xfree is a simple wrapper around free, so even if you used free, it would have worked just as well. (The point of xfree is that if you compile with DEBUG_MALLOC, you get a version that checks for leaks, although it should be removed now that there is valgrind, which does the same job much better. There is also the business of barfing on NULL pointers, which should also be removed.) I'd have implemented a portable asprintf, but I liked the aprintf interface better (I first saw it in libcurl).
Re: bug and patch: blank spaces in filenames causes looping
So forgive me for a newbie-never-even-lurked kind of question: will this fix make it into wget for other users (and for me in the future)? Or do I need to do more to make that happen, or...? Thanks! On Jul 5, 2007, at 12:52 PM, Hrvoje Niksic wrote: [...] -- ✐There's no time to stop for gas, we're already late-- Karin Donker -- Rich wealthychef Cook http://5pmharmony.com 925-784-3077 -- ✐
Re: bug and patch: blank spaces in filenames causes looping
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Rich Cook wrote: So forgive me for a newbie-never-even-lurked kind of question: will this fix make it into wget for other users (and for me in the future)? Or do I need to do more to make that happen, or...? Thanks! Well, I need a chance to look over the patch, run some tests, etc, to see if it really covers everything it should (what about other, non-space characters?). The fix (or one like it) will probably make it into Wget at some point, but I wouldn't expect it to come out in the next release (which, itself, will not be arriving for a couple months); it will probably go into wget 1.12. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjXYj7M8hyUobTrERCI5JAJ0UIDGzQsC8xCI3lK26pzzQ+BkS6ACgj16o oWDlelFyfvvTlhtlDpLYLXM= =DZ8v -END PGP SIGNATURE-
Re: bug and patch: blank spaces in filenames causes looping
Thanks for the follow up. :-) On Jul 5, 2007, at 3:52 PM, Micah Cowan wrote: [...] -- ✐There's no time to stop for gas, we're already late-- Karin Donker -- Rich wealthychef Cook http://5pmharmony.com 925-784-3077 -- ✐
Re: bug storing cookies with wget
Mario Ander schrieb: Hi everybody, I think there is a bug storing cookies with wget. See this command line: C:\Programme\wget\wget --user-agent="Opera/8.5 (X11; U; en)" --no-check-certificate --keep-session-cookies --save-cookies=cookie.txt --output-document=- --debug --output-file=debug.txt --post-data="name=xxx&password=dummy=Internetkennwort&login.x=0&login.y=0" https://www.vodafone.de/proxy42/portal/login.po [..] Set-Cookie: JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE; path=/jsp Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de; expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ Set-Cookie: JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338; path=/proxy42 [..] ---response end--- 200 OK Attempt to fake the path: /jsp, /proxy42/portal/login.po So the problem seems to be that wget rejects cookies for paths which don't fit the request URL. The script you call is in /proxy42/portal/, which is a subdirectory of /proxy42 and of /, so wget accepts cookies for those paths, but the request is not related to /jsp. So it seems to be wget sticking to the strict RFC and the script doing wrong. To get this working you would need to patch wget to accept non-RFC-compliant cookies, maybe along with an --accept-malformed-cookies directive. Hope this helps you Matthias
Re: bug storing cookies with wget
Matthias Vill schrieb: Mario Ander schrieb: Hi everybody, I think there is a bug storing cookies with wget. [...] So it seems to be wget sticking to the strict RFC and the script doing wrong. [...] So I thought of a second solution: If you have cygwin (or at least bash+grep) you can run this small script to duplicate and truncate the cookie. --- CUT here --- #!/bin/bash #Author: Matthias Vill; feel free to change and use #get the line for the proxy42 path into $temp temp=$(grep proxy42 cookies.txt) #remove everything after last !
temp=${temp%!*} #replace proxy42 by jsp temp=${temp/proxy42/jsp} #append newline to file #echo "" >> cookies.txt #add new cookie to cookies.txt echo "$temp" >> cookies.txt --- CUT here --- Maybe you need to remove the # in front of echo "" >> cookies.txt to compensate for a missing trailing newline; otherwise you may end up changing the value of the previous cookie. Maybe this helps even more Matthias
Re: Bug using recursive get and stdout
A quick search at http://www.mail-archive.com/wget@sunsite.dk/; for -O found: http://www.mail-archive.com/wget@sunsite.dk/msg08746.html http://www.mail-archive.com/wget@sunsite.dk/msg08748.html The way -O is implemented, there are all kinds of things which are incompatible with it, -r among them. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Bug in 1.10.2 vs 1.9.1
Juhana Sadeharju wrote: Hello. Wget 1.10.2 has the following bug compared to version 1.9.1. First, the bin/wgetdir is defined as wget -p -E -k --proxy=off -e robots=off --passive-ftp -o zlogwget`date +%Y%m%d%H%M%S` -r -l 0 -np -U Mozilla --tries=50 --waitretry=10 $@ The download command is wgetdir http://udn.epicgames.com Version 1.9.1 result: download ok Version 1.10.2 result: only udn.epicgames.com/Main/WebHome downloaded and other converted urls are of the form http://udn.epicgames.com/../Two/WebHome hi juhana, could you please try the current version of wget from our subversion repository: http://www.gnu.org/software/wget/wgetdev.html#development ? this bug should be fixed in the new code. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: BUG - .listing has sprung into existence
From: Sebastian Doctor, it hurts when I do this. Don't do that. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Bug
Reece ha scritto: Found a bug (sort of). When trying to get all the images in the directory below: http://www.netstate.com/states/maps/images/ It gives 403 Forbidden errors for most of the images even after setting the agent string to firefox's, and setting -e robots=off After a packet capture, it appears that the site will give the forbidden error if the Referer is not exactly correct. However, since wget actually uses the domain www.netstate.com:80 instead of without the port, it screws it all up. I've been unable to find any way to tell wget not to insert the port in the requesting url and referrer url. Here is the full command I was using: wget -r -l 1 -H -U "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -e robots=off -d -nh http://www.netstate.com/states/maps/images/ hi reece, that's an interesting bug. i've just added it to my THINGS TO FIX list. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: bug/feature request
Hi ! Maybe you can add this patch to your mainline tree: http://www.mail-archive.com/wget%40sunsite.dk/msg09142.html Best regards Marc Schoechlin On Wed, Jul 26, 2006 at 07:26:45AM +0200, Marc Schoechlin wrote: Date: Wed, 26 Jul 2006 07:26:45 +0200 From: Marc Schoechlin [EMAIL PROTECTED] Subject: bug/feature request To: [EMAIL PROTECTED] Hi, I'm not sure if this is a feature request or a bug. Wget does not collect all page requisites of a given URL. Many sites reference components of their pages in cascading style sheets, but wget does not collect these components as page requisites. An example: --- $ wget -q -p -k -nc -x --convert-links \ http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496901 $ find . -name "*.css" ./aspn.activestate.com/ASPN/static/aspn.css $ grep "url(" ./aspn.activestate.com/ASPN/static/aspn.css list-style-image: url(/ASPN/img/dot_A68C53_8x8_.gif); background-image: url(/ASPN/img/ads/ASPN_banner_bg.gif); background-image: url('/ASPN/img/ads/ASPN_komodo_head.gif'); background-image: url('/ASPN/img/ads/ASPN_banner_bottom.gif'); $ find . -name ASPN_banner_bg.gif || echo "not found" --- A solution for this problem would be to parse all collected *.css files for lines which match url(.*) and to collect these files. Best regards Marc Schoechlin -- I prefer non-proprietary document-exchange. http://sector7g.wurzel6.de/pdfcreator/ http://www.prooo-box.org/ Contact me via jabber: [EMAIL PROTECTED]
Re: Bug in wget 1.10.2 makefile
Daniel Richard G. ha scritto: Hello, The MAKEDEFS value in the top-level Makefile.in also needs to include DESTDIR='$(DESTDIR)'. fixed, thanks. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: BUG
Tony Lewis ha scritto: Run the command with -d and post the output here. in this case, -S can provide more useful information than -d. be careful to obfuscate passwords, though!!! -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
RE: BUG
Run the command with -d and post the output here. Tony _ From: Junior + Suporte [mailto:[EMAIL PROTECTED]] Sent: Monday, July 03, 2006 2:00 PM To: [EMAIL PROTECTED] Subject: BUG Dear, I am using wget to send a login request to a site. When wget is saving the cookies, the following error message appears: Error in Set-Cookie, field `Path'. Syntax error in Set-Cookie: tu=661541|802400391 @TERRA.COM.BR; Expires=Thu, 14-Oct-2055 20:52:46 GMT; Path= at position 78. Location: http://www.tramauniversitario.com.br/servlet/login.jsp?username=802400391%40terra.com.br&pass=123qwe&rd=http%3A%2F%2Fwww.tramauniversitario.com.br%2Ftuv2%2Fenquete%2Fcb%2Fsul%2Farte.jsp [following] I am trying to access the URL http://www.tramauniversitario.com.br/tuv2/participe/login.jsp?rd=http://www.tramauniversitario.com.br/tuv2/enquete/cb/sul/arte.jsp&[EMAIL PROTECTED]&pass=123qwe&Submit.x=6&Submit.y=1 In Internet Explorer, this URL works correctly and the cookie is saved on the local machine, but in WGET, this cookie returns an error. Thanks, Luiz Carlos Zancanella Junior
RE: Bug in GNU Wget 1.x (Win32)
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Þröstur Sent: Wednesday, June 21, 2006 4:35 PM There have been some reports in the past but I don't think it has been acted upon; one of the problems is that the list of names can be extended at will (besides the standard comx, lptx, con, prn). Maybe it is possible to query the OS about the currently active device names and rename the output files if necessary? I reproduced the bug with Win32 versions 1.5.don't-remember, 1.10.1 and 1.10.2. I did also test version 1.6 on Linux but it was not affected. That is because the problem is generated by the DOS/Windows filesystem drivers (or whatever those should be called); basically com1 and so on are the equivalent of Unix device drivers, with the unfortunate difference of acting in every directory. Example URLs that reproduce the bug: wget g/nul wget http://www.gnu.org/nul wget http://www.gnu.org/nul.html wget -o loop.end "http://www.gnu.org/nul.html" I know that the bug is associated with words which are devices in the Windows console, but I don't understand why, since I tried to set the output file to something else. I think you meant to use -O, not -o. Doesn't solve the real problem but at least a workaround. Heiko -- -- PREVINET S.p.A. www.previnet.it -- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED] -- +39-041-5907073 / +39-041-5917073 ph -- +39-041-5907472 / +39-041-5917472 fax
Re: BUG: wget with option -O creates empty files even if the remote file does not exist
From: Eduardo M KALINOWSKI wget http://www.somehost.com/nonexistant.html -O localfile.html then file localfile.html will always be created, and will have length of zero even if the remote file does not exist. Because with -O, Wget opens the output file before it does any network activity, and after it's done, it closes the file and leaves it there, regardless of its content (or lack of content). You could avoid -O, and rename the file after the Wget command. You could keep the -O, and check the status of the Wget command (and/or check the output file size), and delete the file if it's no good. (And probably many other things, as well.) If you look through http://www.mail-archive.com/wget@sunsite.dk/;, you can find many people who think that -O should do something else, but (for now) it does what it does. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: bug?
yy :) [EMAIL PROTECTED] writes: I ran wget -P /tmp/.test http://192.168.1.10 in a SUSE system (SLES 9) and found that it saved the file in /tmp/_test. This command works fine in RedHat; is it a bug? I believe the bug is introduced by SuSE in an attempt to protect the user. Try reporting it to them.
Re: Bug in ETA code on x64
- Original Message - From: Hrvoje Niksic [EMAIL PROTECTED] Date: Tuesday, March 28, 2006 7:23 pm in progress.c line 880: eta_hrs = (int)(eta / 3600, eta %= 3600); eta_min = (int)(eta / 60, eta %= 60); eta_sec = (int)(eta); This is weird. Did you compile the code yourself, or did you get it Yes that is strange. I got the code from one of the GNU mirrors, but I'm afraid I can't remember which one. from a Windows download site? I'm asking because the code in progress.c doesn't look like that; it in fact looks like this: eta_hrs = eta / 3600, eta %= 3600; eta_min = eta / 60, eta %= 60; eta_sec = eta; The cast to int looks like someone was trying to remove a warning and botched operator precedence in the process. If you must insert the cast, try: eta_hrs = (int) (eta / 3600), eta %= 3600; Yes that also works. The cast is needed on Windows x64 because eta is a wgint (which is 64-bit) but a regular int is 32-bit so otherwise a warning is issued. Oh well. Perhaps it would be better changed to use a semicolon for clarity anyway? cheers,
Re: Bug in ETA code on x64
Thomas Braby [EMAIL PROTECTED] writes: eta_hrs = (int) (eta / 3600), eta %= 3600; Yes that also works. The cast is needed on Windows x64 because eta is a wgint (which is 64-bit) but a regular int is 32-bit so otherwise a warning is issued. The same is the case on 32-bit Windows, and also on Linux. I don't see the value in that warning. Maybe we can disable it with a compiler flag? Oh well. Perhaps it would be better changed to use a semicolon for clarity anyway? Note that, without the cast, both semicolon and comma work equally well.
Re: Bug report
Gary Reysa wrote: Hi, I don't really know if this is a Wget bug, or some problem with my website, but, either way, maybe you can help. I have a web site ( www.BuildItSolar.com ) with perhaps a few hundred pages (260MB of storage total). Someone did a Wget on my site, and managed to log 111,000 hits and 58,000 page views (using more than a GB of bandwidth). I am wondering how this can happen, since the number of page views is about 200 times the number of pages on my site?? Is there something I can do to prevent this? Is there something about the organization of my website that is causing Wget to get stuck in a loop? I've never used Wget, but I am guessing that this guy really did not want 50,000+ pages -- do you provide some way for the user to shut itself down when it reaches some reasonable limit? My website is non-commercial, and provides a lot of information that people find useful in building renewable energy projects. It generates zero income, and I can't really afford to have a lot of people come in and burn up GBs of bandwidth to no useful end. Help! Gary Reysa Bozeman, MT [EMAIL PROTECTED] Hello Gary, From a quick look at your site, it appears to be mainly static HTML that would not generate a lot of extra crawls. If you have some dynamic portion of your site, like a calendar, that could make wget go into an infinite loop. It would be much easier to tell if you could look at the server logs that show what pages were requested. They would easily tell you what wget was getting hung on. One problem I did notice is that your site is generating soft 404s. In other words, it is sending back an HTTP 200 response when it should be sending back a 404 response. So if wget tries to access http://www.builditsolar.com/blah your web server is telling wget that the page actually exists. This *could* cause more crawls than necessary, but not likely. This problem should be fixed though.
It's possible the wget user did not know what they were doing and ran the crawler several times. You could try to block traffic from that particular IP address or create a robots.txt file that tells crawlers to stay away from your site or just certain pages. Wget respects robots.txt. For more info: http://www.robotstxt.org/wc/robots.html Regards, Frank
Re: Bug in ETA code on x64
El 28/03/2006, a las 20:43, Tony Lewis escribió: Hrvoje Niksic wrote: The cast to int looks like someone was trying to remove a warning and botched operator precedence in the process. I can't see any good reason to use , here. Why not write the line as: eta_hrs = eta / 3600; eta %= 3600; Because that's not equivalent. The sequence or comma operator , has two operands: first the left operand is evaluated, then the right. The result has the type and value of the right operand. Note that a comma in a list of initializations or arguments is not an operator, but simply a punctuation mark! Cheers, Greg
Re: Bug in ETA code on x64
Greg Hurrell [EMAIL PROTECTED] writes: El 28/03/2006, a las 20:43, Tony Lewis escribió: Hrvoje Niksic wrote: The cast to int looks like someone was trying to remove a warning and botched operator precedence in the process. I can't see any good reason to use , here. Why not write the line as: eta_hrs = eta / 3600; eta %= 3600; Because that's not equivalent. Well, it should be, because the comma operator has lower precedence than the assignment operator (see http://tinyurl.com/evo5a, http://tinyurl.com/ff4pp and numerous other locations). I'd still like to know where Thomas got his version of progress.c because it seems that the change has introduced the bug.
Re: Bug in ETA code on x64
Thomas Braby [EMAIL PROTECTED] writes: With wget 1.10.2 compiled using Visual Studio 2005 for Windows XP x64 I was getting no ETA until late in the transfer, when I'd get things like: 49:49:49 then 48:48:48 then 47:47:47 etc. So I checked the eta value in seconds and it was correct, so the code in progress.c line 880: eta_hrs = (int)(eta / 3600, eta %= 3600); eta_min = (int)(eta / 60, eta %= 60); eta_sec = (int)(eta); This is weird. Did you compile the code yourself, or did you get it from a Windows download site? I'm asking because the code in progress.c doesn't look like that; it in fact looks like this: eta_hrs = eta / 3600, eta %= 3600; eta_min = eta / 60, eta %= 60; eta_sec = eta; The cast to int looks like someone was trying to remove a warning and botched operator precedence in the process. If you must insert the cast, try: eta_hrs = (int) (eta / 3600), eta %= 3600; ...
RE: Bug in ETA code on x64
Hrvoje Niksic wrote: The cast to int looks like someone was trying to remove a warning and botched operator precedence in the process. I can't see any good reason to use , here. Why not write the line as: eta_hrs = eta / 3600; eta %= 3600; This makes it much less likely that someone will make a coding error while editing that section of code. Tony
Re: Bug in TOLOWER macro when STANDALONE (?)
Beni Serfaty [EMAIL PROTECTED] writes: I think I found a bug when STANDALONE is defined on hash.c I hope I'm not missing something here... Good catch, thanks. I've applied a slightly different fix, appended below. By the way, are you using hash.c in a project? I'd like to hear if you're satisfied with it and would be very interested in any suggestions and, of course, bugs. hash.c was written to be reuse-friendly. Also note that you can get the latest version of the file (this fix included) from http://svn.dotsrc.org/repo/wget/trunk/src/hash.c .

2006-03-06  Hrvoje Niksic  [EMAIL PROTECTED]

	* hash.c (TOLOWER): Fix definition when STANDALONE.
	Reported by Beni Serfaty.

Index: src/hash.c
===
--- src/hash.c (revision 2119)
+++ src/hash.c (working copy)
@@ -53,7 +53,8 @@
 # ifndef countof
 #  define countof(x) (sizeof (x) / sizeof ((x)[0]))
 # endif
-# define TOLOWER(x) ('A' <= (x) && (x) <= 'Z' ? (x) - 32 : (x))
+# include <ctype.h>
+# define TOLOWER(x) tolower ((unsigned char) x)
 # if __STDC_VERSION__ >= 199901L
 #  include <stdint.h>  /* for uintptr_t */
 # else
Re: bug retrieving embedded images with --page-requisites
Tony Lewis wrote: The --convert-links option changes the website path to a local file system path. That is, it changes the directory, not the file name. Thanks, I didn't understand it that way. IMO, your suggestion has merit, but it would require wget to maintain a list of MIME types and corresponding renaming rules. Well, it seems implementing the Content-Type header has been planned for a long time, and there are two items about it in the TODO document of the wget distribution. Maintaining a list of MIME types is not an issue, as there are already lists around: * File suffixes and MIME types at Duke University: http://www.duke.edu/websrv/file-extensions.html * MIME Types category at Google: http://www.google.com/Top/Computers/Data_Formats/MIME_Types * ... Just a word about how HTTrack handles MIME types and extensions. It has a powerful --assume option that allows users to assign a MIME type to extensions. For example: All .php files are PNG images. Everything is explained on the Option panel: MIME Types page at http://www.httrack.com/html/step9_opt11.html. I think wget could use such an option. JM.
Re: bug in wget windows
Tobias Koeck wrote: done. ==> PORT ... done. ==> RETR SUSE-10.0-EvalDVD-i386-GM.iso ... done. [ <=> ] -673,009,664 113,23K/s Assertion failed: bytes >= 0, file retr.c, line 292 This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. you are probably using an older version of wget, without large file support. please upgrade to wget 1.10.2. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Bug rpt
HonzaCh [EMAIL PROTECTED] writes: My localeconv()->thousands_sep (as well as many other struct members) turns out to be the empty string ("") (MSVC 6.0). How do you know? I mean, what program did you use to check this? My quick'n'dirty one. See the source below. Your source neglects to call setlocale(LC_ALL, ""), which you must do before the locale goes into effect. Otherwise you're getting values from the "C" locale, which doesn't define thousand separators.
Re: Bug rpt
HonzaCh [EMAIL PROTECTED] writes: Latest version (1.10.1) turns out an UI bug: the thousand separator (space according to my local settings) displays as á (character code 0xA0, see attch.) Although it does not affect the primary function of WGET, it looks quite ugly. Env.: Win2k Pro/Czech (CP852 for console apps, CP1250 for windowed ones). Thanks for the report. Is this a natively compiled Wget or one compiled on Cygwin? Wget obtains the thousand separator from the operating system using the `localeconv' function. According to MSDN (http://tinyurl.com/cumk2 and http://tinyurl.com/chubg), Wget's usage appears to be correct. I'd be surprised if that function didn't function properly on Windows. Can other Windows testers repeat this problem?
Re: Bug handling session cookies
Mark Street [EMAIL PROTECTED] writes: I'm not sure why this [catering for paths without a leading /] is done in the code. rfc1808 declared that the leading / is not really part of path, but merely a separator, presumably to be consistent with its treatment of ;params, ?queries, and #fragments. The author of the code found it appealing to disregard common sense and implement rfc1808 semantics. In most cases the user shouldn't notice the difference, but it has led to all kinds of implementation problems with code that assumes that URL paths naturally begin with /. Because of that it will be changed later. Note that the forward slash is stripped from prefix, hence never matches full_path. I'm not sure why this is done in the code. Because PREFIX is the path declared by the cookie, which always begins with /, and FULL_PATH is the URL path coming from the URL parsing code, which doesn't begin with a /. To match them, one must indeed strip the leading / off PREFIX. But paths without a slash still caused subtle problems. For example, cookies without a path attribute still had to be stored with the correct cookie-path (with a leading slash). To account for this, the invocation of cookie_handle_set_cookie was modified to prepend the / before the path. This led to path_match unexpectedly receiving two /-prefixed paths and being unable to match them. The attached patch fixes the problem by: * Making sure that path consistently gets prepended in all entry points to cookie code; * Removing the special logic from path_match. With that change your test case seems to work, and so do all the other tests I could think of. Please let me know if it works for you, and thanks for the detailed bug report. 2005-06-24 Hrvoje Niksic [EMAIL PROTECTED] * http.c (gethttp): Don't prepend / here. * cookies.c (cookie_handle_set_cookie): Prepend / to PATH. (cookie_header): Ditto.
Index: src/http.c
===
--- src/http.c (revision 1794)
+++ src/http.c (working copy)
@@ -1706,7 +1706,6 @@
   /* Handle (possibly multiple instances of) the Set-Cookie header. */
   if (opt.cookies)
     {
-      char *pth = NULL;
       int scpos;
       const char *scbeg, *scend;
       /* The jar should have been created by now. */
@@ -1717,15 +1716,8 @@
            ++scpos)
         {
           char *set_cookie;
           BOUNDED_TO_ALLOCA (scbeg, scend, set_cookie);
-          if (pth == NULL)
-            {
-              /* u->path doesn't begin with /, which cookies.c expects. */
-              pth = (char *) alloca (1 + strlen (u->path) + 1);
-              pth[0] = '/';
-              strcpy (pth + 1, u->path);
-            }
-          cookie_handle_set_cookie (wget_cookie_jar, u->host, u->port, pth,
-                                    set_cookie);
+          cookie_handle_set_cookie (wget_cookie_jar, u->host, u->port,
+                                    u->path, set_cookie);
         }
     }
Index: src/cookies.c
===
--- src/cookies.c (revision 1794)
+++ src/cookies.c (working copy)
@@ -822,6 +822,17 @@
 {
   return path_matches (path, cookie_path) != 0;
 }
+
+/* Prepend '/' to string S.  S is copied to fresh stack-allocated
+   space and its value is modified to point to the new location.  */
+
+#define PREPEND_SLASH(s) do {                                   \
+  char *PS_newstr = (char *) alloca (1 + strlen (s) + 1);       \
+  *PS_newstr = '/';                                             \
+  strcpy (PS_newstr + 1, s);                                    \
+  s = PS_newstr;                                                \
+} while (0)
+
 /* Process the HTTP `Set-Cookie' header.  This results in storing the
    cookie or discarding a matching one, or ignoring it completely, all
@@ -835,6 +846,11 @@
   struct cookie *cookie;
   cookies_now = time (NULL);
+  /* Wget's paths don't begin with '/' (blame rfc1808), but cookie
+     usage assumes /-prefixed paths.  Until the rest of Wget is fixed,
+     simply prepend slash to PATH.  */
+  PREPEND_SLASH (path);
+
   cookie = parse_set_cookies (set_cookie, update_cookie_field, false);
   if (!cookie)
     goto out;
@@ -977,17 +993,8 @@
 static int
 path_matches (const char *full_path, const char *prefix)
 {
-  int len;
+  int len = strlen (prefix);
-  if (*prefix != '/')
-    /* Wget's HTTP paths do not begin with '/' (the URL code treats it
-       as a mere separator, inspired by rfc1808), but the '/' is
-       assumed when matching against the cookie stuff. */
-    return 0;
-
-  ++prefix;
-  len = strlen (prefix);
-
   if (0 != strncmp (full_path, prefix, len))
     /* FULL_PATH doesn't begin with PREFIX. */
     return 0;
@@ -1149,6 +1156,7 @@
   int count, i, ocnt;
   char *result;
   int result_size, pos;
+  PREPEND_SLASH (path);   /* see cookie_handle_set_cookie */
   /* First, find the cookie chains whose domains
Re: Bug handling session cookies
Hrvoje, Many thanks for the explanation and the patch. Yes, this patch successfully resolves the problem for my particular test case. Best regards, Mark Street.
Re: Bug handling session cookies
Mark Street [EMAIL PROTECTED] writes: Many thanks for the explanation and the patch. Yes, this patch successfully resolves the problem for my particular test case. Thanks for testing it. It has been applied to the code and will be in Wget 1.10.1 and later.
Re: Bug: wget cannot handle quote
Will Kuhn [EMAIL PROTECTED] writes: Apparently wget does not handle single quote or double quote very well. wget with the following arguments gives an error. wget --user-agent='Mozilla/5.0' --cookies=off --header 'Cookie: testbounce=testing; ih=b'!!!0T#8G(5A!!#c`#8HWsH!!#wt#8I0HY!!#yf#8I0G3; cf=b$y~!!!D)#; hi=b#!!!D)8I=C]' 'ad.yieldmanager.com/imp?z=12n=2E=01-329I=508S=508-1' -O /home/admin/http/wwwscanfile.YYO3Cy You haven't stated which error you get, but on my system the error comes from the shell and not from Wget. The problem is that you used single quotes to quote a string that contains, among other things, single quotes. This effectively turned off the quoting for some portions of the text, causing the shell to interpret the bangs (!) as (invalid) history events. To correct the problem, replace ' within single quotes with something like '\'': wget --user-agent='Mozilla/5.0' --cookies=off --header 'Cookie: testbounce=testing; ih=b'\''!!!0T#8G(5A!!#c`#8HWsH!!#wt#8I0HY!!#yf#8I0G3; cf=b$y~!!!D)#; hi=b#!!!D)8I=C]' 'ad.yieldmanager.com/imp?z=12n=2E=01-329I=508S=508-1' -O /home/admin/http/wwwscanfile.YYO3Cy
RE: bug with password containing @
Hi wget ftp://someuser:[EMAIL PROTECTED]@www.somedomain.com/some_file.tgz is splitting on the first @, not the second. Is this a problem with the URL standard or a wget issue? Regards Andrew Gargan
Re: bug with password containing @
Andrew Gargan [EMAIL PROTECTED] writes: wget ftp://someuser:[EMAIL PROTECTED]@www.somedomain.com/some_file.tgz is splitting on the first @, not the second. Encode the '@' as %40 and this will work. For example: wget ftp://someuser:[EMAIL PROTECTED]/some_file.tgz Is this a problem with the URL standard or a wget issue? Neither, but maybe Wget's URL parser could be smarter about handling the above case.
Re: bug in static build of wget with socks
Seemant Kulleen [EMAIL PROTECTED] writes: I wanted to alert you all to a bug in wget, reported by one of our (gentoo) users at: https://bugs.gentoo.org/show_bug.cgi?id=69827 I am the maintainer for the Gentoo ebuild for wget. If someone would be willing to look at and help us with that bug, it'd be much appreciated. Since I don't use Gentoo, I'll need more details to fix this. For one, I haven't tried Wget with socks for a while now. Older versions of Wget supported a --with-socks option, but the procedure for linking a program with socks changed since then, and the option was removed due to bitrot. I don't know how the *dynamic* linking against socks works in Gentoo, either. Secondly, I have very little experience with creating static binaries, since I personally don't need them. I don't even know what flags USE=static causes to be passed to the compiler and the linker. Likewise, I don't have a clue why there is a difference between Wget 1.8 and Wget 1.9 in this, nor why the presence of socks makes the slightest difference. I don't even know if this is a bug in Wget or in the way that the build is attempted by the Gentoo package mechanism. Providing the actual build output might shed some light on this.
Re: bug in static build of wget with socks
Seemant Kulleen [EMAIL PROTECTED] writes: Since I don't use Gentoo, I'll need more details to fix this. For one, I haven't tried Wget with socks for a while now. Older versions of Wget supported a --with-socks option, but the procedure for linking a program with socks changed since then, and the option was removed due to bitrot. I don't know how the *dynamic* linking against socks works in Gentoo, either. Ah ok, ./configure --help still shows the option, so this is fairly undocumented then. I spoke too soon: it turns out that --with-socks is only removed in Wget 1.10 (now in beta). But --with-socks in 1.9.1 doesn't really force linking with the socks library, it merely checks for a Rconnect function in -lsocks. If that is not found, the build is continued as usual. You should check the configure output (along with `ldd' on the resulting executable) to see if that really worked. I don't even know if this is a bug in Wget or in the way that the build is attempted by the Gentoo package mechanism. Providing the actual build output might shed some light on this. if use static; then emake LDFLAGS=--static || die I now tried `LDFLAGS=--static ./configure', and it seems to work in 1.10. Linking does produce two warnings, but the resulting executable is static.
Re: Bug when downloading large files (over 2 gigs) from proftpd server.
This problem has been fixed for the upcoming 1.10 release. If you want to try it, it's available at ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha2.tar.bz2
Re: Bug
Hi Jorge! Current wget versions do not support large files (over 2 GB). However, the CVS version does, and the fix will be included in the normal wget source. Jens (just another user) When downloading a file of 2GB and more, the counter gets crazy; probably it should have a long instead of an int number.
RE: bug-wget still useful
I don't know why you say that. I see bug reports and discussion of fixes flowing through here on a fairly regular basis. Mark Post -Original Message- From: Dan Jacobson [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 15, 2005 3:04 PM To: [EMAIL PROTECTED] Subject: bug-wget still useful Is it still useful to mail to [EMAIL PROTECTED] I don't think anybody's home. Shall the address be closed?
Re: bug-wget still useful
Dan Jacobson [EMAIL PROTECTED] writes: Is it still useful to mail to [EMAIL PROTECTED] I don't think anybody's home. Shall the address be closed? If you're referring to Mauro being busy, I don't see it as a reason to close the bug reporting address.
Re: bug-wget still useful
> I don't know why you say that. I see bug reports and discussion of fixes
> flowing through here on a fairly regular basis.
All I know is my reports for the last few months didn't get the usual (any!) cheery replies. However, I saw them on Gmane, yes.
Re: Bug: really large files cause problems with status text
Quoting Alan Robinson [EMAIL PROTECTED]: Downloading a 4.2 gig file (such as from ftp://movies06.archive.org/2/movies/abe_lincoln_of_the_4th_ave/abe_lincoln_of_the_4th_ave.mpeg ) causes the status text (i.e. 100%[+===] 38,641,328 213.92K/s ETA 00:00) to print invalid things (in this case, that 100% of the file has been downloaded, even though only 40MB really has). It is a Frequently Asked Question, with the answer that people are working on it. // Ulf
Re: Bug (wget 1.8.2): Wget downloads files rejected with -R.
Hi Jason! If I understood you correctly, this quote from the manual should help you: *** Note that these two options [accept and reject based on filenames] do not affect the downloading of HTML files; Wget must load all the HTMLs to know where to go at all--recursive retrieval would make no sense otherwise. *** If you are seeing wget behaviour different from this, please a) update your wget and b) provide more details where/how it happens. CU good luck! Jens (just another user) When the -R option is specified to reject files by name in recursive mode, wget downloads them anyway then deletes them after downloading. This is a problem when you are trying to be picky about the files you are downloading to save bandwidth. Since wget appears to know the name of the file it is downloading before it is downloaded (even if the specified URL is redirected to a different filename), then it should not bother downloading the file at all if it is going to delete it immediately after downloading it. - Jason Cipriani
Re: Bug#261755: Control sequences injection patch
On Sun, Aug 22, 2004 at 08:02:54PM +0200, Jan Minar wrote:

+/* vasprintf() requires _GNU_SOURCE. Which is OK with Debian. */
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE

This must be done before stdio.h is included.

+#endif
+#include <ctype.h>
+
 #ifndef errno
 extern int errno;
 #endif
@@ -345,7 +351,49 @@
 int expected_size;
 int allocated;
 };
+
+/* XXX Where does the declaration belong?? */
+void escape_buffer (char **src);
+
+/*
+ * escape_untrusted -- escape using '\NNN'. To be used wherever we want to
+ * print untrusted data.
+ *
+ * Syntax: escape_buffer (buf-to-escape);
+ */
+void escape_buffer (char **src)
+{
+  char *dest;
+  int i, j;
+
+  /* We encode each byte using at most 4 bytes, + trailing '\0'. */
+  dest = xmalloc (4 * strlen (*src) + 1);
+
+  for (i = j = 0; (*src)[i] != '\0'; ++i) {
+    /*
+     * We allow any non-control character, because LINE TABULATION &
+     * friends can't do more harm than SPACE. And someone
+     * somewhere might be using these, so unless we actually can't
+     * protect against spoofing attacks, we don't pretend we can.
+     *
+     * Note that '\n' is included both in the isspace() *and*
+     * iscntrl() range.
+     */
+    if (isprint((*src)[i]) || isspace((*src)[i])) {

This lets '\r' thru, not good. BTW, (*src)[i] is quite a cypher.

+      dest[j++] = (*src)[i];
+    } else {
+      dest[j++] = '\\';
+      dest[j++] = '0' + (((*src)[i] & 0xff) >> 6);
+      dest[j++] = '0' + (((*src)[i] & 0x3f) >> 3);
+      dest[j++] = '0' + ((*src)[i] & 7);
+    }
+  }
+  dest[j] = '\0';
+
+  xfree (*src);
+  *src = dest;
+}

Attached is version 2, which solves these problems. Please keep me CC'd. Jan. -- To me, clowns aren't funny. In fact, they're kind of scary. I've wondered where this started and I think it goes back to the time I went to the circus, and a clown killed my dad.

--- wget-1.9.1.ORIG/src/log.c 2004-08-22 13:42:33.0 +0200
+++ wget-1.9.1-jan/src/log.c 2004-08-24 02:38:38.0 +0200
@@ -42,6 +42,12 @@
 # endif
 #endif /* not WGET_USE_STDARG */
+/* vasprintf() requires _GNU_SOURCE. Which is OK with Debian. */
+/* This *must* be defined before stdio.h is included. */
+#ifndef _GNU_SOURCE
+# define _GNU_SOURCE
+#endif
+
 #include <stdio.h>
 #ifdef HAVE_STRING_H
 # include <string.h>
@@ -63,6 +69,8 @@
 #include "wget.h"
 #include "utils.h"
+#include <ctype.h>
+
 #ifndef errno
 extern int errno;
 #endif
@@ -345,7 +353,69 @@
 int expected_size;
 int allocated;
 };
+
+/* XXX Where does the declaration belong?? */
+void escape_buffer (char **src);
+/*
+ * escape_buffer -- escape using '\NNN'. To be used wherever we want to print
+ * untrusted data.
+ *
+ * Syntax: escape_buffer (buf-to-escape);
+ */
+void escape_buffer (char **src)
+{
+  char *dest, c;
+  int i, j;
+
+  /* We encode each byte using at most 4 bytes, + trailing '\0'. */
+  dest = xmalloc (4 * strlen (*src) + 1);
+
+  for (i = j = 0; (c = (*src)[i]) != '\0'; ++i) {
+    /*
+     * We allow any non-control character, because '\t' & friends
+     * can't do more harm than SPACE. And someone somewhere might
+     * be using these, so unless we actually can protect against
+     * spoofing attacks, we don't pretend it.
+     *
+     * Note that '\n' is included both in the isspace() *and*
+     * iscntrl() range.
+     *
+     * We try not to allow '\r' & friends by using isblank()
+     * instead of isspace(). Let's hope noone will complain about
+     * '\v' & similar being filtered (the characters we may still
+     * let thru can vary among locales, so there is not much we can
+     * do about this *from within logvprintf()*.
+     */
+    if (c == '\r' && *(c + 1) == '\n') {
+      /*
+       * I've spotted wget printing CRLF line terminators
+       * while communicating with ftp://ftp.debian.org. This
+       * is a bug: wget should print whatever the platform
+       * line terminator is (CR on Mac, CRLF on CP/M, LF on
+       * Un*x, etc.)
+       *
+       * We work around this bug here by taking CRLF for a
+       * line terminator. A lone CR is still treated as a
+       * control character.
+       */
+      i++;
+      dest[j++] = '\n';
+    } else if (isprint(c) || isblank(c) || c == '\n') {
+      dest[j++] = c;
+
Re: Bug#261755: Control sequences injection patch
tags 261755 +patch thanks On Sun, Aug 22, 2004 at 11:39:07AM +0200, Thomas Hood wrote: The changes contemplated look very invasive. How quickly can this bug be fixed? Here we go: Hacky, non-portable, but pretty slick non-invasive, whatever that means. Now I'm going to check whether it is going to catch all the cases where malicious characters could be possibly injected. This patch (hopefully) solves the problem of a remote attacker (server or otherwise) injecting malicious control sequences in the HTTP headers. It by no means solves the spoofing bug, which is by nature tricky to address well. Cheers, Jan. -- To me, clowns aren't funny. In fact, they're kind of scary. I've wondered where this started and I think it goes back to the time I went to the circus, and a clown killed my dad.

--- wget-1.9.1.WORK/debian/changelog 2004-08-22 19:34:16.0 +0200
+++ wget-1.9.1-jan/debian/changelog 2004-08-22 19:39:48.0 +0200
@@ -1,3 +1,12 @@
+wget (1.9.1-4.local-1) unstable; urgency=medium
+
+  * Local build
+  * Hopeless attempt to filter control chars in log output (see
+    Bug#267393)
+  * This probably SHOULD make it in Sarge revision 0
+
+ -- Jan Minar [EMAIL PROTECTED]  Sun, 22 Aug 2004 19:39:02 +0200
+
 wget (1.9.1-4) unstable; urgency=low
   * made passive the default. sorry forgot again.:(
--- wget-1.9.1.WORK/src/log.c 2004-08-22 19:34:16.0 +0200
+++ wget-1.9.1-jan/src/log.c 2004-08-22 19:31:33.0 +0200
@@ -63,6 +63,12 @@
 #include "wget.h"
 #include "utils.h"
+/* vasprintf() requires _GNU_SOURCE. Which is OK with Debian. */
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <ctype.h>
+
 #ifndef errno
 extern int errno;
 #endif
@@ -345,7 +351,49 @@
 int expected_size;
 int allocated;
 };
+
+/* XXX Where does the declaration belong?? */
+void escape_buffer (char **src);
+/*
+ * escape_untrusted -- escape using '\NNN'. To be used wherever we want to
+ * print untrusted data.
+ *
+ * Syntax: escape_buffer (buf-to-escape);
+ */
+void escape_buffer (char **src)
+{
+  char *dest;
+  int i, j;
+
+  /* We encode each byte using at most 4 bytes, + trailing '\0'. */
+  dest = xmalloc (4 * strlen (*src) + 1);
+
+  for (i = j = 0; (*src)[i] != '\0'; ++i) {
+    /*
+     * We allow any non-control character, because LINE TABULATION &
+     * friends can't do more harm than SPACE. And someone
+     * somewhere might be using these, so unless we actually can't
+     * protect against spoofing attacks, we don't pretend we can.
+     *
+     * Note that '\n' is included both in the isspace() *and*
+     * iscntrl() range.
+     */
+    if (isprint((*src)[i]) || isspace((*src)[i])) {
+      dest[j++] = (*src)[i];
+    } else {
+      dest[j++] = '\\';
+      dest[j++] = '0' + (((*src)[i] & 0xff) >> 6);
+      dest[j++] = '0' + (((*src)[i] & 0x3f) >> 3);
+      dest[j++] = '0' + ((*src)[i] & 7);
+    }
+  }
+  dest[j] = '\0';
+
+  xfree (*src);
+  *src = dest;
+}
+
 /* Print a message to the log. A copy of message will be saved to
 saved_log, for later reusal by log_dump_context().
@@ -364,15 +412,28 @@
 int available_size = sizeof (smallmsg);
 int numwritten;
 FILE *fp = get_log_fp ();
+  char *buf;
+
+  /* int vasprintf(char **strp, const char *fmt, va_list ap); */
+  if (vasprintf (&buf, fmt, args) == -1) {
+    perror (_("Error"));
+    exit (1);
+  }
+
+  escape_buffer (&buf);
 if (!save_context_p)
 {
 /* In the simple case just call vfprintf(), to avoid needless
 allocation and games with vsnprintf(). */
-  vfprintf (fp, fmt, args);
-  goto flush;
-}
+  /* vfprintf() didn't check return value, neither will we */
+  (void) fprintf(fp, "%s", buf);
+}
+  else /* goto flush; */ /* There's no need to use goto here */
+/* This else-clause purposefully shifted 4 columns to the left, so that the
+ * diff is easy to read --Jan */
+{
 if (state->allocated != 0)
 {
 write_ptr = state->bigmsg;
@@ -384,8 +445,12 @@
 missing from legacy systems. Therefore I consider it safe to
 assume that its return value is meaningful. On the systems where
 vsnprintf() is not available, we use the implementation from
-  snprintf.c which does return the correct value. */
-  numwritten = vsnprintf (write_ptr, available_size, fmt, args);
+  snprintf.c which does return the correct value.
+
+  With snprintf(), this probably doesn't hold anymore. But this is Debian,
+  so who cares. */
+
+  numwritten = snprintf (write_ptr, available_size, "%s", buf);
 /* vsnprintf() will not step over the limit given by available_size.
 If it fails, it will return either -1 (POSIX?) or the number of
@@ -420,7 +485,7 @@
Re: Bug in wget 1.9.1 documentation
Tristan Miller [EMAIL PROTECTED] writes: There appears to be a bug in the documentation (man page, etc.) for wget 1.9.1. I think this is a bug in the man page generation process.
Re: [BUG] wget 1.9.1 and below can't download >=2G file on 32bits system
Yup; 1.9.1 cannot download large files. I hope to fix this by the next release.
Re: Bug report
Juhana Sadeharju [EMAIL PROTECTED] writes: Command: wgetdir "http://liarliar.sourceforge.net". Problem: Files are named as content.php?content.2 content.php?content.3 content.php?content.4 which are interpreted, e.g., by Nautilus as manual pages and are displayed as plain texts. Could the files and the links to them be renamed as the following? content.php?content.2.html content.php?content.3.html content.php?content.4.html Use the option `--html-extension' (-E). After all, are those pages still php files or generated html files? If they are html files produced by the php files, then it could be a good idea to add a new extension to the files. They're the latter -- HTML files produced by the server-side PHP code. Command: wgetdir "http://www.newtek.com/products/lightwave/developer/lscript2.6/index.html" Problem: Images are not downloaded. Perhaps because the image links are the following: <image src="v26_2.jpg"> I've never seen this tag, but it seems to be the same as IMG. Mozilla seems to grok it and its DOM inspector thinks it has seen IMG. Is this tag documented anywhere? Does IE understand it too?
Re: Bug in wget: cannot request urls with double-slash in the query string
D Richard Felker III [EMAIL PROTECTED] writes: The request log shows that the slashes are apparently respected. I retried a test case and found the same thing -- the slashes were respected. OK. Then I remembered that I was using -i. Wget seems to work fine with the url on the command line; the bug only happens when the url is passed in with:

cat <<EOF | wget -i -
http://...
EOF

But I cannot repeat that, either. As long as the consecutive slashes are in the query string, they're not stripped. Using this method is necessary since it is the ONLY secure way I know of to do a password-protected http request from a shell script. Yes, that is the best way to do it.
Re: bug in use index.html
The whole matter of conversion of / to /index.html on the file system is a hack. But I really don't know how to better represent empty trailing file name on the file system.
Re: bug in use index.html
Hrvoje Niksic wrote: The whole matter of conversion of / to /index.html on the file system is a hack. But I really don't know how to better represent empty trailing file name on the file system. Another, for now rather limited, hack: on file systems which support some sort of file attributes you can mark index.html as an unwanted child of an empty trailing file name. AFAIK, that should work at least on Solaris and Linux. Others will join the club one day, I hope. -- .-. .-.Yes, I am an agent of Satan, but my duties are largely (_ \ / _) ceremonial. | |[EMAIL PROTECTED]
Re: Bug in wget: cannot request urls with double-slash in the query string
On Mon, Mar 01, 2004 at 07:25:52PM +0100, Hrvoje Niksic wrote: Removing the offending code fixes the problem, but I'm not sure if this is the correct solution. I expect it would be more correct to remove multiple slashes only before the first occurrence of ?, but not afterwards. That's exactly what should happen. Please give us more details, if possible accompanied by `-d' output. If you'd still like details now that you know the version I was using, let me know and I'll be happy to do some tests. Yes please. For example, this is how it works for me:

$ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
DEBUG output created by Wget 1.8.2 on linux-gnu.
--19:23:02--  http://www.xemacs.org/something?redirect=http://www.cnn.com
           => `something?redirect=http:%2F%2Fwww.cnn.com'
Resolving www.xemacs.org... done.
Caching www.xemacs.org => 199.184.165.136
Connecting to www.xemacs.org[199.184.165.136]:80... connected.
Created socket 3.
Releasing 0x8080b40 (new refcount 1).
---request begin---
GET /something?redirect=http://www.cnn.com HTTP/1.0
User-Agent: Wget/1.8.2
Host: www.xemacs.org
Accept: */*
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response... ...

The request log shows that the slashes are apparently respected. I retried a test case and found the same thing -- the slashes were respected. Then I remembered that I was using -i. Wget seems to work fine with the url on the command line; the bug only happens when the url is passed in with:

cat <<EOF | wget -i -
http://...
EOF

Using this method is necessary since it is the ONLY secure way I know of to do a password-protected http request from a shell script. Otherwise the password appears on the command line... Rich
Re: Bug in wget: cannot request urls with double-slash in the query string
D Richard Felker III [EMAIL PROTECTED] writes: The following code in url.c makes it impossible to request urls that contain multiple slashes in a row in their query string: [...] That code is removed in CVS, so multiple slashes now work correctly. Think of something like http://foo/bar/redirect.cgi?http://... wget translates this into: [...] Which version of Wget are you using? I think even Wget 1.8.2 didn't collapse multiple slashes in query strings, only in paths. Removing the offending code fixes the problem, but I'm not sure if this is the correct solution. I expect it would be more correct to remove multiple slashes only before the first occurrence of ?, but not afterwards. That's exactly what should happen. Please give us more details, if possible accompanied by `-d' output.
Re: Bug in wget: cannot request urls with double-slash in the query string
On Mon, Mar 01, 2004 at 03:36:55PM +0100, Hrvoje Niksic wrote: D Richard Felker III [EMAIL PROTECTED] writes: The following code in url.c makes it impossible to request urls that contain multiple slashes in a row in their query string: [...] That code is removed in CVS, so multiple slashes now work correctly. Think of something like http://foo/bar/redirect.cgi?http://... wget translates this into: [...] Which version of Wget are you using? I think even Wget 1.8.2 didn't collapse multiple slashes in query strings, only in paths. I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1 and it persisted. Removing the offending code fixes the problem, but I'm not sure if this is the correct solution. I expect it would be more correct to remove multiple slashes only before the first occurrence of ?, but not afterwards. That's exactly what should happen. Please give us more details, if possible accompanied by `-d' output. If you'd still like details now that you know the version I was using, let me know and I'll be happy to do some tests. Rich
Re: Bug in wget: cannot request urls with double-slash in the query string
D Richard Felker III [EMAIL PROTECTED] writes: Think of something like http://foo/bar/redirect.cgi?http://... wget translates this into: [...] Which version of Wget are you using? I think even Wget 1.8.2 didn't collapse multiple slashes in query strings, only in paths. I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1 and it persisted. OK. Removing the offending code fixes the problem, but I'm not sure if this is the correct solution. I expect it would be more correct to remove multiple slashes only before the first occurrence of ?, but not afterwards. That's exactly what should happen. Please give us more details, if possible accompanied by `-d' output. If you'd still like details now that you know the version I was using, let me know and I'll be happy to do some tests. Yes please. For example, this is how it works for me:

$ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
DEBUG output created by Wget 1.8.2 on linux-gnu.
--19:23:02--  http://www.xemacs.org/something?redirect=http://www.cnn.com
           => `something?redirect=http:%2F%2Fwww.cnn.com'
Resolving www.xemacs.org... done.
Caching www.xemacs.org => 199.184.165.136
Connecting to www.xemacs.org[199.184.165.136]:80... connected.
Created socket 3.
Releasing 0x8080b40 (new refcount 1).
---request begin---
GET /something?redirect=http://www.cnn.com HTTP/1.0
User-Agent: Wget/1.8.2
Host: www.xemacs.org
Accept: */*
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response... ...

The request log shows that the slashes are apparently respected.
Re: bug in connect.c
Interesting. Is it really necessary to zero out sockaddr/sockaddr_in before using it? I see that some sources do it, and some don't. I was always under the impression that, as long as you fill the relevant members (sin_family, sin_addr, sin_port), other initialization is not necessary. Was I mistaken, or is this something specific to FreeBSD? Do others have experience with this? e.g. look at http://cvs.tartarus.org/putty/unix/uxnet.c; putty encountered the very same problem ... regards manfred
Re: bug in connect.c
Manfred Schwarb [EMAIL PROTECTED] writes: Interesting. Is it really necessary to zero out sockaddr/sockaddr_in before using it? I see that some sources do it, and some don't. I was always under the impression that, as long as you fill the relevant members (sin_family, sin_addr, sin_port), other initialization is not necessary. Was I mistaken, or is this something specific to FreeBSD? Do others have experience with this? e.g. look at http://cvs.tartarus.org/putty/unix/uxnet.c; putty encountered the very same problem ... Amazing. This obviously doesn't show up when binding to remote addresses, or it would have been noticed ages ago. Thanks for the pointer. This patch should fix the problem in the CVS version:

2004-02-06  Hrvoje Niksic  [EMAIL PROTECTED]

	* connect.c (sockaddr_set_data): Zero out
	sockaddr_in/sockaddr_in6.  Apparently BSD-derived stacks need
	this when binding a socket to a local address.

Index: src/connect.c
===================================================================
RCS file: /pack/anoncvs/wget/src/connect.c,v
retrieving revision 1.62
diff -u -r1.62 connect.c
--- src/connect.c	2003/12/12 14:14:53	1.62
+++ src/connect.c	2004/02/06 16:59:01
@@ -87,6 +87,7 @@
     case IPV4_ADDRESS:
       {
 	struct sockaddr_in *sin = (struct sockaddr_in *)sa;
+	xzero (*sin);
 	sin->sin_family = AF_INET;
 	sin->sin_port = htons (port);
 	sin->sin_addr = ADDRESS_IPV4_IN_ADDR (ip);
@@ -96,6 +97,7 @@
     case IPV6_ADDRESS:
       {
 	struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sa;
+	xzero (*sin6);
 	sin6->sin6_family = AF_INET6;
 	sin6->sin6_port = htons (port);
 	sin6->sin6_addr = ADDRESS_IPV6_IN6_ADDR (ip);
Re: bug in connect.c
francois eric [EMAIL PROTECTED] writes: after some tests: the bug shows up with ftp, with username and password, with a bind address specified. the bug does not show up with http, or with ftp without username and password. looks like uninitialized memory. so i made some modification before bind: src/connect.c:

  ...
  /* Bind the client side to the requested address. */
  wget_sockaddr bsa;
  memset (&bsa, 0, sizeof (bsa));   /* <-- added */
  wget_sockaddr_set_address (&bsa, ip_default_family, 0, bind_address);
  if (bind (sock, &bsa.sa, sockaddr_len ()))
  ...

after it all downloads become successful. i think it is better to do the memset in wget_sockaddr_set_address, but that is your choice. Interesting. Is it really necessary to zero out sockaddr/sockaddr_in before using it? I see that some sources do it, and some don't. I was always under the impression that, as long as you fill the relevant members (sin_family, sin_addr, sin_port), other initialization is not necessary. Was I mistaken, or is this something specific to FreeBSD? Do others have experience with this?
Re: Bug: Support of characters like '\', '?', '*', ':' in URLs
Frank Klemm [EMAIL PROTECTED] writes: Wget doesn't work properly when the URL contains characters which are not allowed in file names on the file system which is currently used. These are often '\', '?', '*' and ':'. Affected are at least: - Windows and related OS - Linux when using FAT or Samba as file system [...] Thanks for the report. This has been fixed in Wget 1.9-beta. It doesn't use characters that FAT can't handle by default, and if you use a mounted FAT filesystem, you can tell Wget to assume behavior as if it were under Windows.
Re: bug in 1.8.2 with
You're right -- that code was broken. Thanks for the patch; I've now applied it to CVS with the following ChangeLog entry: 2003-10-15 Philip Stadermann [EMAIL PROTECTED] * ftp.c (ftp_retrieve_glob): Correctly loop through the list whose elements might have been deleted.
RE: Bug in Windows binary?
From: Gisle Vanem [mailto:[EMAIL PROTECTED] Jens Rösner [EMAIL PROTECTED] said: ... I assume Heiko didn't notice it because he doesn't have that function in his kernel32.dll. Heiko and Hrvoje, will you correct this ASAP? --gv Probably. Currently I'm compiling and testing on NT 4.0 only. Besides that, I'm VERY tight on time at the moment, so testing usually means "does it run? Does it download one sample http and one https site? Yes? Put it up for testing!" Heiko -- -- PREVINET S.p.A. www.previnet.it -- Heiko Herold [EMAIL PROTECTED] -- +39-041-5907073 ph -- +39-041-5907472 fax
Re: Bug in Windows binary?
Jens Rösner [EMAIL PROTECTED] said: I downloaded wget 1.9 beta 2003/09/29 from Heiko http://xoomer.virgilio.it/hherold/ ... wget -d http://www.google.com DEBUG output created by Wget 1.9-beta on Windows. set_sleep_mode(): mode 0x8001, rc 0x8000 I disabled my wgetrc as well and the output was exactly the same. I then tested wget 1.9 beta 2003/09/18 (earlier build!) from the same place and it works smoothly. Can anyone reproduce this bug? Yes, but the MSVC version crashed on my machine. But I've found the cause, caused by my recent change :( A simple case of wrong calling convention:

--- mswindows.c.org	Mon Sep 29 11:46:06 2003
+++ mswindows.c	Sun Oct 05 17:34:48 2003
@@ -306,7 +306,7 @@
 DWORD set_sleep_mode (DWORD mode)
 {
   HMODULE mod = LoadLibrary ("kernel32.dll");
-  DWORD (*_SetThreadExecutionState) (DWORD) = NULL;
+  DWORD (WINAPI *_SetThreadExecutionState) (DWORD) = NULL;
   DWORD rc = (DWORD)-1;

I assume Heiko didn't notice it because he doesn't have that function in his kernel32.dll. Heiko and Hrvoje, will you correct this ASAP? --gv
Re: Bug in Windows binary?
Gisle Vanem [EMAIL PROTECTED] writes:

--- mswindows.c.org	Mon Sep 29 11:46:06 2003
+++ mswindows.c	Sun Oct 05 17:34:48 2003
@@ -306,7 +306,7 @@
 DWORD set_sleep_mode (DWORD mode)
 {
   HMODULE mod = LoadLibrary ("kernel32.dll");
-  DWORD (*_SetThreadExecutionState) (DWORD) = NULL;
+  DWORD (WINAPI *_SetThreadExecutionState) (DWORD) = NULL;
   DWORD rc = (DWORD)-1;

I assume Heiko didn't notice it because he doesn't have that function in his kernel32.dll. Heiko and Hrvoje, will you correct this ASAP? I've now applied the patch, thanks. I use the following ChangeLog entry:

2003-10-05  Gisle Vanem  [EMAIL PROTECTED]

	* mswindows.c (set_sleep_mode): Fix type of
	_SetThreadExecutionState.
Re: BUG in --timeout (exit status)
This problem is not specific to timeouts, but to recursive download (-r). When downloading recursively, Wget expects some of the specified downloads to fail and does not propagate that failure to the code that sets the exit status. This unfortunately includes the first download, which should probably be an exception.
Re: BUG in --timeout (exit status)
OK, I see. But I do not agree, and I don't think it is a good idea to treat the first download specially. In my opinion, exit status 0 means everything during the whole retrieval went OK. My preferred solution would be to set the final exit status to the highest exit status of all individual downloads. Of course, retries which are triggered by --tries should erase the exit status of the previous attempt. A non-zero exit status does not mean "nothing went OK" but "some individual downloads failed somehow". And setting a non-zero exit status does not mean wget has to stop retrieval immediately; it is OK to continue. Again, wget's behaviour is not what the user expects. And the user always has the option of combining --accept, --reject, --domains, etc. so that in normal cases all individual downloads succeed, if he needs an exit status of 0. If he does not care about the exit status, there is no problem at all, of course... regards Manfred Quoting Hrvoje Niksic [EMAIL PROTECTED]: This problem is not specific to timeouts, but to recursive download (-r). When downloading recursively, Wget expects some of the specified downloads to fail and does not propagate that failure to the code that sets the exit status. This unfortunately includes the first download, which should probably be an exception.
Re: bug maybe?
Randy Paries [EMAIL PROTECTED] writes: Not sure if this is a bug or not. I guess it could be called a bug, although it's no simple oversight. Wget currently doesn't support large files.
RE: bug maybe?
how do I get off this list? I tried a few times before but got no response from the server. thank you - Matt -Original Message- From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 23, 2003 8:53 PM To: Randy Paries Cc: [EMAIL PROTECTED] Subject: Re: bug maybe? Randy Paries [EMAIL PROTECTED] writes: Not sure if this is a bug or not. I guess it could be called a bug, although it's no simple oversight. Wget currently doesn't support large files.