Re: --disable-dns-cache
Mauro Tortonesi wrote: On Tue, 2 Sep 2003, Hrvoje Niksic wrote: Mauro Tortonesi [EMAIL PROTECTED] writes: On Tue, 2 Sep 2003, Jeremy Reeve wrote: "I've written a trivial patch to implement the --disable-dns-cache feature as described in the TODO contained in the CVS tree. I need to write the ChangeLog entry, which I'll do and post to the patches list ASAP." "You should probably not bother writing it." That's not quite true. From the PATCHES file included in the wget distribution: "** ChangeLog policy. Each patch should be accompanied by an update to the appropriate ChangeLog file. *** Please don't mail patches to ChangeLog because they have an extremely high rate of failure; just mail us the new part of the ChangeLog you added. ***" [I added the highlight] Perhaps the wording should have been clearer, but this means: please do write the ChangeLog entry and send it, but don't send an actual diff of the old and new ChangeLog, because such diffs fail to apply cleanly more often than not. That is reiterated here: "Patches without a ChangeLog entry will be accepted, but this creates additional work for the maintainers, so *** please do write the ChangeLog entries. ***" "You're right, Hrvoje. When I answered Jeremy's mail I was in a hurry. What I wanted to say is: you should probably not bother writing a correctly formatted ChangeLog; just send a simple report of the changes you've made." That contradicts Hrvoje, and really doesn't make sense. Jeremy *should* write a properly formatted ChangeLog. As the author of his changes, he is the best person to summarize exactly what they do. Why not do so in the standard form of a ChangeLog entry? To do otherwise is to push more work onto Hrvoje. Max.
Re: how to save the file in memory
niurui wrote: Hi, all. I want to embed wget in my program. First, my program will use wget to get a file from the Internet; then it will search this file for some words. So I want to save that file in memory rather than on the hard disk. Can wget save the file in memory? How? Wget is totally the wrong tool for this job. You would probably do better writing the program in Perl, and using its LWP web library. Max.
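A minimal sketch of that Perl/LWP approach; the URL and search word are placeholders:

    use strict;
    use warnings;
    use LWP::Simple qw(get);

    # Placeholder URL and search term.
    my $url  = 'http://www.example.com/somefile.txt';
    my $word = 'needle';

    # get() returns the document body as a string in memory
    # (undef on failure); nothing is written to disk.
    my $doc = get($url);
    defined $doc or die "failed to fetch $url\n";

    my @hits = grep { /\Q$word\E/ } split /\n/, $doc;
    print scalar(@hits), " matching line(s):\n", map { "$_\n" } @hits;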
Re: Feature request: file exclude.
Sherwood Botsford wrote: I wish I had a file-exclude option. I'm behind a firewall that doesn't allow FTP, so I have to find sites that use HTTP for file transfer. I'm currently trying to install cygwin on my local LAN. To do that, I'm using wget to mirror the remote site locally, as I have a very slow transfer rate. I don't really want the sources, just the binaries. But I've got no way to say --exclude-file *src.bz2. -R *src.bz2 ? Another option that might make sense would be to ask wget to fetch the HTML files in the tree and construct a list of the non-HTML files. That way, it would be easy to edit the list, then resubmit it as a list of things to get. Cool program. Saves much grief. Oh, yeah: mention the xy=on logging function in the man page. Very cool. Max
Re: Not 100% RFC 1738 compliance for FTP URLs = bug
David Balazic wrote: As I got no response on [EMAIL PROTECTED], I am resending my report here. One forwards to the other. The problem is that the wget maintainer is absent, and likely to continue to be so for several more months. As a result, wget development is effectively stalled. Max. -- Hi! I noticed that wget (1.8.2) does not conform 100% to RFC 1738 when handling FTP URLs. wget ftp://user1:[EMAIL PROTECTED]/x/y/foo does this:

    USER user1
    PASS secret1
    SYST
    PWD        (let's say this returns /home/user1)
    TYPE I
    CWD /home/user1/x/y
    PORT 11,22,33,44,3,239
    RETR foo

Why does it prepend the current working directory to the path? wget does CWD /home/user1/x/y, while RFC 1738 suggests:

    CWD x
    CWD y

This _usually_ produces the same results, except:

- ftp://user1:[EMAIL PROTECTED]//x/y/foo
  wget: CWD /x/y
  RFC:  CWD      (empty parameter! this usually puts one in the $HOME directory)
        CWD x
        CWD y
  So wget will try to fetch the file /x/y/foo, while an RFC-compliant program would fetch $HOME/x/y/foo.

- Non-Unix and other weird systems. Example:
  wget ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5D/FREEWARE_README.TXT
  does not work. The following variations don't work either:
  wget ftp://user1:[EMAIL PROTECTED]/DAD4:[perl5]FREEWARE_README.TXT
  wget ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5DFREEWARE_README.TXT
  wget ftp://user1:[EMAIL PROTECTED]/DAD4:/perl5/FREEWARE_README.TXT
  Using a regular ftp client, the following works: open the connection, log in, then either
    get DAD4:[perl5]FREEWARE_README.TXT
  or
    cd DAD4:[perl5]
    get FREEWARE_README.TXT
  Another example with more directory levels:
    get DAD4:[MTOOLS.AXP_EXE]MTOOLS.EXE
  or
    cd DAD4:[MTOOLS.AXP_EXE]
    get MTOOLS.EXE
  or
    cd DAD4:[MTOOLS]
    cd AXP_EXE
    get MTOOLS.EXE

I recommend removing the "cool smart" code and sticking to the RFCs :-)
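For what it's worth, the segment-by-segment CWD behavior RFC 1738 describes is easy to express in a script. A minimal sketch using Perl's Net::FTP, with placeholder host and credentials:

    use strict;
    use warnings;
    use Net::FTP;

    # Placeholder host and credentials, for illustration only.
    my ($host, $user, $pass) = ('ftp.example.com', 'user1', 'secret1');
    my @segments = ('x', 'y');   # path components taken from the URL
    my $file     = 'foo';

    my $ftp = Net::FTP->new($host) or die "connect: $@";
    $ftp->login($user, $pass)     or die "login: " . $ftp->message;
    $ftp->binary;

    # RFC 1738 style: one CWD per path segment, relative to the login
    # directory, rather than a single absolute CWD of the joined path.
    foreach my $dir (@segments) {
        $ftp->cwd($dir) or die "CWD $dir: " . $ftp->message;
    }
    $ftp->get($file) or die "RETR $file: " . $ftp->message;
    $ftp->quit;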
Re: Not 100% RFC 1738 compliance for FTP URLs = bug
David Balazic wrote: Max Bowsher wrote: David Balazic wrote: As I got no response on [EMAIL PROTECTED], I am resending my report here. One forwards to the other. The problem is that the wget maintainer is absent, and likely to continue to be so for several more months. As a result, wget development is effectively stalled. So it is do-it-yourself, huh? :-) More to the point, *no one* is available who has CVS write access. Max.
Re: wget with Router
Kalin KOZHUHAROV wrote: Dieter Kuntz wrote: i will test with --http-user.. OK, I think you will not make it this way. What we are talking about here is form submission, not just a password; your password happens to be part of a form. So first look at the HTML source of the page where you log in, and look for the <form method=WHAT?? action=WHERE> tag. If WHAT?? is GET, you are OK. You can write wget http://192.168.2.1:88/WHERE?password and your password will be easily seen by everyone :-) If WHAT?? is POST, you are out of luck. You need wpost, not wget :-) (no such program for now). Try cURL. That does POSTs. Max.
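A POST can also be scripted with Perl's LWP. A minimal sketch, assuming a hypothetical login URL and field name (read the real action URL and input names out of the router's login page first):

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new;

    # Hypothetical action URL and field name: check the actual
    # <form> in the router's login page before using this.
    my $res = $ua->post(
        'http://192.168.2.1:88/login',
        { password => 'secret' },
    );
    die $res->status_line, "\n" unless $res->is_success;
    print $res->content;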
Re: Removing files and directories not present on remote FTP server
Nick Earle wrote: Can someone let me know if and when this feature is going to be implemented within WGet. I am currently using WGet on a Windows platform, and this is the only feature that appears to be missing from this utility. wget development seems to be at a standstill. AFAIK, there is no one on this list authorized to commit changes to wget. A Perl script working on the .listing files could do what you want. Something like the attached. Max. [Attachment: listings2Attic.pl]
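The attached script itself isn't preserved in the archive. As a rough sketch of the idea its name suggests: keep the raw directory listings (wget's --no-remove-listing), then move any local file that no longer appears in its directory's .listing into an "Attic". Everything below is illustrative, not the actual attachment:

    use strict;
    use warnings;
    use Cwd qw(abs_path);
    use File::Find;
    use File::Path qw(mkpath);
    use File::Spec;

    my $arg    = shift or die "usage: $0 mirror-dir\n";
    my $mirror = abs_path($arg);
    my $attic  = "$mirror/../Attic";   # illustrative destination

    find(sub {
        return unless $_ eq '.listing';

        # Crude parse of ls-style LIST output: the last whitespace-
        # separated field is the name. (Breaks on names with spaces.)
        open my $fh, '<', $_ or die "$File::Find::name: $!";
        my %remote;
        while (my $line = <$fh>) {
            my @f = split ' ', $line;
            $remote{$f[-1]} = 1 if @f >= 9;
        }
        close $fh;

        # Move local files the server no longer has.
        my $rel  = File::Spec->abs2rel($File::Find::dir, $mirror);
        my $dest = "$attic/$rel";
        opendir my $dh, '.' or die "$File::Find::dir: $!";
        for my $name (readdir $dh) {
            next if $name =~ /^\.\.?$/ or $name eq '.listing';
            next if $remote{$name} or -d $name;
            mkpath($dest);
            rename($name, "$dest/$name") or warn "move $name: $!";
        }
        closedir $dh;
    }, $mirror);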
Re: Removing files and directories not present on remote FTP server
Nick Earle wrote: Can someone let me know if and when this feature is going to be implemented within WGet. If and when someone decides to implement it. But there is almost certainly not going to be another release until after Hrvoje Niksic has returned. Max.
Re: dev. of wget (was Re: Removing files and directories not present on remote FTP server
Aaron S. Hawley wrote: On Fri, 14 Feb 2003, Max Bowsher wrote: If and when someone decides to implement it. But there is almost certainly not going to be another release until after Hrvoje Niksic has returned. Can someone at FSF do something? [EMAIL PROTECTED], [EMAIL PROTECTED] This seems like the silliest reason to temporarily halt development. See: http://www.mail-archive.com/wget@sunsite.dk/msg04555.html Max.
Re: Source code download
Praveen wrote: Hi there, I am trying to download some asp file. It gives me the html output of the file. Is it possible to download the source code through wget. No. The server executes the ASP and sends only the resulting HTML; wget, like any other client, can only receive what the server chooses to send. Max.
Re: bug or limitation of wget used to access VMS servers
- Original Message - From: Ken Senior [EMAIL PROTECTED] There does not seem to be support to change disks when accessing a VMS server via wget. Is this a bug or just a limitation? Wget does plain old HTTP and FTP. I know nothing about VMS. Does it have some strange syntax for discs? Max.
Re: fetching asp file
deeps [EMAIL PROTECTED] wrote: I just want to see the source files for this java chat application, !?!?!?! Java is a compiled language! The source files probably aren't even on the webserver. Max.
Re: mirroring question
DennisBagley [EMAIL PROTECTED] wrote: ok - am using wget to mirror an ftp site [duh] and would like it not only to keep an up-to-date copy of the files [which it does beautifully] but also to remove files that are no longer on the ftp server. Is this possible? Use a Perl script. Max.
Re: meta crash bug
It's already fixed in CVS for 1.9. Max. Ivan A. Bolsunov [EMAIL PROTECTED] wrote: version: 1.8.1, in file: html-url.c, in function: tag_handle_meta()

    {
      ... skipped ...
      char *p, *refresh = find_attr (tag, "content", attrind);
      int timeout = 0;
      for (p = refresh; ISDIGIT (*p); p++)
      ... skipped ...
    }

BUG description: find_attr() MAY return NULL, but this is NOT checked in the code listed above; the pointer is used without any NULL check, do you understand me??? :) For example, wget CRASHES when trying to grab a URL from this MALFORMED BUT POSSIBLE tag: <meta http-equiv="Refresh">
Re: Can't use wildcard
ROMNEY Jean-Francois [EMAIL PROTECTED] wrote: I can't download files with wget 1.8.1 by using wildcards. Without wildcards, it works. The option --glob=on seems to have no effect. The command is : wget -d --glob==on -nc ftp://--:---;ftpcontent.mediapps.com/francais/journal/eco/*.jpg You've got 2 '='s after glob, and your shell is probably eating the special characters. Escape/quote them. Max.
Re: Can't use wildcard
ROMNEY Jean-Francois [EMAIL PROTECTED] wrote: There is the same error with only one =. What do you mean by Escape/quote? Read the documentation for your shell. wget never sees the * because the shell has already expanded it. Also, please keep discussions on-list. Max.
Re: Recursive download Problem.
Yun MO wrote: Dear Ma'am/Sir, I could not get all the files with "wget -r" for the following address. Would you help me? Thank you in advance. M.Y. --- <meta NAME="robots" CONTENT="noindex,nofollow"> Wget is obeying the robots instruction. wget -e robots=off ... will override it. Max.
Re: wget tries to print the file prn.html
Thomas Lussnig wrote: This is a Windows-specific problem. Normally prn.html should be a valid filename, and as you can check, long filenames can contain ':'. No they can't. And on NTFS, including a ':' in a filename causes the data to be written into an invisible named stream. But there is a list of reserved words that are interpreted as special devices. Since wget works on a ton of systems, it could be very troublesome to track down every reserved filename; since you can exclude filenames specifically, I think it is not that much of a problem. Examples: Windows: NULL:, PRN:; Linux (normally, many): /dev/*, /proc/*. Cu Thomas Lußnig P.S. Sure it's prn.html and not prn:html? prn.html will cause the problem. Max.
Re: wget
[EMAIL PROTECTED] wrote: We're using the wget app to run our scheduled tasks. Each time it runs, a copy of the file is created with a number added to the end of it. Is there a way to turn this off? We tried adding --quiet to the bat file but it still wrote the file. -nc (never re-download an existing file) or -N (re-download only if the remote file is newer), depending on what you are trying to do. Probably -N. Max.
Re: wget and asp
Try the source I sent you. Dominique wrote: thank you Max, np Is it different than the one I CVS-ed yesterday? I mean, does it have changes in creating filenames? Please note that I finally compiled it and could run it. No changes... I did run autoconf, so you could go straight to configure (as you have too new an autoconf version). CVS = a lot of breaks during checking out What breaks? It was freezing during checkout, something I have never had happen using CVS. I had to ctrl-c and restart a few times. Weird... I had no such problems, but I was only running a 'cvs upd', having checked it out before. Something has just occurred to me: wget restricts recursion to 5 levels by default. Perhaps that is the problem? If so, -l0 will fix it. If not, could you have another go at describing it? Max.
Re: wget and asp
Max Bowsher wrote: The problem is that with a ?x=y, where y contains slashes, wget passes them unchanged to the OS, causing directories to be created, but fails to adjust relative links to account for the fact that the page is in a deeper directory than it should be. The solution is to map / to _ or something. Thomas Lussnig wrote: a) The OS does not automatically create directories; wget has to do this. Looking back at my last email, I think "Did I really say that?!" That part is, of course, nonsense. However, the directories are created regardless. b) The idea of creating directories even for parameters is not wrong!!! This is only an example (not working), but you can see why directories were chosen instead of _: http://download.com/get.php?filename=/windows/putty.exe http://download.com/get.php?filename=/linux/putty.tgz I don't understand your example, but regardless, wget 1.9-beta from CVS URL-encodes slashes in a query string. Dominique: If I understand your problem correctly, then wget 1.9 has solved it. Max.
Suggestion: Anonymous rsync access to the wget CVS tree.
As a dial-up user, I find it extremely useful to have access to the full range of cvs functionality whilst offline. Some other projects provide read-only rsync access to the CVS repository, which allows a local copy of the repository to be made, not just a checkout of a particular version. Since access to xemacs cvs on sunsite.dk is already provided in this manner, perhaps it would be possible for wget, as well? Thanks, Max.
Re: wget and asp
Dominique wrote: tryit_edit.asp?filename=tryhtml_basic&referer=http://www.w3schools.com/html/html_examples.asp and just this one is truncated. I think some regexp or pattern, or an explicit list of where-not-to-break-a-string characters, would solve the problem. Or maybe it is already possible, but I don't know how? I think that some URL encoding has not happened somewhere. Whether wget or the web server is at fault, I don't know, but the solution would be to URL-encode the slashes. Max.
Re: WGET and the robots.txt file...
-e robots=off Jon W. Backstrom wrote: Dear Gnu Developers, We just ran into a situation where we had to spider a site of our own on an outsourced service because the company was going out of business. Because wget respects the robots.txt file, however, we could not get an archive made until we had the outsourced company delete their robots.txt file on the server in the last 2 days of service. This might not have had such a happy ending, however. Would you consider an --ignore-robotfile option perhaps, or would that be too abusive? I know I can always edit the source and make my own, but I wondered if this was something that WGET might want to do in a release version, or is the potential for abuse too great? Thank you! Jon Backstrom [EMAIL PROTECTED]
Re: wget and asp
Thomas Lussnig wrote: Why should there be URL encoding? '/' is a legal character in a URL and in the GET string; it's used, for example, for path-to-query translation. The main problem is that wget needs to translate a URL to a filesystem name. Yes, you are right, I wasn't thinking clearly. Filesystem names are PATH and FILE names, and wget does that right, I think. Example: http://my.domain/dyn_page.sql/content_id/1891/session/0815 Server side: file /dyn_page.sql, query string /content_id/1891/session/0815. Client-side filename choices: 0. dyn_page.sql/content_id/1891/session/0815 (current, I think) 1. dyn_page.sql_content_id_1891_session_0815 2. 0815 Only the author of the web page could tell you what a good translation from URL to filesystem name is when there is a query string on the page; all solutions have their bad sides!!! ?? The problem is that with a ?x=y, where y contains slashes, wget passes them unchanged to the OS, causing directories to be created, but fails to adjust relative links to account for the fact that the page is in a deeper directory than it should be. The solution is to map / to _ or something, as sketched below. Max.
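To make that concrete, a sketch of choice 1 above (flattening everything after the script name into a single filename). This is an illustration of the mapping idea only, not wget's actual algorithm:

    use strict;
    use warnings;

    # Flatten a URL's path-plus-query into one filename component,
    # mapping '/' and other directory-creating or unsafe characters
    # to '_'.  Illustrative only; not what wget actually does.
    sub url_to_filename {
        my ($url) = @_;
        my ($path) = $url =~ m{^[a-z]+://[^/]+/(.*)}i
            or return 'index.html';
        $path =~ s{[/?&=:\\*"<>|]}{_}g;
        return $path;
    }

    print url_to_filename(
        'http://my.domain/dyn_page.sql/content_id/1891/session/0815'
    ), "\n";
    # prints: dyn_page.sql_content_id_1891_session_0815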
Re: wget and asp
You don't give a whole lot of information. It's kind of impossible to help when we don't know what the problem is. Posting the URL of the problem site would be a good idea. Max. Dominique wrote: Is it possible at all? dominique Dominique wrote: Hi, I have a problem trying to wget a site for off-line usage. The site is made in ASP and uses lots of stuff like: xxx.asp?filename=yyy When I download a sample ASP page, it seems to be almost empty and have no links!!! I don't understand... I tried mirroring, html extensions and all I could find relevant in the man page. Would someone please help me solve this? thank you dominique
Re: getting the correct links
Christopher Stone wrote: Thank you all. Now the issue seems to be that it only gets the root directory. I ran 'wget -km -nd http://www.mywebsite.com'. Try -r. Max.
Re: getting the correct links
Jens Rösner wrote: Hi! Max' hint is incorrect I think, as -m includes -N (timestamps) and -r (recursive) Ooops, you're right. I tend not to use -m much myself. I should pay more attention! Max.
Re: getting the correct links
Christopher Stone wrote: When I ran wget and sucked the site to my local box, it pulled all the pages down and the index page comes up fine, but when I click on a link, it goes back to the remote server. What switch(es) do I use, so that when I pull the pages to my box, all of the links are changed also? -k (convert links for local viewing). Also look at -K (keep a .orig backup of converted files) and -E (add .html extensions). Max.
Re: ? about Servlets
Vernon, Clayton wrote: I'm having all kinds of difficulties downloading from servlets. Here is a URL that works in a browser to download a ZIP file but doesn't work via wget: http://oasis.caiso.com/servlet/SingleZip?nresultformat=5&squeryname=SLD_LOAD_MW&sxslname=SLD_LOAD_MW&dstartdate=20010810&denddate=20010810&sload_type='ASL','SSL'&ssched_class=N but wget http://oasis.caiso.com/servlet/SingleZip?nresultformat=5&squeryname=SLD_LOAD_MW&sxslname=SLD_LOAD_MW&dstartdate=20010810&denddate=20010810&sload_type='ASL','SSL'&ssched_class=N doesn't work; it gives me an error back from the destination server saying I didn't provide parameters, which I did, in fact, provide. I am behind a proxy server, by the way. My more straightforward downloads are working fine. Thanks for any help you might provide. It is possible that your shell is interpreting certain characters, the '&'s in particular (an unquoted '&' ends the command and runs it in the background, so wget never sees the rest of the URL), so that what you type is not what wget sees. You might try some form of quoting (difficult, since your URL contains single quotes), or placing the URL in a file and using wget -i file. Max.
Re: not downloading everything with --mirror
HTTP does not provide a dirlist command, so wget parses HTML to find other files it should download. Note: HTML, not XML. I suspect that is the problem. Max. Funk Gabor wrote: I recently found that during a (wget) mirror, not all the files are downloaded. (wget v1.8.2 / debian) For example: wget --mirror http://www.jeannette.hu downloads some files, but for example ./saj_elemei will only contain filelist.xml (with the following content):

    <xml xmlns:o="urn:schemas-microsoft-com:office:office">
     <o:MainFile HRef="../saj.htm"/>
     <o:File HRef="image001.jpg"/>
     <o:File HRef="image002.gif"/>
     <o:File HRef="image003.gif"/>
     <o:File HRef="image004.gif"/>
     <o:File HRef="filelist.xml"/>
    </xml>

However, if I issue wget [--no-parent] --mirror http://www.jeannette.hu/saj_elemei then the following also gets downloaded:

    -rw-r--r--  1 root root   257 Oct 29  2001 filelist.xml
    -rw-r--r--  1 root root  2506 Oct 29  2001 image001.jpg
    -rw-r--r--  1 root root 23343 Oct 29  2001 image001.png
    -rw-r--r--  1 root root  4959 Oct 29  2001 image002.gif
    -rw-r--r--  1 root root  1053 Oct 29  2001 image003.gif
    -rw-r--r--  1 root root  4246 Oct 29  2001 image004.gif
    -rw-r--r--  1 root root 27068 Oct 29  2001 image004.wmz
    -rw-r--r--  1 root root 17627 Oct 29  2001 image006.gif
    -rw-r--r--  1 root root  1447 Aug 15 16:33 index.html
    -rw-r--r--  1 root root  1447 Aug 15 16:33 index.html?D=A
    -rw-r--r--  1 root root  1447 Aug 15 16:33 index.html?D=D
    -rw-r--r--  1 root root  1447 Aug 15 16:33 index.html?M=A
    -rw-r--r--  1 root root  1447 Aug 15 16:33 index.html?M=D
    -rw-r--r--  1 root root  1447 Aug 15 16:33 index.html?N=A
    -rw-r--r--  1 root root  1447 Aug 15 16:33 index.html?N=D
    -rw-r--r--  1 root root  1447 Aug 15 16:33 index.html?S=A
    -rw-r--r--  1 root root  1447 Aug 15 16:33 index.html?S=D

My goal is to retrieve as much as possible of a site (ideally a full retrieve, with one command only...). I tried several other FTP-mirroring programs, but they're racing each other for the "crappiest program on earth" title. Is it wget's fault, or am I the dumb one who missed something somewhere? Thanks, Gabor
Wget Bug: Re: not downloading everything with --mirror
Funk Gabor wrote: HTTP does not provide a dirlist command, so wget parses HTML to find other files it should download. Note: HTML, not XML. I suspect that is the problem. If wget wouldn't download the rest, I'd say that too. But first the directory gets created and the XML is downloaded (and in some other directory some *.gif too), so wget senses the directory. If I issue wget -m site/dir, then all of the rest comes down (index.html?D=A and the others too), so wget is able to get everything, just not in one pass. So there would be no technical limitation preventing wget from doing the mirror in one step. So it is either a missing feature (shall I say a bug, as wget can't do the mirror which it could have) or I was unable to find the switch which makes it happen at once. Hmm, now I see. The vast majority of websites are configured to deny directory listing. That is probably why wget doesn't bother to try, except for the directory specified as the root of the download. I don't think there is any option to do this for all directories, because it's not really needed. The _real_ bug is that wget is failing to parse what look like valid <img ... src=...> tags. Perhaps someone more familiar with wget's HTML parsing code could investigate? The command is: wget -r -l0 www.jeannette.hu/saj.htm and the ignored files are a number of image files. Max.
Suggestion: Anonymous rsync access to the CVS tree.
As a dial-up user, I find it extremely useful to have access to the full range of cvs functionality whilst offline. Some other projects provide read-only rsync access to the CVS repository, which allows a local copy of the repository to be made, not just a checkout of a particular version. Since access to xemacs cvs on sunsite.dk is already provided in this manner, perhaps it would be possible for wget, as well? Thank you, Max.
Re: [BUG] assert test msecs
Hartwig, Thomas wrote: I got an assert exit from wget in retr.c, in the function calc_rate, because msecs is 0 or less than 0 (in rare cases). I don't know why; perhaps because I have a big line to the server, or the wrong OS. To get this working I patched retr.c to set msecs = 1 if it is equal to or below zero. Some information is added below; what else do you need?

    #: cat /proc/version
    Linux version 2.4.18 (root@netbrain) (gcc version 2.96 2731 (Red Hat Linux 7.3 2.96-110)) #4 Sun Jul 28 09:01:06 CEST 2002

I have run across this problem too. It is because with Linux 2.4.18 (and other versions?), in certain circumstances, gettimeofday() is broken and will jump backwards. See http://kt.zork.net/kernel-traffic/kt20020708_174.html#1. Is there any particular reason for this assert? If there is, maybe:

    if (msecs < 0) msecs = 0;

would be more suitable. Max.