Re: Another case that cause wget to crash
Xuehua Shen wrote:

> Hi there, another crash case of wget, when I use
> wget http://www.usmint.gov/what.cfm

Note that the latest wget version (1.8.2) doesn't segfault on this. In the future, when reporting a problem please include the --debug output of the wget command.

> Resolving www.usmint.gov... done.
> Connecting to www.usmint.gov[208.45.143.104]:80... connected.
> Location: http:\\catalog.usmint.gov [following]
> http:\\catalog.usmint.gov: Unsupported scheme.

Wget nicely informs you that it does not support the http:\\ scheme (or is it http:\?).

> Segmentation fault (core dumped).

Wget shouldn't segfault on this, and as said before this is fixed in wget 1.8.2.

> I think there are some problems when wget deals with the redirection.

More likely the problem is the redirection itself: if the webmaster intended to redirect to http://catalog.usmint.gov (using a scheme, http://, that wget supports) it should say so, instead of inventing a new scheme not supported by any web client. If you would like to see support for this new scheme, please provide links to RFCs and references to software already implementing it. Of course patches will be considered 8-)

> Regards, Xuehua

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
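For illustration only: a client that wanted to tolerate such a botched Location header could rewrite backslashes after the scheme's colon into forward slashes before parsing. This is a throwaway sketch (the function name is made up, and wget itself does not do this; it arguably shouldn't):

```c
#include <assert.h>
#include <string.h>

/* Rewrite "http:\\host\path" into "http://host/path" in place.
   Purely illustrative -- not wget code. */
static void normalize_backslash_scheme(char *url)
{
    char *p = strchr(url, ':');
    if (p == NULL)
        return;                 /* no scheme separator at all */
    for (p = p + 1; *p != '\0'; p++)
        if (*p == '\\')
            *p = '/';           /* backslash -> forward slash */
}
```

Applied to the redirect above, `http:\\catalog.usmint.gov` would become `http://catalog.usmint.gov`, which wget parses fine.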
Re: pftp mode for wget?
Joshua N Pritikin wrote:

> Does wget support passive ftp (pftp)? i have wget 1.8.1-4 (debian i386).

Then look at --passive-ftp:

$ wget --help
GNU Wget 1.8.1, a non-interactive network retriever.
[...]
FTP options:
  -nr, --dont-remove-listing   don't remove `.listing' files.
  -g,  --glob=on/off           turn file name globbing on or off.
       --passive-ftp           use the "passive" transfer mode.
       --retr-symlinks         when recursing, get linked-to files (not dirs).
[...]

> Victory to the Divine Mother!!
> after all, http://sahajayoga.org  http://why-compete.org

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
Re: dynamic calendar pages
Stan Reeves wrote:

> I'm having trouble when recursively downloading sites with dynamic calendar pages. It can take *forever* to get through several levels before hitting the recursion level limit. I can reject pages with a .pl or .asp extension, but they're apparently still downloaded and scanned for links before being removed. Is there a solution to this? I'm using v. 1.8.1.

This is by design. The reject/accept options are applied after scanning a text/html page for links, which seems not to be what most people expect, judging by the complaints about this on the mailing list. And I don't think there's any way to work around it.

> Stan Reeves
> Electrical and Computer Engineering Dept., Auburn University, Auburn, AL 36849
> [EMAIL PROTECTED]  http://www.eng.auburn.edu/~sjreeves

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
Re: wget and meta name=robots content=noindex,nofollow
Cédric Rosa wrote:

> Hello, is it normal that wget saves web pages which contain <meta name="robots" content="noindex">? Or does wget consider that it is not a search engine and respects only the follow/nofollow rules? Or is it a bug? :)

I don't think wget supports <meta name="robots"> tags. Robot support was added to wget long before these tags were proposed.

> Thanks. Cedric.

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
Re: hi
mp3TEAM wrote:

> My question is stupid, but i need HELP. When i connect with telnet to my server and i try to grab some link with WGET, and the link has ( or ' in it, I get: syntax error near unexpected token. How to find a SUBSTITUTE for these symbols?

These characters have a special meaning for your shell (it's a good idea to become familiar with them). You can usually protect them by either quoting the URL, like 'http://host/cgi?id=test', or using a backslash before each special character, like http://host/cgi\?id=\'test\'

> I do not speak ENGLISH very well PLS excuse me !!
> http://www.MP3-BG.com

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
Re: Honesly, wget as a webcrawler?
Jason Davis wrote:

> I'm trying to find the most efficient solution for mirroring, spidering and/or crawling (however I need to put this) hundreds of thousands of websites -- a solution that can handle literally millions of files. I've read that wget gets delayed on incremental mirroring of huge sites and I wonder if that's true. If so, could fwget (http://bay4.de/FWget/) be a solution? Or is there a totally different place I should look?

As the page says, FWget is "WGet with hashtables". Since version 1.7 wget has used hashtables. Kalium: do you have anything to add to this? If not, would you mind adding a note about wget now using hashtables internally?

> I appreciate your help and would love to hear any tip!

Some things to think about are:
- can you install software on the server, e.g. rsync
- does the server offer the same files via a service better suited for mirroring than HTTP
- do you access different webservers (wget only uses one connection)
- are the servers load balanced

> please keep me CC:d on the replies as I wasn't able to subscribe myself.. Thanks!

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
Re: Bug with wget ? I need help.
Cédric Rosa wrote:

> Hello, first, excuse my english but I'm french. When I try with wget (v 1.8.1) to download an url which is behind a router, the software waits forever even if I've specified a timeout. With ethereal, I've seen that there is no response from the server (the ACK never appears).

This is documented behavior: because of programming issues the timeout does not cover the connection, only responses after a connection has been established. For version 1.9 the timeout option will also cover the connection. See:
http://cvs.sunsite.dk/viewcvs.cgi/*checkout*/wget/NEWS?rev=HEAD&content-type=text/plain

> Here is the debug output:
> rosa@r1:~/htmlparser1.1/lib$ wget www.sosi.cnrs.fr
> --16:30:54--  http://www.sosi.cnrs.fr/
>            => `index.html'
> Resolving www.sosi.cnrs.fr... done.
> Connecting to www.sosi.cnrs.fr[193.55.87.37]:80...
>
> Thanks in advance for your help. Cedric Rosa.

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
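For the curious, the usual technique for bounding the connect phase is a non-blocking socket plus select(). This is a rough sketch of that technique under stated assumptions (a numeric IPv4 address; it is not the actual wget 1.9 code, and real code should also restore the blocking flag afterwards):

```c
#include <assert.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Try to connect to ip:port within `secs` seconds.
   Return 0 on success, -1 on timeout or error. */
static int connect_with_timeout(const char *ip, int port, int secs)
{
    struct sockaddr_in sa;
    fd_set wfds;
    struct timeval tv;
    int err = 0;
    socklen_t len = sizeof err;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons((unsigned short)port);
    sa.sin_addr.s_addr = inet_addr(ip);

    fcntl(fd, F_SETFL, O_NONBLOCK);        /* connect() now returns at once */
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == 0) {
        close(fd);
        return 0;                          /* connected immediately */
    }
    if (errno != EINPROGRESS) {
        close(fd);
        return -1;                         /* e.g. connection refused */
    }
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = secs;
    tv.tv_usec = 0;
    if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0) {
        close(fd);
        return -1;                         /* timed out waiting for SYN/ACK */
    }
    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);  /* did connect succeed? */
    close(fd);
    return err == 0 ? 0 : -1;
}
```

With a blocking connect(), the kernel's own TCP retry timer (minutes) decides when you give up; this is why a read timeout alone cannot help when the SYN is never answered.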
Re: interesting bug
[EMAIL PROTECTED] wrote:

> I was using wget to suck a website, and found an interesting problem: some of the URLs it found contained a question mark, after which it responded with "cannot write to '... insert file/URL here?more text ...' (invalid argument)". And it didn't save any of those URLs to files (on my NTFS/Windows XP machine)... It may also have said "Illegal filename".

Note that not all characters are allowed in Windows filenames, among them '?'. As '?' is quite common in data-driven web sites, most Windows binaries have included a patch to deal with it. The latest wget release, 1.8.2, now includes such a patch. But the rest of the illegal characters are not dealt with, nor are other special Windows features.

> What can I do in order to spider/crawl these pages and save them to my local disk?

Use wget version 1.8.2.

> Alex

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
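The idea behind such a patch is simply to map the characters Windows forbids in file names onto a harmless substitute before calling open(). A sketch of the idea (the function name is made up; I use '@' as the substitute because that is what the 1.8.2 transcript elsewhere in this archive shows for '?', but the real per-character substitutes are wget-version specific):

```c
#include <assert.h>
#include <string.h>

/* Characters not allowed in a Windows file name component. */
static const char banned[] = "\\/:*?\"<>|";

/* Replace banned characters with '@' in place.  Sketch only:
   applies to a single file name, not a full path. */
static void sanitize_windows_name(char *name)
{
    char *p;
    for (p = name; *p != '\0'; p++)
        if (strchr(banned, *p) != NULL)
            *p = '@';
}
```

So a URL-derived name like `page.cfm?id=7` would be stored as `page.cfm@id=7` instead of making the write fail with "invalid argument".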
Re: HTTP /1.1 500 Internal Server Error
Mark Bucciarelli wrote:

> I am having trouble wgetting a samsung printer driver from their site. Every time I try, I immediately get an "HTTP/1.1 500 Internal Server Error". The web browser initiates the download properly when I click on the link from the referer page. Here is the command I am running (I don't have a .wgetrc):
>
> wget --debug --referer=http://www.samsungelectronics.com/printer/support/downloads/400329_844_file4.html http://211.45.27.253/servlet/Downloader?path=%2Fprinter%2Fsupport%2Fdownloads%2Fattach_file%2F20020516175051spp-1.0.2.i386.tar.gz&amp;realname=spp-1.0.2.i386.tar.gz
>
> and here is the debug output: <debug output skipped/>

This seems to be yet another encoding problem. I have no problem if I change the '&amp;' to '&'. IIRC, URLs found in an HTML page should be HTML-decoded, and a simple test (wget -F -i URL.html) shows that wget does this. But I'm not sure wget should do it for URLs on the command line or in a non-HTML file. In the past we had a lot of problems with wget being overzealous in {en|de}coding URLs.

$ wget "http://211.45.27.253/servlet/Downloader?path=%2Fprinter%2Fsupport%2Fdownloads%2Fattach_file%2F20020516175051spp-1.0.2.i386.tar.gz&realname=spp-1.0.2.i386.tar.gz"
--15:20:35--  http://211.45.27.253/servlet/Downloader?path=%2Fprinter%2Fsupport%2Fdownloads%2Fattach_file%2F20020516175051spp-1.0.2.i386.tar.gz&realname=spp-1.0.2.i386.tar.gz
           => `Downloader@path=%2Fprinter%2Fsupport%2Fdownloads%2Fattach_file%2F20020516175051spp-1.0.2.i386.tar.gz&realname=spp-1.0.2.i386.tar.gz'
Connecting to 211.45.27.253:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28,864,218 [application/octet-stream]

Last-modified header missing -- time-stamps turned off.
--15:20:36--  http://211.45.27.253/servlet/Downloader?path=%2Fprinter%2Fsupport%2Fdownloads%2Fattach_file%2F20020516175051spp-1.0.2.i386.tar.gz&realname=spp-1.0.2.i386.tar.gz
           => `Downloader@path=%2Fprinter%2Fsupport%2Fdownloads%2Fattach_file%2F20020516175051spp-1.0.2.i386.tar.gz&realname=spp-1.0.2.i386.tar.gz'
Connecting to 211.45.27.253:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]

    [   <=>                               ] 1,257,472     25.53K/s

> Thanks for a great tool!

And thank you for reading the instructions and actually including debug output!

> Mark

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
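Undoing the copy-paste damage amounts to decoding the handful of HTML character entities that show up in URLs lifted out of HTML source. A minimal sketch (hypothetical helper, not wget code; a real decoder would also handle numeric entities like &#38;):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode &amp; &lt; &gt; &quot; in place.  The output is never longer
   than the input, so we can overwrite as we go. */
static void html_decode_url(char *url)
{
    static const struct { const char *ent; char ch; } tab[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' }, { "&quot;", '"' },
    };
    const size_t n = sizeof tab / sizeof tab[0];
    char *out = url;

    while (*url != '\0') {
        size_t i;
        for (i = 0; i < n; i++) {
            size_t len = strlen(tab[i].ent);
            if (strncmp(url, tab[i].ent, len) == 0) {
                *out++ = tab[i].ch;     /* emit the decoded character */
                url += len;             /* skip the whole entity */
                break;
            }
        }
        if (i == n)
            *out++ = *url++;            /* ordinary character: copy */
    }
    *out = '\0';
}
```

Run on the URL above, `...tar.gz&amp;realname=...` becomes `...tar.gz&realname=...`, which is what the server expects.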
Re: 1.8.2 branch opened
Hrvoje Niksic wrote:

> Since we need to have a release because of the OpenSSL legalese, we can as well fix the most important (crashing) bugs in 1.8.1. I have opened a branch named `branch-1_8_2' where the 1.8.2-specific changes will be applied. Note that only bug fixes will be accepted for 1.8.2. No new features. Here are the patches that I plan to apply initially. Please let me know if you have more.

It seems you missed one of the || SCHEME_HTTPS patches:

Date: Mon, 11 Feb 2002 21:24:44 +0100
From: Christian Lackas [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: patch: recursive downloading for https
Message-ID: [EMAIL PROTECTED]

2002-02-11  Christian Lackas  [EMAIL PROTECTED]

	* recursive downloading for https fixed

Index: src/recur.c
===================================================================
RCS file: /pack/anoncvs/wget/src/recur.c,v
retrieving revision 1.41
diff -u -r1.41 recur.c
--- src/recur.c	2001/12/19 14:27:29	1.41
+++ src/recur.c	2002/02/11 20:15:54
@@ -438,6 +438,9 @@
       /* 1. Schemes other than HTTP are normally not recursed into. */
       if (u->scheme != SCHEME_HTTP
+#ifdef HAVE_SSL
+	  && u->scheme != SCHEME_HTTPS
+#endif
	  && !(u->scheme == SCHEME_FTP && opt.follow_ftp))
	{
	  DEBUGP (("Not following non-HTTP schemes.\n"));
@@ -446,7 +449,11 @@
       /* 2. If it is an absolute link and they are not followed, throw
	 it out.  */
-      if (u->scheme == SCHEME_HTTP)
+      if (u->scheme == SCHEME_HTTP
+#ifdef HAVE_SSL
+	  || u->scheme == SCHEME_HTTPS
+#endif
+	  )
	if (opt.relative_only && !upos->link_relative_p)
	  {
	    DEBUGP (("It doesn't really look like a relative link.\n"));
@@ -534,7 +541,12 @@
	}

       /* 8. */
-      if (opt.use_robots && u->scheme == SCHEME_HTTP)
+      if (opt.use_robots && (u->scheme == SCHEME_HTTP
+#ifdef HAVE_SSL
+			     || u->scheme == SCHEME_HTTPS
+#endif
+			     )
+	  )
	{
	  struct robot_specs *specs = res_get_specs (u->host, u->port);
	  if (!specs)

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
Re: OK, time to moderate this list
Ian Abbott wrote:

> On 22 Mar 2002 at 4:08, Hrvoje Niksic wrote:
>> The suggestion of having more than one admin is good, as long as there are people who volunteer to do it besides me.
> I'd volunteer too, but don't want to be the only person moderating the lists, for the same reasons as yourself. (I'm also completely clueless about the process of moderating mailing lists at the moment!)

I'll volunteer too, so now we have 4 moderators, but all based in Europe (if I've counted right: de, se, uk and dk). A couple of moderators from other timezones (like America, Asia or Australia) would be nice.

> I also have to check with the sunsite.dk people whether the ML manager, ezmlm, can handle this. If it only handles a single moderator account, perhaps a secure web-based email account could be set up for moderation purposes, which the real moderators could log into on a regular basis.

Now that we are talking about changing the ml configuration, some other things I would like to have changed too:

- The current setup removes the original mail's headers. Change it so that the original headers are preserved.
- Add a header with the receiver's subscribed email so {s,}he can unsubscribe {her,him}self. The cygwin ml, also running ezmlm, adds this:
  List-Unsubscribe: mailto:[EMAIL PROTECTED]
  List-Subscribe: mailto:[EMAIL PROTECTED]
  List-Archive: http://sources.redhat.com/ml/cygwin/
  List-Post: mailto:[EMAIL PROTECTED]
  List-Help: mailto:[EMAIL PROTECTED], http://sources.redhat.com/ml/#faqs

If we switch to mailman, can it be configured to not send a password reminder every month? I unsubscribed from a really low-traffic list on sunsite.dk just because of this.

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
Re: wget core dump with recursive file transfer
Paul Eggert wrote:

> (I built wget on Solaris 8 with GCC 3.0.3.) Here are the symptoms of the problem.
>
> 184-shade $ wget --recursive file:///
> Segmentation Fault (core dumped)

Note that file:// is not supported:

$ wget -d file://
DEBUG output created by Wget 1.8.1 on cygwin.
file://: Unsupported scheme.

But the core dump is not limited to Solaris when combining --recursive with file://

$ wget -d --recursive file://
DEBUG output created by Wget 1.8.1 on cygwin.
Segmentation fault (core dumped)

And this is not fixed in the current CVS code:

$ wget-dev -d --recursive file://
DEBUG output created by Wget 1.8.1+cvs on cygwin.
Segmentation fault (core dumped)

A patch like this (would one of the C coders on the list check it?) seems to fix it. I suppose the "FINISHED" and "Downloaded" part should be removed too, to make it clearer that it ended with an error.

$ ./wget-dev.exe -d --recursive file://
DEBUG output created by Wget 1.8.1+cvs on cygwin.
file://: Unsupported scheme.

FINISHED --01:12:30--
Downloaded: 0 bytes in 0 files

Index: src/recur.c
===================================================================
RCS file: /pack/anoncvs/wget/src/recur.c,v
retrieving revision 1.41
diff -u -r1.41 recur.c
--- src/recur.c	2001/12/19 14:27:29	1.41
+++ src/recur.c	2002/02/17 00:15:25
@@ -184,6 +184,7 @@ retrieve_tree (const char *start_url)
 retrieve_tree (const char *start_url)
 {
   uerr_t status = RETROK;
+  int url_error_code;		/* url parse error code */

   /* The queue of URLs we need to load. */
   struct url_queue *queue = url_queue_new ();
@@ -194,7 +195,14 @@
   /* We'll need various components of this, so better get it over with
      now. */
-  struct url *start_url_parsed = url_parse (start_url, NULL);
+  struct url *start_url_parsed = url_parse (start_url, &url_error_code);
+  if (!start_url_parsed)
+    {
+      logprintf (LOG_NOTQUIET, "%s: %s.\n", start_url,
+		 url_error (url_error_code));
+      xfree (start_url);
+      return URLERROR;
+    }

   /* Enqueue the starting URL.  Use start_url_parsed->url rather than
      just URL so we enqueue the canonical form of the URL.
*/ 185-shade $ wget --version GNU Wget 1.8.1 Copyright (C) 1995, 1996, 1997, 1998, 2000, 2001 Free Software Foundation, Inc. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. Originally written by Hrvoje Niksic [EMAIL PROTECTED]. 186-shade $ uname -a SunOS shade.twinsun.com 5.8 Generic_108528-13 sun4u sparc SUNW,Ultra-1 187-shade $ gdb /opt/reb/bin/wget core GNU gdb 5.1.1 Copyright 2002 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as sparc-sun-solaris2.8... Core was generated by `wget --recursive file:///'. Program terminated with signal 11, Segmentation fault. Reading symbols from /usr/lib/libmd5.so.1...done. Loaded symbols for /usr/lib/libmd5.so.1 Reading symbols from /opt/reb/lib/libssl.so.0.9.6...done. Loaded symbols for /opt/reb/lib/libssl.so.0.9.6 Reading symbols from /opt/reb/lib/libcrypto.so.0.9.6...done. Loaded symbols for /opt/reb/lib/libcrypto.so.0.9.6 Reading symbols from /usr/lib/libdl.so.1...done. Loaded symbols for /usr/lib/libdl.so.1 Reading symbols from /usr/lib/libsocket.so.1...done. Loaded symbols for /usr/lib/libsocket.so.1 Reading symbols from /usr/lib/libnsl.so.1...done. Loaded symbols for /usr/lib/libnsl.so.1 Reading symbols from /usr/lib/libc.so.1...done. Loaded symbols for /usr/lib/libc.so.1 Reading symbols from /usr/lib/libmp.so.2...done. Loaded symbols for /usr/lib/libmp.so.2 Reading symbols from /usr/platform/SUNW,Ultra-1/lib/libc_psr.so.1...done. 
Loaded symbols for /usr/platform/SUNW,Ultra-1/lib/libc_psr.so.1 #0 0x0002a698 in retrieve_tree (start_url=0x4fb98 file:///) at recur.c:201 201 recur.c: No such file or directory. in recur.c (gdb) where #0 0x0002a698 in retrieve_tree (start_url=0x4fb98 file:///) at recur.c:201 #1 0x0002832c in main (argc=-4264136, argv=0xffbef054) at main.c:812 -- Med venlig hilsen / Kind regards Hack Kampbjørn
Re: wget crash
Steven Enderle wrote:

> short and dirty (in german):
>
> Größen stimmen nicht überein (lokal 6968552) -- erneuter Download. (sizes do not match ... retrieving)
> --00:08:03--  ftp://ftp.scene.org/pub/music/artists/nutcase/mp3/timeofourlives.mp3
>            => `ftp.scene.org/pub/music/artists/nutcase/mp3/timeofourlives.mp3'
> ==> CWD nicht erforderlich. ==> PORT ... fertig. ==> REST 6968552 ... fertig.
> ==> RETR timeofourlives.mp3 ... fertig.
> Länge: 5,574,867 [noch -1,393,685]
>
> The already downloaded file (6,968,552 bytes) is bigger than the file to download (5,574,867).
>
> assertion "percentage <= 100" failed: file progress.c, line 552
> zsh: abort (core dumped)  wget -m -c --tries=0 ftp://ftp.scene.org/pub/music/artists/nutcase/mp3/timeofourlives.mp3
>
> progress.c:
>   int percentage = (int)(100.0 * size / bp->total_length);
>   assert (percentage <= 100);
>
> Of course the assert will fail -- size is bigger than total_length!
> Hope this helps in any way.

Yes, it did -- after I actually read it 8-)

To reproduce with wget 1.8.1:

$ wget ftp://sunsite.dk/disk1/gnu/wget/wget-1.8{,.1}.tar.gz
$ cat wget-1.8.tar.gz >> wget-1.8.1.tar.gz
$ wget -d -c ftp://sunsite.dk/disk1/gnu/wget/wget-1.8.1.tar.gz
DEBUG output created by Wget 1.8.1 on cygwin.
Using `.listing' as listing tmp file.
--13:48:44--  ftp://sunsite.dk/disk1/gnu/wget/wget-1.8.1.tar.gz
           => `.listing'
Resolving sunsite.dk... done.
Caching sunsite.dk => 130.225.247.90
Connecting to sunsite.dk[130.225.247.90]:21... connected.
Created socket 3.
Releasing 0x100b07b8 (new refcount 1).
Logging in as anonymous ...
220 ProFTPD 1.2.4 Server (SunSITE Denmark FTP-Server) [sunsite-int.sunsite.dk]
--> USER anonymous
331 Anonymous login ok, send your complete email address as your password.
--> PASS -wget@
230- Welcome to SunSITE.dk

SunSITE.dk is located at Aalborg University, Denmark. It is a Sun Enterprise E3500 Server with 2 400MHz UltraSPARC-II CPUs, 2 GB Memory and 563 GB raw storage capacity. The server was kindly donated by Sun Microsystems.
Aalborg University, SuSE GmbH, 3Com Nordic, Silcon Group, CLARiiON, FourLeaf Technologies and Infoseek are sponsoring the project. More information on SunSITE.dk can be found at http://sunsite.dk/SunSITE/

Note that if ftp hangs or dies, try putting a hyphen at the start of your password. All transfers are logged and any misuse will be acted upon. Please email suggestions and questions to [EMAIL PROTECTED]

230 Anonymous access granted, restrictions apply.
Logged in!
==> SYST ... --> SYST
215 UNIX Type: L8
done.    ==> PWD ... --> PWD
257 "/" is current directory.
done.
==> TYPE I ... --> TYPE I
200 Type set to I.
done.  changing working directory
Prepended initial PWD to relative path:
  old: 'disk1/gnu/wget'
  new: '/disk1/gnu/wget'
==> CWD /disk1/gnu/wget ... --> CWD /disk1/gnu/wget
250 CWD command successful.
done.
==> PORT ... Master socket fd 4 bound.
--> PORT 192,168,1,131,13,205
200 PORT command successful.
done.    ==> LIST ... --> LIST
150 Opening ASCII mode data connection for file list.
done.
Created socket fd 5.

    [ <=>                                 ] 514            3.35K/s

Closing fd 5
Closing fd 4
226 Transfer complete.

13:48:45 (3.35 KB/s) - `.listing' saved [514]

PLAINFILE; perms 644; month: Sep; day: 23; year: 1998 (no tm);
PLAINFILE; perms 644; month: Dec; day: 31; year: 2000 (no tm);
PLAINFILE; perms 644; month: Dec; day: 31; year: 2000 (no tm);
PLAINFILE; perms 644; month: Nov; day: 18; time: 15:43:00 (no yr);
PLAINFILE; perms 644; month: Jun; day: 4; year: 2001 (no tm);
PLAINFILE; perms 644; month: Dec; day: 25; time: 21:04:00 (no yr);
PLAINFILE; perms 644; month: Dec; day: 10; time: 08:00:00 (no yr);
Removed `.listing'.
The sizes do not match (local 2185627) -- retrieving.

--13:48:45--  ftp://sunsite.dk/disk1/gnu/wget/wget-1.8.1.tar.gz
           => `wget-1.8.1.tar.gz'
==> CWD not required.
==> PORT ... Master socket fd 4 bound.
--> PORT 192,168,1,131,13,224
200 PORT command successful.
done.    ==> REST 2185627 ... --> REST 2185627
350 Restarting at 2185627. Send STORE or RETRIEVE to initiate transfer.
done.
==> RETR wget-1.8.1.tar.gz ... --> RETR wget-1.8.1.tar.gz
150 Opening BINARY mode data connection for wget-1.8.1.tar.gz (4293879449 bytes).
done.
Lying FTP server found, adjusting.
Created socket fd 5.
Length: 1,097,780 [-1,087,847 to go]

assertion "percentage <= 100" failed: file /home/hack/projects/cygwin-wget/wget-1.8.1/src/progress.c, line 552
Aborted (core dumped)

> Thanks
> Steven
>
> Steven Enderle - mdn Huebner GmbH - [EMAIL PROTECTED] - +49 911 93 90 90
> Digital Imaging / Document Management

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
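The failing assertion is pure arithmetic: after REST past the end of the remote file, size can exceed total_length, so 100.0 * size / total_length passes 100 and assert() aborts the whole download. A defensive version would clamp instead (a sketch of the fix's shape; how wget actually resolved it may differ):

```c
#include <assert.h>

/* Progress percentage that tolerates size > total, e.g. when the
   partial local file is already larger than the remote one.
   Sketch of a defensive fix, not the actual wget change. */
static int progress_percentage(long size, long total)
{
    int pct;
    if (total <= 0)
        return 0;               /* unknown length: nothing sane to show */
    pct = (int)(100.0 * size / total);
    return pct > 100 ? 100 : pct;
}
```

With the numbers from the report, progress_percentage(6968552, 5574867) would display 100% rather than aborting at 125%.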
Re: Debian bug 21588 - inconsistent naming of directories created by wget
Guillaume Morin wrote:

> Forward of http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=21588&repeatmerged=yes
>
> If I access a server not on the default port, wget does not write that port in the name of the directory it creates. Here is an example:
>
> --13:43:40--  http://www.center.osaka-u.ac.jp:7080/center/contents.html
>            => `www.center.osaka-u.ac.jp/center/contents.html'

This was changed with version 1.8. Now it will be saved under www.center.osaka-u.ac.jp:7080.

$ wget -l inf -r http://www.wsu.edu:8080/~brians/errors/errors.html
--19:24:23--  http://www.wsu.edu:8080/%7Ebrians/errors/errors.html
           => `www.wsu.edu:8080/%7Ebrians/errors/errors.html'
Resolving www.wsu.edu... done.
Connecting to www.wsu.edu[134.121.1.61]:8080... connected.
HTTP request sent, awaiting response... 200 OK
Length: 40,575 [text/html]

100%[====================================>] 40,575        22.62K/s    ETA 00:00

There can still be directory collisions, but now only for different services on the same host all on their default port (or http and https on the same non-default port), i.e. ftp://host, http://host and https://host will all be saved under host.

> Please keep [EMAIL PROTECTED] CC'ed
>
> Guillaume Morin [EMAIL PROTECTED]
> "Oh, that is nice out there, I think I'll stay for a while" (RHCP)

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
Re: hola saludos
Please in the future write in english if you expect to get any help.

maromans wrote:

> I am using this program and I cannot download entire sites. I don't know whether I have the options or switches wrong, but it does not download the hyperlinks of the sites I am trying to save. If you can help me -- sincerely, thanks.

Besides writing in English, and until we have perfected the mind-reading module, you could start by detailing what you are doing, what you expected to happen (and why), and what actually happens.

How To Ask Questions The Smart Way: http://www.tuxedo.org/~esr/faqs/smart-questions.html

> _________________________________________________________
> Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
Re: Dynamic Images
RUI SHANTILAL wrote:

> Don't know if you can consider it a bug, but the reality is that I couldn't retrieve images that are generated through a script. An example: http://www.portugaldiario.iol.pt/idpd92.html -- try to retrieve all of this page including all the images; images called as
> <img src="http://www.iol.pt/intermedia/iol/mediaget/get_iol_image/?ord_procedure_path=17441&nome_tabela=imagens_sitemaker&nome_campo=imagem&condicao=id" border=0>
> are not saved using wget !!

Most people think of it as a feature. Since most externally linked images are banners, they are glad these are not downloaded -- right troubleshooting, wrong conclusion: it's not the dynamic generation (banners again?) that wget doesn't like, but the different host (www.portugaldiario.iol.pt != www.iol.pt).

> Hope u ppl get this functionality working in next version !! Keep the good work !!

It's already there: tell wget to span hosts (--span-hosts). But use it with care, as it would also follow links to banners ... smiler

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
Re: Possible bugs when making https requests
Sacha Mallais wrote:

> I'm having some problems with wget and SSL. I have been getting the following output on occasion (meaning, the exact same command works sometimes and sometimes produces this), even when everything else (my web browser, etc.) is able to connect with no problem:
>
> DEBUG output created by Wget 1.8.1 on aix4.1.5.0.
> --14:30:01--  https://tpurs.oda.state.or.us/
>            => `/tmp/tPURS-apache-AYA-wget.output'
> Resolving tpurs.oda.state.or.us... done.
> Caching tpurs.oda.state.or.us => 192.152.7.27
> Connecting to tpurs.oda.state.or.us[192.152.7.27]:443... connected.
> Created socket 5.
> Releasing 20026168 (new refcount 1).
> Unable to establish SSL connection.
> Closing fd 5
> Unable to establish SSL connection.

No problem here, but then it's only version 1.7.1:

$ wget -d https://tpurs.oda.state.or.us/
DEBUG output created by Wget 1.7.1 on cygwin.
parseurl ("https://tpurs.oda.state.or.us/") -> host tpurs.oda.state.or.us -> opath -> dir -> file -> ndir
newpath: /
--23:50:07--  https://tpurs.oda.state.or.us/
           => `index.html'
Connecting to tpurs.oda.state.or.us:443...
Caching tpurs.oda.state.or.us -> 192.152.7.27
Created fd 3.
connected!
---request begin---
GET / HTTP/1.0
User-Agent: Wget/1.7.1
Host: tpurs.oda.state.or.us
Accept: */*
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Wed, 23 Jan 2002 22:50:29 GMT
Server: Apache/1.3.22 (Darwin) mod_ssl/2.8.5 OpenSSL/0.9.6b
Cache-Control: max-age=60
Expires: Wed, 23 Jan 2002 22:51:29 GMT
Last-Modified: Sat, 12 Jan 2002 00:30:02 GMT
ETag: "d78f1-100-3c3f838a"
Accept-Ranges: bytes
Content-Length: 256
Keep-Alive: timeout=15, max=500
Connection: Keep-Alive
Content-Type: text/html
Found tpurs.oda.state.or.us in host_name_address_map: 192.152.7.27
Registered fd 3 for persistent reuse.
Length: 256 [text/html]

    0K                                    100% @ 250.00 KB/s

23:50:09 (250.00 KB/s) - `index.html' saved [256/256]

> Also note that it does _not_ appear to be retrying the connection.
> I have explicitly set --tries=5, and with a non-SSL connection, the above stuff appears 5 times when it cannot connect. But for SSL, one failure kills the process. If there is any other info I can give you, let me know.

You have already done the exceptional: providing debug output!

> sacha
> Sacha Michel Mallais [EMAIL PROTECTED]
> Global Village Consulting Inc.  http://www.global-village.net/sacha
> "Things won are done; joy's soul lies in the doing." -- William Shakespeare, Troilus and Cressida, Act 1, Scene 2

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
unsubscribing from list (WAS: Win ssl bug)
Matt Pease wrote:

> somebody please get me off this list! emailing [EMAIL PROTECTED] does not work

Sorry, that's not how it works. YOU subscribed yourself to the list, you were warned that only YOU could unsubscribe later on, and you were told to save the welcome message in case you might forget which address you subscribed with. It's obviously not [EMAIL PROTECTED] or [EMAIL PROTECTED], but only YOU have a chance of guessing it. Look at the headers of the mail you get from the list; maybe it's coming via some mail-forwarder service you've forgotten about.

Nobody on this list has the power to unsubscribe other people. Those that have are on [EMAIL PROTECTED] If you're a little more helpful than in your latest mail, maybe they can help you, i.e. you provide them with _all_ the email addresses you used around the time you subscribed to this list. But if you know them all, you can just as easily unsubscribe the right address yourself.

> Thanks - Matt

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
Re: unsubscribing from list (WAS: Win ssl bug)
James C. McMaster (Jim) wrote:

> What we cannot seem to get through to the thick-headed people is THE AUTOMATED UNSUBSCRIBE PROCEDURE IS BROKEN, AND HAS BEEN FOR A LONG TIME. FOLLOWING THE UNSUBSCRIBE INSTRUCTIONS YOU SO HELPFULLY EXPLAIN IS POINTLESS, BECAUSE THE AUTOMATED UNSUBSCRIBE PROCEDURE IS BROKEN, AND HAS BEEN FOR A LONG TIME. SENDING EMAIL TO [EMAIL PROTECTED] WILL NOT DO THE TRICK BECAUSE THE AUTOMATED UNSUBSCRIBE PROCEDURE IS BROKEN, AND HAS BEEN FOR A LONG TIME. Do you get it now? We are asking for the list admins to fix the list so we can stop bothering people on the list. Do you get it now?

You're right, I hadn't got it. But if the unsubscribe procedure was broken, it's fixed now (btw, the list admins can be reached at [EMAIL PROTECTED]). I had no problem unsubscribing myself ([EMAIL PROTECTED]) and subscribing a new mail address ([EMAIL PROTECTED]).

From my Welcome msg:

> Please save this message so that you know the address you are subscribed under, in case you later want to unsubscribe or change your subscription address.
> [...]
> You can start a subscription for an alternate address, for example [EMAIL PROTECTED]: just add a hyphen and your address (with '=' instead of '@') after the command word: [EMAIL PROTECTED]
> To stop subscription for this address, mail: [EMAIL PROTECTED]
> In both cases, I'll send a confirmation message to that address. When you receive it, simply reply to it to complete your subscription.
> If despite following these instructions you do not get the desired results, please contact my owner at [EMAIL PROTECTED] Please be patient, my owner is a lot slower than I am ;-)

And here is the proof that I successfully unsubscribed my previous mail address:

-------- Original Message --------
Subject: GOODBYE from [EMAIL PROTECTED]
Date: 4 Dec 2001 17:08:53 -0000
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]

Hi! This is the ezmlm program. I'm managing the [EMAIL PROTECTED] mailing list.
Acknowledgment: I have removed the address [EMAIL PROTECTED] from the wget mailing list. This address is no longer a subscriber. [...] -- Jim McMaster mailto:[EMAIL PROTECTED] -- Med venlig hilsen / Kind regards Hack Kampbjørn
Re: --only and --not
Hrvoje Niksic wrote:

> First, my apologies for the long delay in answering.

Welcome back! For those not on the wget-patches list, check the CVS ChangeLog 8-)
http://sunsite.dk/cvsweb/wget/src/ChangeLog

> The idea behind this patch, and the patch itself, are very interesting. I'll look into it for Wget 1.8 (1.7.1 should be a bugfix-only release.) Several random musings:
>
> * It would be nice to have an option to use only one filter, so that people who want speed and/or retain state between filter invocations can get them.
>
> * \ is probably not the best choice for the escape character; it's easy to lose it. Maybe %u et al. would be a better choice?

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
Re: Compile problem (and possible fix)
Ian Abbott wrote:

> On 7 Nov 2001, at 23:07, Hack Kampbjørn wrote:
>
> Agreed that you don't want to use Apple's precompiler, but I couldn't tell from the links you posted what platform the fix fails to compile on. There was one reference to VC++ 5.0 breaking, but that was for the unfixed version.

On Monday, June 25, 2001, at 05:54 PM, Hrvoje Niksic wrote:

> Perhaps the problem is that the '\\' constant is within the assert, which might indeed hurt some compilers. In fact, I originally used '\\', but a Microsoft compiler couldn't swallow it in the `assert' expression.

I read that as '\\' (which the original poster proposed) breaks M$ Visual Studio.

-- 
Med venlig hilsen / Kind regards
Hack Kampbjørn
Re: Incorrect numbers with -c option
Lukasz Bolikowski wrote: Hello! I think the -c option in wget results in misleading output. I have been downloading ftp://ftp.kernel.org/pub/dist/superrescue/v2/superrescue-2.0.0a.iso.gz which is 514596847 bytes long. I aborted the downloading after 246345728 bytes and then ran: wget -c --passive-ftp -o super-log ftp://... This is the beginning of the logfile: --17:52:13-- ftp://ftp.kernel.org/pub/dist/superrescue/v2/superrescue-2.0.0a.iso.gz = `superrescue-2.0.0a.iso.gz' Connecting to ftp.kernel.org:21... connected! Logging in as anonymous ... Logged in! == SYST ... done.== PWD ... done. == TYPE I ... done. == CWD /pub/dist/superrescue/v2 ... done. == PASV ... done.== REST 246345728 ... done. == RETR superrescue-2.0.0a.iso.gz ... done. Length: 268,251,119 [21,905,391 to go] (unauthoritative) [ skipping 240550K ] 240550K ,, ,, ,, .. .. 91% @ 34.61 KB/s lots of lines 328050K .. .. .. .. ..125% @ 50.81 KB/s snip! IMHO there is much more to go than 21,905,391 bytes. Besides, the percentages on the right are incorrect. I'm using GNU Wget 1.7 Yes, this has been an ongoing problem in wget (but it should be fixed now, at least in CVS if not in 1.7), but I cannot reproduce it with wget 1.7 and the URL provided (so that's likely fixed in 1.7). $ wget -d --passive-ftp -c ftp://ftp.kernel.org/pub/dist/superrescue/v2/superrescue-2.0.0a.iso.gz DEBUG output created by Wget 1.7 on cygwin. snip/ 200 PORT command successful. done.== REST 316224 ... -- REST 316224 350 Restarting at 316224. Send STORE or RETRIEVE to initiate transfer. done. == RETR superrescue-2.0.0a.iso.gz ... -- RETR superrescue-2.0.0a.iso.gz 150 Opening BINARY mode data connection for superrescue-2.0.0a.iso.gz (514280623 bytes). done. Lying FTP server found, adjusting. Created socket fd 5. Length: 514,596,847 [514,280,623 to go] [ skipping 300K ] 300K .. .. .. .. .. 0% @ 28.56 KB/s 350K .. .. .. .. .. 0% @ 52.58 KB/s Best regards Lukasz Bolo Bolikowski -- Med venlig hilsen / Kind regards Hack Kampbjørn
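The numbers in the broken log are internally consistent once you spot the bug: 268,251,119 is the server's RETR length after REST (i.e. the bytes *remaining*), and 21,905,391 = 268,251,119 - 246,345,728 — the restart offset got subtracted a second time. The "Lying FTP server found, adjusting" line in the 1.7 run shows the corrected bookkeeping. A sketch of that arithmetic (an assumed reconstruction for illustration, not wget's literal code):

```c
#include <assert.h>

struct resume_info {
    long full_size;   /* total size of the remote file */
    long to_go;       /* bytes still left to download */
};

/* When the server's RETR reply reports the bytes *remaining* after
   `REST restval', the full size is reported + restval.  Treating the
   reported value as the full size and subtracting restval again is
   exactly what produced the bogus "[21,905,391 to go]" in the log. */
static struct resume_info resume_figures(long reported_len, long restval)
{
    struct resume_info r;
    r.full_size = reported_len + restval;
    r.to_go = reported_len;
    return r;
}
```

With the report's numbers (download aborted at 246,345,728 of 514,596,847 bytes), resume_figures(268251119, 246345728) yields a full size of 514,596,847 and 268,251,119 to go, matching the fixed 1.7 output.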
Re: connect timeout
Nic Ferrier wrote: Sorry if you're already aware of this... I couldn't find the archives of this list at GNU. Maybe you should put a link on the page: http://www.gnu.org/software/wget/ The official web-site is http://wget.sunsite.dk/ Yes, there should be a link from the GNU site. [List] how can it be added? I've discovered that wget doesn't do connection timeouts. That is, if the host it is trying to connect to cannot be reached for some reason then wget simply hangs. I expected wget to return after T seconds after specifying the timeout option on the command line but it didn't. No control of connect timeouts is a serious weakness in a tool designed to be used for batched downloads... I've had to swap wget for curl for the particular task I'm working on (which is a pity because in all other respects I like wget and want to support GNU projects). $ wget --help GNU Wget 1.7, a non-interactive network retriever. Usage: wget [OPTION]... [URL]... [...] Download: [...] -t, --tries=NUMBER set number of retries to NUMBER (0 unlimits). [...] -T, --timeout=SECONDS set the read timeout to SECONDS. -w, --wait=SECONDS wait SECONDS between retrievals. --waitretry=SECONDS wait 1...SECONDS between retries of a retrieval. [...] Which of these were you using? And please, next time, send bug reports including debug output (wget -d ...) Nic Ferrier -- Med venlig hilsen / Kind regards Hack Kampbjørn
Re: connect timeout
Nic Ferrier wrote: The official web-site is http://wget.sunsite.dk/ Yes, there should be a link from the GNU site. [List] how can it be added? sign the project up on savannah (http://savannah.gnu.org). That will provide you with a nice management interface (based on CVS) for changing the wget tree. $ wget --help GNU Wget 1.7, a non-interactive network retriever. Usage: wget [OPTION]... [URL]... Download: -t, --tries=NUMBER set number of retries to NUMBER (0 unlimits). -T, --timeout=SECONDS set the read timeout to SECONDS. -w, --wait=SECONDS wait SECONDS between retrievals. --waitretry=SECONDS wait 1...SECONDS between retries of a retrieval. Which of these were you using? I tried several things, including: wget -t 2 -T 10 -w 1 But, to be honest, is the -T option actually a *connect* timeout? You're right, it's not 8-( $ wget -d -T 5 http://192.168.1.254/ DEBUG output created by Wget 1.7 on cygwin. parseurl ("http://192.168.1.254/") - host 192.168.1.254 - opath - dir - file - ndir newpath: / --22:23:46-- http://192.168.1.254/ = `index.html' Connecting to 192.168.1.254:80... connect: Attempt to connect timed out without establishing a connection Closing fd 3 Retrying. --22:24:08-- http://192.168.1.254/ (try: 2) = `index.html' Connecting to 192.168.1.254:80... [hack@DUR0N2000 webs]$ wget -d -T 5 http://192.168.1.254/ DEBUG output created by Wget 1.7 on cygwin. parseurl ("http://192.168.1.254/") - host 192.168.1.254 - opath - dir - file - ndir newpath: / --22:24:19-- http://192.168.1.254/ = `index.html' Connecting to 192.168.1.254:80... [...] $ wget -d -T 5 http://hostname/ DEBUG output created by Wget 1.7-dev on linux-gnu. parseurl ("http://hostname/") - host hostname - opath - dir - file - ndir newpath: / --22:36:48-- http://hostname/ = `index.html' Connecting to hostname:80... Caching hostname - 192.168.1.254 connect: Connection timed out Closing fd 3 Retrying. --22:39:58-- http://hostname/ (try: 2) = `index.html' Connecting to hostname:80... 
Found hostname in host_name_address_map: 192.168.1.254 Note: output edited hostname is a host that doesn't answer on port 80. Two different systems on two different networks, that might explain the difference in timeout times. Well, Daniel Stenberg, maybe you should try to get your cURL implementation accepted. It's bad when other packages' maintainers are more active on the list than Wget's 8-( And please the next time send bug reports including debug output (wget -d ...) I don't think it would do you much good in this case... but I can send you one if you want. Likely not, but it would include the Wget version, and if that's not 1.7 then the standard recommendation would be to update. You'd be surprised how many bug reports there are related to older versions like 1.5.3 or even a couple to 1.4.5 8-) Nic -- Med venlig hilsen / Kind regards Hack Kampbjørn
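For the record, the usual way to add the missing connect timeout is a non-blocking connect() followed by select(). A sketch under the assumption of plain IPv4 sockets — connect_with_timeout is a hypothetical helper for illustration, not wget's code:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect to ip:port, waiting at most `seconds` for the three-way
   handshake.  Returns 0 on success, -1 on error or timeout. */
static int connect_with_timeout(const char *ip, int port, int seconds)
{
    struct sockaddr_in sa;
    fd_set wfds;
    struct timeval tv;
    int fd, err = 0;
    socklen_t len = sizeof err;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons((unsigned short) port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    fcntl(fd, F_SETFL, O_NONBLOCK);   /* make connect() return at once */

    if (connect(fd, (struct sockaddr *) &sa, sizeof sa) == 0) {
        close(fd);
        return 0;                     /* connected immediately */
    }
    if (errno != EINPROGRESS) {
        close(fd);
        return -1;                    /* immediate failure, e.g. refused */
    }

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    if (select(fd + 1, NULL, &wfds, NULL, &tv) != 1) {
        close(fd);
        return -1;                    /* timed out */
    }
    /* Socket is writable: check whether the connect really succeeded. */
    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
    close(fd);
    return err == 0 ? 0 : -1;
}
```

The -T read timeout would stay as-is; this only bounds the time spent in the "Connecting to host:80..." phase that both transcripts above hang in.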
Re: Compile problem (and possible fix)
Ed Powell wrote: I was compiling wget 1.7 on MacOS X 10.1 (Darwin 1.4). Around line 435 in html-parse.c there's the section: case AC_S_QUOTE1: assert (ch == '\'' || ch == '"'); quote_char = ch; /* cheating -- I really don't feel like introducing more different states for different quote characters. */ ch = *p++; state = AC_S_IN_QUOTE; break; I had to change: assert (ch == '\'' || ch == '"'); to: assert (ch == '\'' || ch == '\"'); Otherwise, it would not compile... it was, I think, interpreting the ", rather than using it literally. Escaping it appears to have fixed the problem. Right conclusion, wrong fix. Fix the broken software, not the correct one, i.e. your fix breaks wget on another broken platform (read the links below if you cannot guess which). You don't want to use Apple's precompiler anyway. http://www.mail-archive.com/cgi-bin/htsearch?method=and&format=short&config=wget_sunsite_dk&restrict=&exclude=&words=darwin http://www.mail-archive.com/wget@sunsite.dk/msg01532.html http://www.mail-archive.com/wget@sunsite.dk/msg01289.html The compiling process was simply doing a 'configure' then 'make'. After making the change described above, I ran 'make' again, and everything was fine. -- Ed Powell - Meus Navis Aerius est Plena Anguillarum http://www.visi.com/~epowell -- Med venlig hilsen / Kind regards Hack Kampbjørn
Re: Compilation problems
Andrew Coggins wrote: Hi Wolfgang, Thanks, that cleared the problem, though a search on freshmeat came up empty for texi2pod. texi2pod isn't a separate package, but is included in the wget distribution. It should be built by make. But it failed; that's the error message you got. Wget, as GNU software, is developed to use GNU make, and *BSD systems use (you guessed it) Berkeley (or BSD) make. You should install gmake (it's in ports) and run 'gmake' wherever the documentation says 'make'. In general it's a good idea on *BSD systems to build on the ports tree's shared knowledge (even when you're not using its build and installation system). Information like this is already recorded there (and even some patches). On my OpenBSD system: $ cd /usr/ports/net/wget $ make show=USE_GMAKE Yes $ ls patches/ CVS patch-configure_in patch-doc_wget.texi patch-configure patch-doc_Makefile.in -Andrew On Sunday 04 November 2001 12:33, Andrew Coggins wrote: erm, well I went to freshmeat, looked up wget, and followed the links. http://www.freshmeat.net So, anyone able to help with my problem? :) Hi, I am also more of an asker than an answerer on this list. But: Looks like you have to install texi2pod? Second guess: texi2pod translates your docs from GNU texinfo to perl pod format (at least I guess so). So I guess for making the program work, you can do without. Which is converted to man format with the pod2man from perl (usually already on the system). make -k should get you farther than just till texi2pod. Cheers, Wolfgang -- Dr. Wolfgang Müller, assistant == teaching assistant Personal page: http://cui.unige.ch/~vision/members/WolfgangMueller.html Maintainer, GNU Image Finding Tool (http://www.gnu.org/software/gift) -- Med venlig hilsen / Kind regards Hack Kampbjørn
Re: wget 1.5.3 suggestions
Hermann Rugen wrote: Hallo folks, nice to use this software. I was looking for more than I got. Maybe I made a mistake. Downloading or trying to mirror a site does run, but without style sheets. For example: take my homepage. I did 'wget' it for testing and did not get the complete one. Style sheets were missing. Version 1.5.3 is several years old now. Try using the latest version of wget; it should know about style sheets. You can download the source from the web-site (http://wget.sunsite.dk/). What else, I don't know now. I would be happy to get a wget.conf-sample for downloading a site for mirroring. Can you help me? wget is running on SUSE7.1 with linux2.2.18 kind regards Hermann Rugen eMail: [EMAIL PROTECTED] Internet: www.rugen-consultng.com -- Med venlig hilsen / Kind regards Hack Kampbjørn
Re: problem with permanent connection.... (maybe a bug)
Marcelo Taube wrote: wget was working OK with my ppp connection; now I installed a new connection through an NE2000 ETHERNET card. Most of the programs work OK and faster, but wget (and other programs that depend on it) don't work anymore. This is what happens when I try to download a file (it happens with other files from other ftp servers as well)... ** [root@localhost /root]# wget ftp://ftp.cs.tu-berlin.de/pub/X/XFree86/4.1.0/binaries/Linux-ix86-glibc22/Xf100.tgz Next time please send the debug output (wget -d ...) --14:28:48-- ftp://ftp.cs.tu-berlin.de/pub/X/XFree86/4.1.0/binaries/Linux-ix86-glibc22/Xf100.tgz = `Xf100.tgz' Connecting to ftp.cs.tu-berlin.de:21... connected! Logging in as anonymous ... Logged in! == SYST ... done.== PWD ... done. == TYPE I ... done. == CWD /pub/X/XFree86/4.1.0/binaries/Linux-ix86-glibc22 ... done. == PORT ... done.== RETR Xf100.tgz ... done. ** After the last done it freezes and doesn't download a single byte. I have already updated to the latest version of wget (cvs version) but the problem was not fixed. Just a guess, but is your new network connection using NAT? And if so, does the router (or firewall) have an FTP proxy to allow active FTP connections through? If not, try using passive mode. $ wget --help GNU Wget 1.7, a non-interactive network retriever. Usage: wget [OPTION]... [URL]... [...] FTP options: -nr, --dont-remove-listing don't remove `.listing' files. -g, --glob=on/off turn file name globbing on or off. --passive-ftp use the passive transfer mode. --retr-symlinks when recursing, get linked-to files (not dirs). ** [root@localhost /root]# wget --version GNU Wget 1.7.1-pre1 Copyright (C) 1995, 1996, 1997, 1998, 2000, 2001 Free Software Foundation, Inc. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
Originally written by Hrvoje Niksic [EMAIL PROTECTED]. ** I have no idea of what's causing this... Thank you in advance... -- Med venlig hilsen / Kind regards Hack Kampbjørn
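The NAT guess fits how active FTP works: a PORT command such as PORT 134,167,21,90,9,115 tells the server the client's own IP address and port as six decimal bytes (h1,h2,h3,h4,p1,p2 with port = p1 * 256 + p2), and the server then opens the data connection back to that address — which fails when the client sits behind NAT and advertises a private address the server cannot reach. A small decoder for illustration (not wget's actual FTP code):

```c
#include <assert.h>
#include <stdio.h>

/* Decode the data-connection port from the six-number argument of
   FTP's PORT command: h1,h2,h3,h4,p1,p2  ->  port = p1 * 256 + p2.
   Returns -1 if the argument does not parse. */
static int ftp_port_from_arg(const char *arg)
{
    int h1, h2, h3, h4, p1, p2;
    if (sscanf(arg, "%d,%d,%d,%d,%d,%d",
               &h1, &h2, &h3, &h4, &p1, &p2) != 6)
        return -1;
    return p1 * 256 + p2;
}
```

For "134,167,21,90,9,115" this gives 9 * 256 + 115 = 2419 — the port the server connects back to. --passive-ftp avoids the problem by making the client open the data connection instead.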
Re: convert-links doesn't work on directories
Dan Christensen wrote: By the way, I searched but couldn't find the cvs repository for wget. I found the sunsite repository, but it seems like it hasn't been updated since June, so I'm guessing it moved somewhere else? (Or maybe I'm just not getting the right branch?) Some of the web pages for wget that google turns up seem out of date. Take a look at the development page for instructions on access to the CVS sources: http://wget.sunsite.dk/wgetdev.html I already did that and found a version which hasn't been updated since June. That is the latest version; not much gets committed to CVS with the committers MIA (or busy with real work). And AFAIK there's no archive for the wget-patches list, where activity has continued 8-( Also, the cvsweb link on that page gets redirected to http://sunsite.dk/sunsite.css which says The requested URL /sunsite.css was not found on this server. Then you're using Netscape Navigator with JavaScript and StyleSheets activated, and there's a broken link in the page (well, the style sheet link). Yes, it's annoying how Netscape won't show such a page; disable JavaScript and you'll get it without CSS. I'll report this to [EMAIL PROTECTED] Thanks for the help. You're welcome. Dan -- Med venlig hilsen / Kind regards Hack Kampbjørn
Re: ampersand troubles
[EMAIL PROTECTED] wrote: When I am using wget for CGI scripts with arguments, I need to use & (ampersand) between arguments; but wget changes & to %26 via quoting. How can I get http://find.infoart.ru/cgi-bin/yhs.pl?hidden=http%3A%2F%2Ffind.infoart.ru&word=wget ? Which version of wget are you using? I have no problem getting this page with wget 1.7. Note that I use -O as the wget-proposed filename would be illegal on Windows (contains '?:/'): $ wget -O testing -S 'http://find.infoart.ru/cgi-bin/yhs.pl?hidden=http%3A%2F%2Ffind.infoart.ru&word=wget' --12:06:44-- http://find.infoart.ru/cgi-bin/yhs.pl?hidden=http%3A//find.infoart.ru&word=wget = `testing' Connecting to find.infoart.ru:80... connected! HTTP request sent, awaiting response... 200 OK 2 Date: Sun, 21 Oct 2001 10:09:23 GMT 3 Server: Apache/1.3.20 (Unix) mod_fastcgi/2.2.8 rus/PL30.5 4 Connection: close 5 Content-Type: text/html; charset=windows-1251 6 0K ...@ 10.82 KB/s Last-modified header missing -- time-stamps turned off. 12:06:44 (10.82 KB/s) - `testing' saved [7327] PS: Please, answer directly to [EMAIL PROTECTED]. -- Med venlig hilsen / Kind regards Hack Kampbjørn
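For context, %26 is simply the percent-encoded form of '&' (byte 0x26), just as %3A is ':' and %2F is '/'. In a query string the literal '&' separates parameters, so an '&' that is part of a value must travel as %26 while the separators stay literal — quoting the whole URL for the shell, as above, keeps the separators intact. A tiny decoder showing the mapping (an illustrative sketch, not wget's url.c):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode %XX escapes: writes the decoded form of `in` into `out`
   (which must be large enough) and returns `out`.  Malformed escapes
   are copied through literally. */
static char *percent_decode(const char *in, char *out)
{
    char *o = out;
    while (*in) {
        unsigned int byte;
        if (*in == '%' && sscanf(in + 1, "%2x", &byte) == 1) {
            *o++ = (char) byte;   /* e.g. "%26" -> '&', "%2F" -> '/' */
            in += 3;
        } else {
            *o++ = *in++;
        }
    }
    *o = '\0';
    return out;
}
```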
[Cygwin: Updated: wget-1.7-1]
Since some on this list have problems compiling wget on Cygwin (those that don't know what Cygwin is can just skip this message or read about it at http://www.cygwin.com/), I'm forwarding this announcement to let them know that they can just use the cygwin package. Hack 8-) Original Message Subject: Updated: wget-1.7-1 Date: Thu, 18 Oct 2001 19:01:12 +0200 From: Hack Kampbjørn [EMAIL PROTECTED] Reply-To: [EMAIL PROTECTED] To: [EMAIL PROTECTED] I've updated wget in cygwin to version 1.7-1 DESCRIPTION: GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive commandline tool, so it may easily be called from scripts, cron jobs, terminals without X support, etc. CHANGES: - SSL (or https) support - Cookies - HTTP/1.1 KeepAlive (persistent) connections - Many FTP improvements including support of NT and VMS servers - Internal structure changes resulting in big speedups when downloading big sites (thousands of documents) For more changes see the NEWS file (/usr/doc/wget-1.7/NEWS) WARNING: wget-1.7-1 depends on an installed openssl package! INSTALLATION: To update your installation, click on the Install Cygwin now link on the http://cygwin.com/ web page. This downloads setup.exe to your system. Then, run setup and answer all of the questions. Note that we do not allow downloads from sources.redhat.com (aka cygwin.com) due to bandwidth limitations. This means that you will need to find a mirror which has this update. In the US, ftp://mirrors.rcn.net/mirrors/sources.redhat.com/cygwin/ is a reliable high bandwidth connection. In Germany, ftp://ftp.uni-erlangen.de/pub/pc/gnuwin32/cygwin/mirrors/cygnus/ is usually pretty good. In the UK, http://programming.ccp14.ac.uk/ftp-mirror/programming/cygwin/pub/cygwin/ is usually up-to-date within 48 hours. If one of the above doesn't have the latest version of this package then you can either wait for the site to be updated or find another mirror. 
The setup.exe program will figure out what needs to be updated on your system and will install newer packages automatically. If you have questions or comments, please send them to the Cygwin mailing list at: [EMAIL PROTECTED] . I would appreciate if you would use this mailing list rather than mailing me directly. This includes ideas and comments about the setup utility or Cygwin in general. If you want to make a point or ask a question, the Cygwin mailing list is the appropriate place. *** CYGWIN-ANNOUNCE UNSUBSCRIBE INFO *** If you want to unsubscribe from the cygwin-announce mailing list, look at the List-Unsubscribe: tag in the email header of this message. Send email to the address specified there. It will be in the format: [EMAIL PROTECTED] NOTES: Yes, we have a new maintainer 8-) -- Med venlig hilsen / Kind regards Hack Kampbjørn
Re: convert-links doesn't work on directories
Dan Christensen wrote: Dear wget maintainers, I hope the wget maintainers (Hrvoje Niksic and Dan Harkless) are reading this, but they've been MIA for some months now. I noticed that the Debian wget maintainer isn't forwarding many bugs upstream to you. If you are curious, you can find the list of bugs at http://bugs.debian.org/wget I wasn't aware of this site. A quick scan of it shows that many of the bugs are related to wget version 1.5.3 (which is the debian stable package); note that it is quite old and that many small quirks have been fixed since then. I'll try to find some time to go through it. The bug that bit me today is http://bugs.debian.org/62425 which has been open for a year and a half. In short, convert-links doesn't handle URL's of the form .../directory or .../directory/. If they are replaced with .../directory/index.html then it works, but otherwise it thinks it hasn't downloaded the URL's. IIRC this is a known bug, but nobody has been annoyed enough to provide a fix. Try searching the wget archives for a better answer (we really need an archive for the wget-patches list). By the way, I searched but couldn't find the cvs repository for wget. I found the sunsite repository, but it seems like it hasn't been updated since June, so I'm guessing it moved somewhere else? (Or maybe I'm just not getting the right branch?) Some of the web pages for wget that google turns up seem out of date. That is the official site (and has always been since wget got a web-site): http://wget.sunsite.dk/ or http://sunsite.dk/wget (previously known as http://sunsite.auc.dk/wget, but AUC (Aalborg Universitets Center) has changed name (they didn't want to be just a center but a real university) and dropped the auc.dk domain). Take a look at the development page for instructions on access to the CVS sources: http://wget.sunsite.dk/wgetdev.html Thanks for a great program. If you know any easy work-arounds for the above bug, I'd love to hear them. 
Or if I can get access to a version with it fixed, that'd be great. We have a website we're trying to put onto a cd, and this is the only thing in our way. Dan -- Dan Christensen [EMAIL PROTECTED] -- Med venlig hilsen / Kind regards Hack Kampbjørn
Re: WGET multiple files?
Ifj. Pentek Imre wrote: Dear Sir, I'm writing to you because I want to know if WGET can be used to download multiple files. So if I want to download files in the same dir? What to do in this case? Can your program handle wildcards (like *?)? This is best answered by reading the documentation on the web-site (http://sunsite.dk/wget/). I think all your questions are answered in the first section. You are of course welcome to suggest improvements to our documentation 8-) Introduction to GNU wget GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive commandline tool, so it may easily be called from scripts, cron jobs, terminals without X support, etc. Wget has many features to make retrieving large files or mirroring entire web or FTP sites easy, including: Can resume aborted downloads, using REST and RANGE Can use filename wild cards and recursively mirror directories NLS-based message files for many different languages Optionally converts absolute links in downloaded documents to relative, so that downloaded documents may link to each other locally Runs on most UNIX-like operating systems as well as Microsoft Windows Supports HTTP and SOCKS proxies Supports HTTP cookies Supports persistent HTTP connections Unattended / background operation Uses local file timestamps to determine whether documents need to be re-downloaded when mirroring GNU wget is distributed under the GNU General Public License. Thank you for your answer for my letter. Yours sincerely: Imre Pentek E-mail: [EMAIL PROTECTED] -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: cgi scripts and wget
Samer Nassar wrote: Hello, I am an undergrad student at the University of Alberta, and downloaded wget recently to mirror a site for research purposes. However, wget seems to be having trouble pulling pages whose URLs are CGI scripts. I went through the wget manual and didn't see anything about this. Any hints? Please include the debug output of running wget with the -d option. There are a couple of problems you can have with CGI scripts and the like: - On Windows systems: '?' is an illegal character in filenames - The CGI is filtering on the User-Agent value - Use of cookies - Use of POST instead of GET ... Thanks for your help. Samer -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Problem with https connection (sslv23 - ok with sslv3)
Some time ago I came across this web-site with HTTPS connection problems: $ wget -S https://www.ihi.dk/ --23:34:50-- https://www.ihi.dk/ = `index.html' Connecting to www.ihi.dk:443... connected! Unable to establish SSL connection. Unable to establish SSL connection. $ But it works in my browser. So I try with the openssl client: $ openssl s_client -connect www.ihi.dk:443 CONNECTED(0003) 592:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:s23_lib.c:216: $ Same problem 8-( Now it's time to force the SSL protocols manually: $ openssl s_client -ssl3 -connect www.ihi.dk:443 CONNECTED(0003) depth=0 /C=DK/ST=Copenhagen/L=Copenhagen/O=International Health Insurance/CN=www.ihi.dk verify error:num=20:unable to get local issuer certificate verify return:1 depth=0 /C=DK/ST=Copenhagen/L=Copenhagen/O=International Health Insurance/CN=www.ihi.dk verify error:num=27:certificate not trusted verify return:1 depth=0 /C=DK/ST=Copenhagen/L=Copenhagen/O=International Health Insurance/CN=www.ihi.dk verify error:num=21:unable to verify the first certificate verify return:1 --- Certificate chain 0 s:/C=DK/ST=Copenhagen/L=Copenhagen/O=International Health Insurance/CN=www.ihi.dk i:/C=US/O=RSA Data Security, Inc./OU=Secure Server Certification Authority --- Server certificate -BEGIN CERTIFICATE- MIICCjCCAXcCEAUS4W9dOIDJk7K/MmOykJUwDQYJKoZIhvcNAQEEBQAwXzELMAkG A1UEBhMCVVMxIDAeBgNVBAoTF1JTQSBEYXRhIFNlY3VyaXR5LCBJbmMuMS4wLAYD VQQLEyVTZWN1cmUgU2VydmVyIENlcnRpZmljYXRpb24gQXV0aG9yaXR5MB4XDTAw MDYxMzAwMDAwMFoXDTAxMDYyNjIzNTk1OVowdTELMAkGA1UEBhMCREsxEzARBgNV BAgTCkNvcGVuaGFnZW4xEzARBgNVBAcUCkNvcGVuaGFnZW4xJzAlBgNVBAoUHklu dGVybmF0aW9uYWwgSGVhbHRoIEluc3VyYW5jZTETMBEGA1UEAxQKd3d3LmloaS5k azBcMA0GCSqGSIb3DQEBAQUAA0sAMEgCQQC8OGOR/9UZ6EFk8oGLVB5C3VbXG5T4 V5zZJyPRFh7KTBtSnWQvGSxMBwES/n8kIowsX1cRZw2ot1aaU3X8k3KvAgMBAAEw DQYJKoZIhvcNAQEEBQADfgAM3sAMXClUWsrMM7Ztx/+HuqEi5rHs4MouKPmj93e0 U8eV2QqsuwDKIkUxqyLFdiWKCmGbMasAOAOyS1wL7CIu2QCsNFINNBQX4LD19WYg 
+Vh3QHGB4EewkidIZ0Q9AD+DKMqAC45cB6JmbJ512gA3u9z1vpmiL8ZimmXPAg== -END CERTIFICATE- subject=/C=DK/ST=Copenhagen/L=Copenhagen/O=International Health Insurance/CN=www.ihi.dk issuer=/C=US/O=RSA Data Security, Inc./OU=Secure Server Certification Authority --- No client certificate CA names sent --- SSL handshake has read 694 bytes and written 238 bytes --- New, TLSv1/SSLv3, Cipher is DES-CBC3-SHA Server public key is 512 bit SSL-Session: Protocol : SSLv3 Cipher: DES-CBC3-SHA Session-ID: 114EFD511DE3F7FBDE1A8C917F7E4DC9CA7F66BA5D478FC82778ED923CBE43CA Session-ID-ctx: Master-Key: 509D485AC95363FA0F8C2786DFE1E90D78564CAF45F78082EFF81A8FED0E87C1D46B29 824AE396EB953907BA0D07EB73 Key-Arg : None Start Time: 991604431 Timeout : 7200 (sec) Verify return code: 0 (ok) --- HEAD / HTTP/1.0 HTTP/1.1 302 Found Server: Lotus-Domino/5.0.6 Date: Sun, 03 Jun 2001 22:30:51 GMT Location: ihihome.nsf/all/e_main Connection: close Content-Type: text/html Content-Length: 310 read:errno=0 $ BINGO ! Now I change line 54 in src/gen_sslfunc.c /* meth = SSLv23_client_method (); */ meth = SSLv3_client_method (); $ wget -S https://www.ihi.dk/ --23:35:36-- https://www.ihi.dk/ = `index.html' Connecting to www.ihi.dk:443... connected! HTTP request sent, awaiting response... 302 Found 2 Server: Lotus-Domino/5.0.6 3 Date: Sun, 03 Jun 2001 22:31:12 GMT 4 Location: ihihome.nsf/all/e_main 5 Connection: close 6 Content-Type: text/html 7 Content-Length: 310 8 Location: ihihome.nsf/all/e_main [following] --23:35:37-- https://www.ihi.dk/ihihome.nsf/all/e_main = `e_main' Connecting to www.ihi.dk:443... connected! HTTP request sent, awaiting response... 200 OK 2 Server: Lotus-Domino/5.0.6 3 Date: Sun, 03 Jun 2001 22:36:54 GMT 4 Connection: close 5 Content-Type: text/html; charset=US-ASCII 6 Content-Length: 1404 7 Last-Modified: Wed, 23 May 2001 14:23:36 GMT 8 0K . 100% @ 1.34 MB/s 23:35:37 (1.34 MB/s) - `e_main' saved [1404/1404] $ wget -S https://www.ihi.dk/ Now that was a really crude solution. 
I'm not so familiar with openssl, but isn't it supposed to just use the right SSL protocol? If this is the expected behavior and not a bug in openssl, then we should allow the user to override the SSL protocol used (maybe a --ssl-version=ssl3 or something). Or even better, cycle through them all till one clicks (if that's possible). $ openssl version OpenSSL 0.9.6 24 Sep 2000 -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
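The "cycle through them all" idea is just a fallback loop over handshake attempts: try SSLv23 first, then SSLv3, and keep the first method that completes. Sketched generically with function pointers — the probe functions here are stand-ins for building an SSL_CTX with a given method and running the handshake, not wget's gen_sslfunc.c:

```c
#include <assert.h>
#include <stddef.h>

/* One handshake attempt with a particular SSL protocol version;
   returns 0 on success, nonzero on failure. */
typedef int (*ssl_probe_fn)(void);

/* Try each protocol in order; return the index of the first one that
   works, or -1 if none does. */
static int negotiate_protocol(const ssl_probe_fn *probes, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (probes[i]() == 0)
            return (int) i;
    return -1;
}

/* Example probes: pretend the SSLv23 handshake fails and the SSLv3
   one succeeds, as observed with www.ihi.dk above. */
static int probe_sslv23(void) { return 1; }   /* handshake failure */
static int probe_sslv3(void)  { return 0; }   /* handshake success */
```

For www.ihi.dk the loop would settle on the second entry — the same result as the hard-coded edit to gen_sslfunc.c, but without losing the SSLv23 default for well-behaved servers.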
Re: Is there a way to override wgetrc options on command line?
Humes, David G. wrote: Hello, I have several cronjobs using wget and the wgetrc file turns on passive-ftp by default. I have one site where, strangely enough, passive ftp does not work but active does work. I'd rather leave the passive ftp default set and just change the one cronjob that requires active ftp. Is there any way to tell wget to either disregard the wgetrc file or to override one or more of its options? Thanks. What about --execute=COMMAND? $ wget --help GNU Wget 1.7-pre1, a non-interactive network retriever. Usage: wget [OPTION]... [URL]... Mandatory arguments to long options are mandatory for short options too. Startup: -V, --version display the version of Wget and exit. -h, --help print this help. -b, --background go to background after startup. -e, --execute=COMMAND execute a `.wgetrc'-style command. [...] -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: What do you think my chances are of getting wget to work on HP-UXare ?
Alan Barrow wrote: This message contains information which may be confidential and privileged. Unless you are the addressee (or authorised to receive for the addressee), you should not use, copy or disclose to anyone the details or information contained in this message. The content of the message and or attachments may not reflect the view and opinions of the originating company. If you have received this message in error, you should reply to the sender and copy [EMAIL PROTECTED] and delete the message from your system. Thank you for your co-operation. I think that your chances are pretty good (if not 100%) of getting wget to work if you put enough effort into it. <sarcasm> I will even consider (without knowing HP-UX) that you have a 50% chance of succeeding with less effort than it took to send this mail: ./configure make make install </sarcasm> PS: Watch your line length, max. 72 chars please. -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: Change Request: Cookies 4 WGet
Michael Klaus wrote: Dear WGet team, first of all, I want to say that WGet really is a _great_ program. My company is mostly using it for regression tests for different web servers and servlet engines. And there's the problem. Servlet engines maintain their sessions - which are critical for regression tests - via cookies. A functionality to hold cookies (one cookie would be sufficient for this task) and send them back with each request would really be helpful. Would someone on your team be able to support us in getting this to work? We have a bit of C knowledge here and perhaps would even be able to write it ourselves... if we only had a clue where to change what :-/ Cookie support has been added in the current development code. You can get it from CVS; see the Development section on the web-site (http://sunsite.dk/wget/). Of course with all the usual warnings about using development code. Many thanks in advance, Michael Klaus -- Michael Klaus Entwickler / IT-Consultant orgafactory gmbh Hügelstraße 8 60435 Frankfurt am Main Telefon (0 69) 90 54 66 35 Telefax (0 69) 90 54 66 13 mailto:[EMAIL PROTECTED] -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: How to put user name and password using wget?
N SHU wrote: One (version 1.6) is OK in Unix; the other (version 1.5) in DOS (Windows) has a problem. Well then this bug has been fixed (at least) in version 1.6. Is there any reason you cannot update your wget to that version? There are links to binaries from the web-site (http://sunsite.dk/wget/). The wget debug outputs are like below: in Windows has problem=== DEBUG output created by Wget 1.5.0 on Windows. This is the second bug-report (in this Easter weekend) from "Wget 1.5.0 on Windows". Does anyone know where this file comes from? Is there any chance that we can encourage them to update to the current version of Wget (version 1.6)? parseurl ("ftp://shu:[EMAIL PROTECTED]/1.html") - host astro12.phy.ornl.gov - opath 1.html - dir - file 1.html - ndir --10:18:58-- ftp://shu:[EMAIL PROTECTED]:21/1.html = `1.html' wget: Cannot determine user-id. While in Unix, it is OK== DEBUG output created by Wget 1.6 on osf4.0. parseurl ("ftp://shu:[EMAIL PROTECTED]/1.html") - host astro12.phy.ornl.gov - opath 1.html - dir - file 1.html - ndir newpath: /1.html --10:21:21-- ftp://shu:[EMAIL PROTECTED]/1.html = `1.html.4' Connecting to astro12.phy.ornl.gov:21... Created fd 3. connected! Logging in as shu ... 220 astro12.phy.ornl.gov FTP server (Digital UNIX Version 5.60) ready. -- USER shu 331 Password required for shu. -- PASS mypasswd 230 User shu logged in. Logged in! == TYPE I ... -- TYPE I 200 Type set to I. done. == CWD not needed. == PORT ... Master socket fd 5 bound. -- PORT 134,167,21,90,9,115 200 PORT command successful. done.== RETR 1.html ... -- RETR 1.html 150 Opening BINARY mode data connection for 1.html (134.167.21.90,2419) (2220 bytes). done. Created socket fd 6. Length: 2,220 (unauthoritative) 0K - .. [100%] Closing fd 6 Closing fd 5 226 Transfer complete. Closing fd 3 10:21:21 (541.99 KB/s) - `1.html.4' saved [2220] =END== Thanks. 
From: Hack Kampbjørn [EMAIL PROTECTED] Reply-To: Wget List [EMAIL PROTECTED] To: N SHU [EMAIL PROTECTED] CC: [EMAIL PROTECTED] Subject: Re: How to put user name and password using wget? Date: Mon, 16 Apr 2001 22:19:23 +0200 N SHU wrote: Dear sir, I don't know how to put username and passwd using wget. When I used: wget ftp://username:[EMAIL PROTECTED]/file, it said: Can't determine user-id. This sounds like a bug. Please send the output of running Wget in debug mode: wget -d ftp://username:password@ftp Thanks. N.Shu. _ Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com. -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: How to put user name and password using wget?
N SHU wrote: Dear sir, I don't know how to put username and passwd using wget. When I used: wget ftp://username:[EMAIL PROTECTED]/file, it said: Can't determine user-id. This sounds like a bug. Please send the output of running Wget in debug mode: wget -d ftp://username:password@ftp Thanks. N.Shu. _ Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com. -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: unsubscribe -- fishlet@hotmail.com
Take a look at the web-site (http://sunsite.dk/wget/); there are directions on how to unsubscribe from the mailing-lists. Wei Xiong wrote: _ Get your FREE download of MSN Explorer at http://explorer.msn.com -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: Changing links to absolute.
Rakhesh Sasidharan wrote: Hi, I don't know if wget is the tool for this, but I still ask. I need to mirror some sites for offline viewing. I usually use "wget" to recursively suck parts of the web, but in some cases it does not work. For example, say I want to suck the whole site (www.imap.org) to /mirrors/www.imap.org. This means I want all links in the file of the form /pics/something.jpg to be modified to point to /mirrors/www.imap.org/pics/something.jpg -- is there some way to do this automatically, or do I have to visit each file and do it manually? Or maybe I'm using the wrong program for mirroring ... could somebody help? --convert-links should do this, but it had some problems with these hostless absolute links. I'm not sure if this has been fixed for all cases, but you can track the discussion in the archives or, even better, try the current CVS source (wget-1.7-dev) and report any problems you find. Instructions on how to get and compile the CVS source are on the web-site (http://sunsite.dk/wget) under development. Thanks, You're welcome __ Rakhesh Sasidharan rakhesh at cse.iitd.ac.in -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: -r without effect
Micha Meier wrote: On Mon, Apr 09, 2001 at 07:03:49PM +0200, Hack Kampbjørn wrote: What is reclevel set to (if any) in /etc/wgetrc? Try setting -l (or --level) and see if it helps. reclevel is not set in /etc/wgetrc; setting -l makes no difference. Strangely, for some URLs -r works and for others not. It could be that if wget thinks the URL is not an HTML document, it won't do the recursive lookup, but this would be a bug, IMHO, especially when -F works only with the -i option, not with an URL. But still, all this has worked before...?! I dunno ... maybe you should send the output of running wget in debug mode: wget -d -r http:// Cheers, --Micha -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: -r without effect
Micha Meier wrote: I was using wget with SuSE 6.2. After upgrading to 7.1, wget refuses to search recursively, even in the same script that used to work before. It is the same wget version 1.5.3 as before, /etc/wgetrc is the same, and I'm not using .wgetrc. Can someone tell me what else could have changed? I've also tried to compile the 1.6 sources, but the result is the same: I say wget -r -nc http://... and wget says the file is already there and does not look at the recursive links. What is reclevel set to (if any) in /etc/wgetrc? Try setting -l (or --level) and see if it helps. --Micha -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: Mirroring a web site with FTP
John Vorstermans wrote: Hi. I have a problem which I cannot work out and wonder if anyone can point out what I am doing wrong. I wish to mirror a WEB site between two machines. To do this I am using: wget -m -l50 -L ftp://www-data:[EMAIL PROTECTED] It logs in just fine but will only collect files in the / directory of the server. I wish it to copy files from subdirectories (as in a recursive copy, which the -m should do). Here is an example of the sort of error I am getting: === --14:10:25-- ftp://www-data:[EMAIL PROTECTED]:21/%2Fcommunications/ => `xena.website.org/_stats/index.html' Connecting to xena.website.org:21... connected! Logging in as www-data ... Logged in! ==> TYPE I ... done. ==> CWD /communications ... No such directory `/communications'. Hmmm, this could be that you're not placed in the root directory. Older wget expected to be placed in / and couldn't handle it when placed somewhere else. This should be fixed in the current 1.7-dev version. There has been some discussion here in the list about what is the right thing to do and what wget should do with URLs like: ftp://ftp.somehost.com/path ftp://ftp.somehost.com//path ftp://ftp.somehost.com/%2Fpath Try using wget version 1.6 or, even better, the current development version (1.7-dev). Look at the web-site (http://sunsite.dk/wget/) for instructions. There are also links to mail-archives of the list. --14:10:25-- ftp://www-data:[EMAIL PROTECTED]:21/%2Fcouncil/ => `xena.website.org/council/.listing' Connecting to xena.website.org.:21... connected! Logging in as www-data ... Logged in! ==> TYPE I ... done. ==> CWD /council ... No such directory `/council'. === The directories are present on the server I am copying from, and even if I create them on the destination server I get the same errors. I am using FTP because the files on the server contain "Server Side Includes" which get translated when using HTTP, which is not what we want. Any advice would be most welcome.
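The three URL spellings in the list above differ only in how the leading slash is encoded: %2F is the percent-encoding of "/", so the last form names an absolute path even when the FTP login directory is not the root. A quick sanity check of that encoding (a generic one-off decode, not wget's own code):

```shell
# Decode the percent-encoded leading slash of the third URL form:
path='%2Fpath'
decoded=$(printf '%s\n' "$path" | sed 's|%2F|/|g')
echo "$decoded"    # prints: /path
```

That is why ftp://host/%2Fpath can reach /path on the server while ftp://host/path is resolved relative to the login directory.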
Thanks John -- John Vorstermans || [EMAIL PROTECTED] Serion E-Commerce Solutions || Ph (021) 432-987 New Zealand -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: WGet v1.7 - Problems with Serv-U Win32 FTP Server?
GoTo wrote: Hi, perhaps one of you can explain the following to me: I tried to WGet a file from my local FTP server - don't ask me why :-) WGet v1.7-dev 2001/02/16 (with ssl support) for Win32 Serv-U v2.5e for Win32 -- WGet Output Start - D:\>wget -d http://guest:[EMAIL PROTECTED]:2100/ Have you tried using FTP? It usually helps when talking to an FTP server 8-) DEBUG output created by Wget 1.7-dev on Windows. parseurl ("http://guest:[EMAIL PROTECTED]:2100/") -> host 127.0.0.1 -> port 2100 -> opath -> dir -> file -> ndir newpath: / --18:23:35-- http://guest:password@127.0.0.1:2100/ => `index.html.3' Connecting to 127.0.0.1:2100... Created fd 36. connected! ---request begin--- GET / HTTP/1.0 User-Agent: Wget/1.7-dev Host: 127.0.0.1:2100 Accept: */* Connection: Keep-Alive Authorization: Basic Z3Vlc3Q6Z3Vlc3Q= ---request end--- HTTP request sent, awaiting response... 220-Serv-U FTP-Server v2.5e for WinSock ready... Closing fd 36 18:23:35 ERROR -1: Malformed status line. -- WGet Output End - So I tried the usual console FTP: - Console FTP Output Start --- D:\>ftp Ftp> open localhost 2100 Connected to arcticblue. 220-Serv-U FTP-Server v2.5e for WinSock ready... 220-Welcome to ArcticBlue. 220- some more text here 220- 220-Have Fun! 220- 220 User (arcticblue:(none)): guest 331 User name okay, need password. Password: 230 User logged in, proceed. Ftp> dir 200 PORT Command successful. 150 Opening ASCII mode data connection for /bin/ls. -rwxrwxrwx 1 user group 2048 Mar 5 14:59 1.r00 226 Transfer complete. Ftp: 62 bytes received in 0.00 seconds 62000.00KB/sec. Ftp> - Console FTP Output End --- As you can see this worked perfectly. My first thought was the startup message, but then I tried another FTP server (ftp.leo.org) which also has a startup-msg, and that worked fine with WGet. Any hints? bye GoTo -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
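The "Malformed status line" error follows directly from using an http:// URL against an FTP port: wget speaks HTTP, so it expects the first reply line to look like "HTTP/1.0 200 OK", while an FTP server greets with a "220 ..." banner. A rough sketch of the mismatch (illustrative only, not wget's actual parsing code):

```shell
# An HTTP status line starts with "HTTP/"; Serv-U's FTP greeting does not,
# which is exactly what wget rejects as a malformed status line.
reply='220-Serv-U FTP-Server v2.5e for WinSock ready...'
case $reply in
  HTTP/*) result='looks like HTTP' ;;
  *)      result='malformed status line' ;;
esac
echo "$result"
```

The fix is the one suggested above: use an ftp:// URL, e.g. wget ftp://guest:password@127.0.0.1:2100/.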
Problem accessing CVS server ...
[wget-website]$ cvs update -d cvs [update aborted]: recv() from server sunsite.dk: EOF [wget-website]$ -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: FTP retrieval not functioning
Chunks wrote: I am attempting to retrieve all subdirectories on a specific host. It appears that the host is not using a standard FTP service, as when using a standard (win32) FTP client (CuteFTP) I will not get a visible directory listing. However, when using FTP straight from the command line (Win2k machine), it works without any problems (ls, dir, get all work fine.) I enabled debug, logged it, and it seems that wget is successfully viewing the directory tree, yet for some reason (permissions?) it is not recursively entering these directories. The error looks as follows: 0 Feb-21-2001 14:37:20 xxx.yyy.zzz DIR UNKOWN; perms 0; Skipping. and this is what appears for all directories in the tree. The user I am logging in with definitely has permissions (I am able to download files manually or by telling wget to retrieve a specific file by name, just not all files.) The tail end of the log, if this helps, is as follows (IPs and file names changed =) ): 0 Feb-21-2001 14:37:26 xx.yyy.zzz DIR UNKOWN; perms 0; Skipping. --15:20:10-- ftp://blah:[EMAIL PROTECTED]:21/blah/ => `10.10.10.10/blah/index.html' ==> CWD not required. ==> PASV ... --> PASV 227 Entering Passive Mode 10,10,10,10,4,1 Will try connecting to 10.10.10.10:1025. Created fd 412. done. ==> RETR ... --> RETR 501 "" is a directory, not a file No such file `'. Closing fd 412 Closing fd 384 FINISHED --15:20:10-- Downloaded: 0 bytes in 0 files I did RTFM, and the links to any mailing list archives I could find were broken. Please accept my apologies in advance if this is something covered elsewhere. Perhaps ignoring permissions will take care of it? How come, then, that you didn't find this message: http://www.mail-archive.com/wget@sunsite.dk/msg00326.html At least the last remark is relevant in your case: "And please, in the future include debug output when reporting problems !!!". If you had followed it I could tell whether you're dealing with MS's so-called "FTP Server" or not.
I am running GNU Wget 1.5.3.1, win32 compilation, and have also tried the wget 1.5.3 linux compilation with identical results. BTW the latest release version is 1.6. And the web-site is at http://sunsite.dk/wget I appreciate any and all help, Kit -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: wget feature request: mail when complete
"Mordechai T. Abzug" wrote: Sometimes, I run wget in the background to download a file that will take hours or days to complete. It would be handy to have an option for wget to send me mail when it's done, so I can fire and forget. Thanks! - Morty wget comes from the *nix world where utilities try to be good at one or two things and rely on other utilities to be good at other things, so that they don't bloat their code. E.g. wget is good at downloading files from the internet using http and ftp; adding other protocols might be a natural thing for wget to do. But for sending mail there are already a lot of other utilities that are good at that. And there are also a bunch of utilities that are good at making other utilities cooperate and intercommunicate: those are the shells. I use the bash shell, so if I wanted this feature I'd type something like: $ (wget -r -l 0 http://www.vigilante.com/ | mail -s "wget run completed" `id -un`) Arghhh, wget sends the output to STDERR. Well then something like: $ (wget -r -l 0 http://www.vigilante.com/ 2>&1 | mail -s "wget run completed" `id -un`) Or if I used it a lot, make a little script for it: $ cat > ~/bin/bwget #!/bin/bash # Background wget: runs wget and sends a mail when finished (wget $* 2>&1 | mail -s "wget run completed" `id -un`) ^D $ chmod 0700 ~/bin/bwget Look at the documentation for the shell you use. -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: can wget do POSTing?
Cyrus Adkisson wrote: I'm trying to retrieve information from a website, but to get to the page I want, there is a form submission using the POST method. I've tried everything I know to do, including using a --header="POST / HTTP/1.0" parameter, but with all the errors I'm getting, I'm starting to come to the conclusion that wget is only capable of GET http requests. That would explain why it's called wget and not wpost, right? Am I correct in this assumption? As you already found out, wget can only do GET. If so, does anyone have any ideas how I might retrieve the webpage beyond the POST form? I'd really appreciate any help you might have for me. You can try and use the GET method anyway. Many web-scripts don't really care which method you use (or even about cookies). But I suppose you have already tried that 8-( Next you can use your browser and make the POST request there. Save the resulting page. And then use the `--force-html' and `--input-file' options to retrieve all the remaining pages. If those pages also require POST to access, then you could consider adding support for this method in wget 8-) Cyrus -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
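The browser-then-wget workaround above can be sketched as follows. The file name result.html and the link inside it are made up for illustration; in practice result.html is whatever page the browser's POST produced.

```shell
# Stand-in for the page saved from the browser after submitting the form:
cat > result.html <<'EOF'
<html><body><a href="http://example.com/next.html">next</a></body></html>
EOF

# The actual retrieval step would then be:
#   wget --force-html --input-file=result.html
# which treats result.html as HTML and fetches every link it contains.
# Here we just show the link wget would pick out of the saved page:
grep -o 'http://[^"]*' result.html
```

This only works when the pages behind the form are reachable by plain GET once you have the saved result page.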
compiling --with-ssl under cygwin
I tried to compile wget with ssl support under cygwin. $ make realclean $ make -f Makefile.cvs $ ./configure --with-ssl $ make But I got a bunch of undefined references like: /usr/lib/libssl.a(ssl_lib.o)(.text+0x585):ssl_lib.c: undefined reference to `BIO_s_socket' If I changed the Makefile to link with crypto first LIBS = -lintl -lcrypto -lssl then it compiled fine. Now since the Makefile is an autogenerated one, I looked at where to fix this. After trying a couple of things I ended up changing the order of the AC_CHECK_LIB calls for crypto and ssl in configure.in: AC_CHECK_LIB(crypto,main,,ssl_lose=yes) AC_CHECK_LIB(ssl,SSL_new,,ssl_lose=yes,-lcrypto) I'm not sure that this is the right solution (AFAIK it might break things on other platforms) or that it is fixed in the right place. So if someone on the list with more knowledge about crypto and ssl can help here, I will try it out. -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: FTP directory listing
Florian Fuessl wrote: Hi, why does a directory listing on: wget ftp://ftp.mcafee.com or any subdirectories of this server not work? Other FTP servers seem to work fine. $ wget -d ftp://ftp.mcafee.com DEBUG output created by Wget 1.6.1-dev on cygwin32. parseurl ("ftp://ftp.mcafee.com") -> host ftp.mcafee.com -> ftp_type I -> opath -> dir -> file -> ndir newpath: / Using `.listing' as listing tmp file. --18:02:59-- ftp://ftp.mcafee.com/ => `.listing' Connecting to ftp.mcafee.com:21... Created fd 3. connected! Logging in as anonymous ... 220 sncwebftp2 Microsoft FTP Service (Version 5.0). [...] Aha, it's a Microsoft so-called "FTP Server" with DOS dirstyle listing. This listing format is unsupported in wget 1.6. You can: - ask ftp.mcafee.com to change their default to unix listing, which is supported - patch the 1.6 code so that it issues a DIRSTYLE command at connection, or - use the current 1.7-dev CVS code, where support for this has been added. Look at the web-site http://sunsite.dk/wget for instructions on how to get it. And please, in the future include debug output when reporting problems !!! Greetings from Bavaria, Florian Fuessl -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
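For reference, the two listing styles differ roughly as shown below; both entries are made up for illustration, not taken from ftp.mcafee.com.

```shell
# unix-style listing line, which wget 1.6 parses fine:
unix_line='-rw-r--r--   1 ftp      ftp           592 Feb 10 23:27 readme.htm'
# MS-DOS dirstyle line, which wget 1.6 cannot parse:
dos_line='02-10-01  11:27PM                   592 readme.htm'
printf '%s\n%s\n' "$unix_line" "$dos_line"
```

The DOS style leads with a date instead of a permissions field, which is why wget 1.6's unix-format parser gives up on it.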
Re: Strange wget behaviour in mirroring
First of all: please wrap your lines at column 72 !!! [EMAIL PROTECTED] wrote: I am trying to set up an automated mirroring of a small part of the NAI ftp server. The small part is just the directory of the antivirus updates. I tried to download the directory with automated ftp scripts (something like mget *.*) but that meant lots of waste, so I decided to try wget. I found the port to windows of version 1.6, so I just put the executable in a directory and tried to mirror a directory from a local ftp server here: it was OK. Then I prepared for the real mirroring. I just typed "wget -S ftp://ftp.nai.com/path/" and I noticed that at the end of the log, wget was trying to download index.html while there was no reference to that file in the .listing, so I had a look at the whole log using the -d option (and the attached file is the result of that operation). That's all that you're asking for: the listing of files in that directory. wget converts it to an html index. You most likely want to do something like: $ wget-dev -S -d -r -l 1 -np ftp://ftp.nai.com/pub/antivirus/datfiles/4.x/* or $ wget-dev -S -d ftp://ftp.nai.com/pub/antivirus/datfiles/4.x/* That is: retrieve all files referenced in the directory listing of /datfiles/, or retrieve all files in the /datfiles/ directory. Note that there seems to be an FTP parsing bug in wget 1.6 wrt the MS FTP server. This has been fixed in the 1.7-dev branch. Again, look at the web-site for instructions (http://sunsite.dk/wget) $ wget-dev ftp://ftp.nai.com/pub/antivirus/datfiles/4.x/* --23:08:16-- ftp://ftp.nai.com/pub/antivirus/datfiles/4.x/* => `.listing' Connecting to ftp.nai.com:21... connected! Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD pub/antivirus/datfiles/4.x ... done. ==> PORT ... done. ==> LIST ... done. 0K -> . 23:08:25 (54.00 KB/s) - `.listing' saved [1106] Removed `.listing'. The sizes do not match (local 33580) -- retrieving.
--23:08:25-- ftp://ftp.nai.com/pub/antivirus/datfiles/4.x/41054106.upd => `41054106.upd' ==> CWD not required. ==> PORT ... done. ==> RETR 41054106.upd ... done. Length: 109,312 0K -> .. ... Now I don't know if this is a bug or if there is the usual simple and big mistake I am making, so I thought to email someone... I tried to look for the mailing list archives on the web but it seems I found an old link, so I thought to write here. I must say that I also thought that it was a problem of the win32 port of wget, so I downloaded the tarball and compiled it on a redhat 6.2 machine here, but the result is exactly the same. I also tried to download some files in the listing, to see if the error was because of read permission, and the files were downloaded correctly, and also the .listing is created correctly, so I don't really know what to do. I hope I am not making you waste lots of time, and excuse me for this terrible english, and obviously thanks a lot. Emiliano [Attachment: wget-log, type application/octet-stream, encoding base64] -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
Re: f/up bug
Clayton Vernon wrote: Hack- While it now "seems" to parse correctly, and while it displays "TYPE A" in its download dialog, it does NOT actually download the file in ASCII format, but in binary. Clayton I have been testing this against MS's so-called "FTP Server". And I too get the same file no matter which type I use. I have tried files with CRLF line endings and files with NL line endings. Quoting from RFC 959: 3.1.1.1. ASCII TYPE This is the default type and must be accepted by all FTP implementations. It is intended primarily for the transfer of text files, except when both hosts would find the EBCDIC type more convenient. The sender converts the data from an internal character representation to the standard 8-bit NVT-ASCII representation (see the Telnet specification). The receiver will convert the data from the standard form to his own internal form. In accordance with the NVT standard, the CRLF sequence should be used where necessary to denote the end of a line of text. (See the discussion of file structure at the end of the Section on Data Representation and Storage.) Using the standard NVT-ASCII representation means that data must be interpreted as 8-bit bytes. The Format parameter for ASCII and EBCDIC types is discussed below. There seem to be two problems here: 1. The MS FTP Server always sends the file in binary mode. 2. wget always saves the file in binary mode. First I tried on an MS FTP Server (look at how the byte count is the same in both cases; this is expected if the file has only CRLF line endings, but we get the same result with binary files): $ wget-dev -S -d -O readme-A.htm ftp://ftp.nai.com/pub/antivirus/readme.htm\;type=A DEBUG output created by Wget 1.7-dev on cygwin32. [...] 220 sncwebftp2 Microsoft FTP Service (Version 5.0). [...] 200 PORT command successful. --> RETR readme.htm 150 Opening ASCII mode data connection for readme.htm(592 bytes). Created socket fd 6. Length: 592 0K -> [100%] Closing fd 6 Closing fd 5 226 Transfer complete.
23:27:36 (8.26 KB/s) - `readme-A.htm' saved [592] Closing fd 4 $ wget-dev -S -d -O readme-I.htm ftp://ftp.nai.com/pub/antivirus/readme.htm\;type=I DEBUG output created by Wget 1.7-dev on cygwin32. [...] 220 sncwebftp2 Microsoft FTP Service (Version 5.0). [...] 200 PORT command successful. --> RETR readme.htm 150 Opening BINARY mode data connection for readme.htm(592 bytes). Created socket fd 6. Length: 592 0K -> [100%] Closing fd 6 Closing fd 5 226 Transfer complete. 23:30:15 (28.91 KB/s) - `readme-I.htm' saved [592] Closing fd 4 Now on a Unix FTP server (look at how the byte count is different in this case): $ wget -S -d -O wget-A.html ftp://sunsite.dk/projects/wget/wget.html\;type=A DEBUG output created by Wget 1.6.1-dev on cygwin32. [...] 220 ProFTPD 1.2.0rc3 Server (SunSITE Denmark) [sunsite.dk] [...] 200 PORT command successful. --> RETR wget.html 150 Opening ASCII mode data connection for wget.html (5689 bytes). Created socket fd 6. Length: 5,689 0K -> . [102%] Closing fd 6 Closing fd 5 226 Transfer complete. 23:50:32 (6.35 KB/s) - `wget-A.html' saved [5804] $ wget -S -d -O wget-I.html ftp://sunsite.dk/projects/wget/wget.html\;type=I DEBUG output created by Wget 1.6.1-dev on cygwin32. [...] 220 ProFTPD 1.2.0rc3 Server (SunSITE Denmark) [sunsite.dk] [...] 200 PORT command successful. --> RETR wget.html 150 Opening BINARY mode data connection for wget.html (5689 bytes). Created socket fd 6. Length: 5,689 0K -> . [100%] Closing fd 6 Closing fd 5 226 Transfer complete. 23:50:47 (3.91 KB/s) - `wget-I.html' saved [5689] $ ls -als total 13 2 drwxrwxrwx 2 administ Administ 4096 Feb 11 00:01 . 4 drwxrwxrwx 13 administ Administ 8192 Feb 10 20:33 ..
0 -rw-rw-rw- 1 hack Administ 592 Feb 10 23:27 readme-A.htm 0 -rw-rw-rw- 1 hack Administ 592 Feb 10 23:30 readme-I.htm 3 -rw-rw-rw- 1 hack Administ 5804 Feb 10 23:50 wget-A.html 3 -rw-rw-rw- 1 hack Administ 5689 Feb 10 23:50 wget-I.html -----Original Message----- From: Hack Kampbjørn [mailto:[EMAIL PROTECTED]] Sent: Sunday, February 04, 2001 11:38 AM To: Clayton Vernon Cc: [EMAIL PROTECTED] Subject: Re: simple ?/bug Clayton Vernon wrote: Sirs: Pardon my naivete, but I can't get the ASCII mode FTP to work because my shell thinks the ';' delimits the command. I can't put the entire arg to wget in quotes because it then thinks the ';type=a' is a part of the URL. And it is! wget will parse it correctly. Obs the debug output doesn't include the ftp
Re: Design issue
Herold Heiko wrote: I think the most straightforward mapping would also be the most attractive: ftp/site/dir/file http/site/dir/file Wget should certainly have an option to make it behave this way. In fact, I'd prefer it to behave that way by default, for the reasons you mention, and introduce an option to leave off the protocol. I agree. What about https? What about answering on more than one port, like java.sun.com used to do, where :80 had a java menu and :81 not. This is a bad example as it was mostly the same web-site. The files could be either in a separate https directory (logically more correct) or reside in the http directory in order to minimize ../../../../dir/dir/dir/something url rewriting (since I suppose those pages could share lots of inline pics and other links with the http structure). Speaking of https, I got exactly one report (in private mail) of successful testing of the windows ssl-enabled binary, nothing else. Could you commit the patch at http://www.mail-archive.com/wget@sunsite.dk/msg00142.html ? The changes in gen_sslfunc.c could be needed anyway for other operating systems (they are mirrored from similar code in sysdep.h and http.c, although I just noticed an unconditional include of time.h in ftpparse.c), while the changes in the VC makefile are by default commented out. Heiko -- -- PREVINET S.p.A. [EMAIL PROTECTED] -- Via Ferretto, 1 ph x39-041-5907073 -- I-31021 Mogliano V.to (TV) fax x39-041-5907087 -- ITALY Hack 8-)
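The proposed mapping can be illustrated with a toy path builder; this is purely a sketch of the naming scheme under discussion, not wget code, and the function name is made up.

```shell
# Build local paths that keep downloads from different protocols apart,
# in the ftp/site/dir/file and http/site/dir/file style proposed above.
url_to_path() {
  proto=${1%%://*}          # protocol part, e.g. ftp or http
  rest=${1#*://}            # site/dir/file part
  printf '%s/%s\n' "$proto" "$rest"
}
url_to_path 'ftp://site/dir/file'    # -> ftp/site/dir/file
url_to_path 'http://site/dir/file'   # -> http/site/dir/file
```

The open questions raised above (https, non-default ports like :81) would map naturally too, e.g. by keeping the port in the site component.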