Re: Release schedule?
On Monday 04 April 2005 04:45 pm, you wrote:
> [I previously sent this to [EMAIL PROTECTED] with no response. This is probably a better forum for this.]
>
> There has obviously been a great deal of work recently on wget, and a new release seems to be on the horizon. I have some patches to add functionality, but I've not taken the time to clean them up and send them in. But of course, I really want to see them in the next release.
>
> So... Is there a plan for the upcoming release? When might a feature freeze take place?

sorry. it's all my fault. i have been ***EXTREMELY*** busy lately working on a research project and i haven't been working on wget as much as i wanted. i can only apologize. i'll do my best to catch up ASAP.

anyway, i just released wget 1.10 alpha 1:

ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha1.tar.gz
ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha1.tar.bz2

and i was thinking about releasing wget before the end of april, after we have performed some tests on the 1.10 code. so, the official feature freeze would be sunday, april 9th. but, since the recently integrated LFS (large file support) feature seems to be very useful and there hasn't been a release of wget in ages, unless we have a major reason (like fixing a serious bug or integrating an extremely cool and widely used feature) i think we should really focus on testing and bugfixing at this point.

WRT your patches, please post them on the [EMAIL PROTECTED] mailing list with some comments on what they do and especially why you think they are needed. don't bother cleaning up the code yet, as the patches might be rejected or cleaned up by other developers.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.   http://www.ing.unife.it
Institute of Human Machine Cognition    http://www.ihmc.us
Deep Space 6 - IPv6 for Linux           http://www.deepspace6.net
Ferrara Linux User Group                http://www.ferrara.linux.it
Re: wget 1.10 alpha 1
Two points:

o some junk files are archived (po/*.gmo and windows/*~)
o string_t remains in src/Makefile.in (does not build)

Otherwise it looks OK.

~/cvs/wget$ diff -xCVS -ur . /tmp/wget-1.10-alpha1/
Only in /tmp/wget-1.10-alpha1/: Branches
Only in /tmp/wget-1.10-alpha1/: configure.bat
Only in /tmp/wget-1.10-alpha1/doc: sample.wgetrc.munged_for_texi_inclusion
Only in /tmp/wget-1.10-alpha1/doc: wget.info
Only in /tmp/wget-1.10-alpha1/: ftppasswd.patch
Only in /tmp/wget-1.10-alpha1/po: bg.gmo
Only in /tmp/wget-1.10-alpha1/po: ca.gmo
Only in /tmp/wget-1.10-alpha1/po: cs.gmo
Only in /tmp/wget-1.10-alpha1/po: da.gmo
Only in /tmp/wget-1.10-alpha1/po: de.gmo
Only in /tmp/wget-1.10-alpha1/po: el.gmo
Only in /tmp/wget-1.10-alpha1/po: en_GB.gmo
Only in /tmp/wget-1.10-alpha1/po: es.gmo
Only in /tmp/wget-1.10-alpha1/po: et.gmo
Only in /tmp/wget-1.10-alpha1/po: eu.gmo
Only in /tmp/wget-1.10-alpha1/po: fi.gmo
Only in /tmp/wget-1.10-alpha1/po: fr.gmo
Only in /tmp/wget-1.10-alpha1/po: gl.gmo
Only in /tmp/wget-1.10-alpha1/po: he.gmo
Only in /tmp/wget-1.10-alpha1/po: hr.gmo
Only in /tmp/wget-1.10-alpha1/po: hu.gmo
Only in /tmp/wget-1.10-alpha1/po: it.gmo
Only in /tmp/wget-1.10-alpha1/po: ja.gmo
Only in /tmp/wget-1.10-alpha1/po: nl.gmo
Only in /tmp/wget-1.10-alpha1/po: no.gmo
Only in /tmp/wget-1.10-alpha1/po: pl.gmo
Only in /tmp/wget-1.10-alpha1/po: pt_BR.gmo
Only in /tmp/wget-1.10-alpha1/po: ro.gmo
Only in /tmp/wget-1.10-alpha1/po: ru.gmo
Only in /tmp/wget-1.10-alpha1/po: sk.gmo
Only in /tmp/wget-1.10-alpha1/po: sl.gmo
Only in /tmp/wget-1.10-alpha1/po: sr.gmo
Only in /tmp/wget-1.10-alpha1/po: sv.gmo
Only in /tmp/wget-1.10-alpha1/po: tr.gmo
Only in /tmp/wget-1.10-alpha1/po: uk.gmo
Only in /tmp/wget-1.10-alpha1/po: zh_CN.gmo
Only in /tmp/wget-1.10-alpha1/po: zh_TW.gmo
diff -xCVS -ur ./src/version.c /tmp/wget-1.10-alpha1/src/version.c
--- ./src/version.c Thu Mar 18 04:05:56 2004
+++ /tmp/wget-1.10-alpha1/src/version.c Tue Apr 5 12:44:10 2005
@@ -1 +1 @@
-char *version_string = "1.9+cvs-dev";
+char *version_string = "1.10-alpha1";
Only in /tmp/wget-1.10-alpha1/windows: ChangeLog~
Only in /tmp/wget-1.10-alpha1/windows: Makefile.src.bor~
Only in /tmp/wget-1.10-alpha1/windows: Makefile.src.mingw~
Only in /tmp/wget-1.10-alpha1/windows: Makefile.src~
Only in /tmp/wget-1.10-alpha1/windows: Makefile.watcom~
Only in /tmp/wget-1.10-alpha1/windows: wget.dep~
RE: Character encoding
The solution is to explicitly set the character encoding to utf-8. I do this in the aspx file's head section and it works fine.

This is kinda weird, though: with an aspx file, it seems that dotnet always inserts this charset header for you by default (you can see this by running wget in debug mode without setting the charset in the head section). However, that header alone does not help when using wget. It does work in normal browsers, though, as aspx files with utf-8 characters obviously display fine there.

Anyway, problem solved; just thought I'd let you know.

-----Original Message-----
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
Sent: March 31, 2005 3:19 PM
To: Alan Hunter
Cc: 'wget@sunsite.dk'
Subject: Re: Character encoding

I'm not sure what causes this problem, but I suspect it does not come from Wget doing something wrong. That Notepad opens the file correctly is indicative enough.

Maybe those browsers don't understand UTF-8 (or other) encoding of Unicode when the file is opened on-disk?
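One plausible reading of this exchange: by default wget saves only the response body, so a charset sent in the HTTP Content-Type header is not stored with the file on disk, whereas a charset declared in the page's head section is. A minimal Python sketch for checking what a saved page declares about itself; `declared_charset` and `MetaCharsetFinder` are hypothetical helpers, not part of wget, and they only inspect `<meta charset>` and `<meta http-equiv="Content-Type">` tags.

```python
from email.message import Message
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Records the first charset declared by a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta" or self.charset is not None:
            return
        values = {name: (value or "") for name, value in attrs}
        if "charset" in values:
            # HTML5-style declaration: <meta charset="utf-8">
            self.charset = values["charset"].strip().lower() or None
        elif values.get("http-equiv", "").lower() == "content-type":
            # Older style: reuse the stdlib MIME parser on
            # content="text/html; charset=utf-8".
            header = Message()
            header["Content-Type"] = values.get("content", "")
            self.charset = header.get_content_charset()


def declared_charset(html_text):
    """Return the charset a saved page declares about itself, or None."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.charset


if __name__ == "__main__":
    page = ('<html><head><meta http-equiv="Content-Type" '
            'content="text/html; charset=utf-8"></head><body></body></html>')
    print(declared_charset(page))
```

A page for which this returns None relies entirely on the server's header, which is exactly the information a browser no longer has when it opens the local copy.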
Keep session cookies command line switch considered invalid
Wget seems to consider --keep-session-cookies not to be a valid command-line switch, even though it's documented in the man page. E.g.:

raptor$ wget --load-cookies cookies.txt --save-cookies cookies.txt --keep-session-cookies --post-file=post.txt https://www.memset.com/login.php
wget: unrecognized option `--keep-session-cookies'
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.
raptor$ wget --keep-session-cookies
wget: unrecognized option `--keep-session-cookies'
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.
raptor$ wget -V | head -n 1
GNU Wget 1.9.1

This problem can be reproduced on another machine too:

[EMAIL PROTECTED] oli]$ wget -V | head -n 1
GNU Wget 1.9+cvs-stable (Red Hat modified)

What do you think... is it a bug??

Oli
wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...
hi everyone!

i'm trying to set up a website monitoring tool for a university research project. the idea is to use wget to archive politicians' websites once a week to analyse their campaigns in the last 4 weeks before the election. i have hit a few snags, and i would welcome comments.

my wget is a binary release that was shipped with suse linux 9.2 (GNU Wget 1.9+cvs-dev), architecture is i386.

[1] wget spans hosts when it shouldn't:

wget -r -l inf --convert-links -N --backup-converted http://www.karl-kress.de

yields

www.cdu.de
www.cdu-dormagen.de
www.cdu-grevenbroich.de
www.cdukapellen.de
www.cdu-kreisneuss.de
www.cduneukirchen.de
www.cdu-nrw.de
www.cdu-nrw-fraktion.de
www.cdu-rommerskirchen.de
www.cinelux.de
www.dormagen.de
www.grevenbroich.de
www.karl-kress.de
www.khf-zons.de
www.ngz-online.de
www.rheinischer-anzeiger.de
www.rommerskirchen.de
www.schaufenster-online.de
www.wz-newsline.de
www.zons.de

although the non-local host directories only contain the file that was linked to from the original site, and not a full recursive retrieval. still, i would rather it stayed on the original host only, and iiuc, that's how it's supposed to be. i could not find any funky stuff in the website that could trigger this behaviour...

[2] wget seems to choke on directories that start with a dot. i guess it thinks they are references to external pages and does not download links containing such directory names. there is a site i need to mirror that uses a funky cms that has its content below a /.net/ directory, and recursive download fails:

wget -r -l inf --convert-links -N --backup-converted http://www.albrecht-in-den-landtag.de/
--16:53:48--  http://www.albrecht-in-den-landtag.de/
           => `www.albrecht-in-den-landtag.de/index.html'
Resolving www.albrecht-in-den-landtag.de... 62.26.127.197
Connecting to www.albrecht-in-den-landtag.de|62.26.127.197|:80... connected.
HTTP request sent, awaiting response... 302 Object moved
Location: /.net/html/-1/welcome.html [following]
--16:53:48--  http://www.albrecht-in-den-landtag.de/.net/html/-1/welcome.html
           => `www.albrecht-in-den-landtag.de/.net/html/-1/welcome.html'
Reusing existing connection to www.albrecht-in-den-landtag.de:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    [ =] 97,674       167.93K/s

Last-modified header missing -- time-stamps turned off.
16:53:49 (167.59 KB/s) - `www.albrecht-in-den-landtag.de/.net/html/-1/welcome.html' saved [97,674]

Loading robots.txt; please ignore errors.
--16:53:49--  http://www.albrecht-in-den-landtag.de/robots.txt
           => `www.albrecht-in-den-landtag.de/robots.txt'
Connecting to www.albrecht-in-den-landtag.de|62.26.127.197|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 487 [text/plain]

100%[=] 487           --.--K/s

16:53:49 (23.65 KB/s) - `www.albrecht-in-den-landtag.de/robots.txt' saved [487/487]

FINISHED --16:53:49--
Downloaded: 98,161 bytes in 2 files
Converting www.albrecht-in-den-landtag.de/.net/html/-1/welcome.html... 1-412
Converted 1 files in 0.06 seconds.

as you can see, it stops after the first page.

[3] wget does not parse css stylesheets and consequently does not retrieve url() references, which leads to missing background graphics on some sites. this is a minor issue, but since it should be simple to fix, i wonder whether you would accept a patch if i find my way around the wget source...

any comments?

best regards,

jörn

ps: please retain the cc: list. thanks.

--
Jörn Nettingsmeier, EDV-Administrator
Institut für Politikwissenschaft
Universität Duisburg-Essen, Standort Duisburg
Mail: [EMAIL PROTECTED], Telefon: 0203/379-2736
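Point [3] can be illustrated with a minimal Python sketch of what extracting url() references from a stylesheet involves. It is a regex approximation, not a CSS parser and not wget code: `css_url_references` is a hypothetical name, CSS escapes and the string form of `@import "file.css"` (without url()) are not handled, and `data:` URIs are skipped because there is nothing to download.

```python
import re
from urllib.parse import urljoin

# url( ... ) with optional matching single or double quotes around the target.
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)
# /* ... */ comments, removed first so commented-out rules are not followed.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def css_url_references(css_text, base_url):
    """Return absolute URLs for every url() reference in css_text.

    Relative references are resolved against base_url, which should be the
    URL the stylesheet itself was fetched from.
    """
    css_text = COMMENT_RE.sub("", css_text)
    refs = []
    for match in URL_RE.finditer(css_text):
        target = match.group(2).strip()
        if target and not target.lower().startswith("data:"):
            refs.append(urljoin(base_url, target))
    return refs


if __name__ == "__main__":
    sample = 'body { background: url("img/bg.png") no-repeat; }'
    print(css_url_references(sample, "http://www.example.org/style/main.css"))
```

Resolving against the stylesheet's own URL rather than the HTML page's URL is deliberate: relative url() values in an external stylesheet are relative to the stylesheet, so background images referenced from /style/main.css end up under /style/ unless they start with a slash.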
cvs compile problem...
hi everybody!

i'm new to wget, and can't compile the current cvs: i did a cvs checkout, make -f Makefile.cvs and ./configure as usual. make chokes:

/bin/sh ../libtool --mode=link gcc -O2 -Wall -Wno-implicit -o wget cmpt.o connect.o convert.o cookies.o ftp.o ftp-basic.o ftp-ls.o ftp-opie.o hash.o host.o html-parse.o html-url.o http.o init.o log.o main.o gen-md5.o netrc.o progress.o recur.o res.o retr.o safe-ctype.o snprintf.o gen_sslfunc.o url.o utils.o version.o xmalloc.o string_t.o -lssl -lcrypto -ldl
mkdir .libs
gcc -O2 -Wall -Wno-implicit -o wget cmpt.o connect.o convert.o cookies.o ftp.o ftp-basic.o ftp-ls.o ftp-opie.o hash.o host.o html-parse.o html-url.o http.o init.o log.o main.o gen-md5.o netrc.o progress.o recur.o res.o retr.o safe-ctype.o snprintf.o gen_sslfunc.o url.o utils.o version.o xmalloc.o string_t.o -lssl -lcrypto -ldl
gcc: string_t.o: No such file or directory
make[1]: *** [wget] Error 1
make[1]: Leaving directory `/home/nettings/wget-cvs/wget/src'
make: *** [src] Error 2

string_t.c seems to be missing. system is suse linux 9.2.

any hints?

thanks in advance,

jörn

ps: please keep the cc: list. thx

--
Jörn Nettingsmeier
Re: cvs compile problem...
Jörn Nettingsmeier wrote:
> hi everybody! i'm new to wget, and can't compile the current cvs: i did a cvs checkout, make -f Makefile.cvs and ./configure as usual. make chokes:
> [...]
> gcc: string_t.o: No such file or directory
> [...]
> string_t.c seems to be missing.

oops. i just came across this message:

http://www.mail-archive.com/wget%40sunsite.dk/msg07380.html

and after removing all references to string_t from src/Makefile, it now compiles cleanly. sorry for the noise.
Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...
Jörn Nettingsmeier wrote:
> hi everyone! i'm trying to set up a website monitoring tool for a university research project. the idea is to use wget to archive politicians' websites once a week to analyse their campaigns in the last 4 weeks before the election. i have hit a few snags, and i would welcome comments.
>
> my wget is a binary release that was shipped with suse linux 9.2 (GNU Wget 1.9+cvs-dev), architecture is i386.

i just confirmed all three issues with the latest cvs.

> [1] wget spans hosts when it shouldn't:
>
> [2] wget seems to choke on directories that start with a dot. i guess it thinks they are references to external pages and does not download links containing such directory names.
>
> [3] wget does not parse css stylesheets and consequently does not retrieve url() references, which leads to missing background graphics on some sites.

ps: please retain the cc: list. thanks.

regards,

jörn
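For point [1], the behaviour Jörn expects is that a recursive run without -H (--span-hosts) only follows links whose host matches the starting URL's host. A minimal Python sketch of that filter, for illustration only: `links_to_follow` and `host_of` are hypothetical helpers, not wget's code, and they ignore everything else wget's recursion has to weigh, such as -D (--domains) lists.

```python
from urllib.parse import urljoin, urlsplit


def host_of(url):
    """Lower-cased host name of an absolute URL ('' if it has none)."""
    return urlsplit(url).hostname or ""


def links_to_follow(start_url, page_url, hrefs):
    """Resolve hrefs found on page_url and keep those on start_url's host.

    Hypothetical helper: it models only the "stay on the starting host"
    rule, nothing else about wget's recursive retrieval.
    """
    start_host = host_of(start_url)
    kept = []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        # Skip mailto:, javascript: and similar non-HTTP links outright.
        if urlsplit(absolute).scheme not in ("http", "https"):
            continue
        if host_of(absolute) == start_host:
            kept.append(absolute)
    return kept


if __name__ == "__main__":
    start = "http://www.karl-kress.de/"
    print(links_to_follow(start, start, ["aktuelles.html", "http://www.cdu.de/"]))
```

Under this rule a link to www.cdu.de is simply never queued, so none of the foreign host directories listed in the original report would be created at all.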
Re: wget 1.10 alpha 1
On Tuesday 05 April 2005 03:16 am, FUJISHIMA Satsuki wrote:
> Two points:
> o some junk files are archived (po/*.gmo and windows/*~)

sorry. i am really spoiled by automake, which automatically deletes junk files from the final distribution.

> o string_t remains in src/Makefile.in (does not build)
> Otherwise it looks OK.

just fixed in both cvs and tarball. thanks.

the bottom line is: i shouldn't do releases at 2:00 am. when you're so tired after a long day of hard work, it's really too easy to screw things up.

anyway, i've just re-released the 1.10 alpha1 tarball with fixed makefiles and without the junk files. please give it a second try.

--
Mauro Tortonesi
File rejection is not working
The -R option is not working in wget 1.9.1 for anything but literal, hard-coded filenames: patterns such as file[Nn]ames or [Tt]hese are simply ignored...

Please respond, and do not delete my email address, as I am not a subscriber... yet.

Thanks,
Jerry
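The wget manual describes each -R (--reject) entry as either a file name suffix or, if it contains any of the wildcard characters *, ?, [ or ], a shell-style pattern, so bracket expressions like [Nn] are meant to work. A minimal Python sketch of that rule, using fnmatch.fnmatchcase for the pattern case; `is_rejected` is a hypothetical helper, not wget's C implementation, and Python's fnmatch is used here only because it follows the same shell-style syntax for simple patterns like these.

```python
from fnmatch import fnmatchcase

WILDCARD_CHARS = set("*?[]")


def is_rejected(filename, reject_list):
    """Return True if filename matches any entry of a -R style list.

    Entries containing a wildcard character are treated as shell-style
    patterns matched against the whole name; other entries are suffixes.
    """
    for entry in reject_list:
        if WILDCARD_CHARS & set(entry):
            # Pattern: must match the entire file name, case-sensitively.
            if fnmatchcase(filename, entry):
                return True
        elif filename.endswith(entry):
            # Plain entry: treated as a suffix such as "gif".
            return True
    return False


if __name__ == "__main__":
    rules = "file[Nn]ames,[Tt]hese,gif".split(",")
    for name in ["fileNames", "These", "those", "logo.gif", "filenames.txt"]:
        print(name, is_rejected(name, rules))
```

One thing worth ruling out when testing from a shell: quoting the list (for example -R 'file[Nn]ames') keeps the shell from trying to expand the brackets itself before wget ever sees them.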