Re: Release schedule?

2005-04-05 Thread Mauro Tortonesi
On Monday 04 April 2005 04:45 pm, you wrote:
 [I previously sent this to [EMAIL PROTECTED] with no response.
 This is probably a better forum for this.]


 There has obviously been a great deal of work recently on wget, and
 a new release seems to be on the horizon.  I have some patches that add
 functionality, but I have not taken the time to clean them up and send
 them in.  Of course, I really want to see them in the next release.
 So... Is there a plan for the upcoming release?  When might a feature
 freeze take place?

sorry, it's all my fault. i have been ***EXTREMELY*** busy lately working on a 
research project and haven't been able to work on wget as much as i wanted. 
i can only apologize; i'll do my best to catch up ASAP.

anyway, i have just released the first alpha (alpha1) of wget 1.10:

ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha1.tar.gz
ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha1.tar.bz2

and i was thinking of releasing wget 1.10 before the end of april, after we 
have done some testing on the 1.10 code.

so, the official feature freeze would be sunday, april 9th. however, since the 
recently integrated LFS support seems very useful and there has not been a 
release of wget in ages, i think we should really focus on testing and 
bugfixing from now on, unless we have a major reason to do otherwise (like 
fixing a serious bug or integrating an extremely cool and widely used feature).

as for your patches, please post them to the [EMAIL PROTECTED] mailing list 
with some comments on what they do and, especially, why you think they are 
needed. don't bother cleaning up the code yet, as the patches might be 
rejected or reworked by other developers.
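
just as an illustration (assuming you are working from a cvs checkout of 
wget, and with the file name below being only an example), a plain unified 
diff is the easiest form for us to review:

cvs diff -u src > my-feature.patch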

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.     http://www.ing.unife.it
Institute of Human & Machine Cognition    http://www.ihmc.us
Deep Space 6 - IPv6 for Linux             http://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: wget 1.10 alpha 1

2005-04-05 Thread FUJISHIMA Satsuki
Two points:
o some junk files are included in the archive (po/*.gmo and windows/*~).
o string_t remains in src/Makefile.in, so the tree does not build.
Otherwise it looks OK.

~/cvs/wget$ diff -xCVS -ur . /tmp/wget-1.10-alpha1/
Only in /tmp/wget-1.10-alpha1/: Branches
Only in /tmp/wget-1.10-alpha1/: configure.bat
Only in /tmp/wget-1.10-alpha1/doc: sample.wgetrc.munged_for_texi_inclusion
Only in /tmp/wget-1.10-alpha1/doc: wget.info
Only in /tmp/wget-1.10-alpha1/: ftppasswd.patch
Only in /tmp/wget-1.10-alpha1/po: bg.gmo
Only in /tmp/wget-1.10-alpha1/po: ca.gmo
Only in /tmp/wget-1.10-alpha1/po: cs.gmo
Only in /tmp/wget-1.10-alpha1/po: da.gmo
Only in /tmp/wget-1.10-alpha1/po: de.gmo
Only in /tmp/wget-1.10-alpha1/po: el.gmo
Only in /tmp/wget-1.10-alpha1/po: en_GB.gmo
Only in /tmp/wget-1.10-alpha1/po: es.gmo
Only in /tmp/wget-1.10-alpha1/po: et.gmo
Only in /tmp/wget-1.10-alpha1/po: eu.gmo
Only in /tmp/wget-1.10-alpha1/po: fi.gmo
Only in /tmp/wget-1.10-alpha1/po: fr.gmo
Only in /tmp/wget-1.10-alpha1/po: gl.gmo
Only in /tmp/wget-1.10-alpha1/po: he.gmo
Only in /tmp/wget-1.10-alpha1/po: hr.gmo
Only in /tmp/wget-1.10-alpha1/po: hu.gmo
Only in /tmp/wget-1.10-alpha1/po: it.gmo
Only in /tmp/wget-1.10-alpha1/po: ja.gmo
Only in /tmp/wget-1.10-alpha1/po: nl.gmo
Only in /tmp/wget-1.10-alpha1/po: no.gmo
Only in /tmp/wget-1.10-alpha1/po: pl.gmo
Only in /tmp/wget-1.10-alpha1/po: pt_BR.gmo
Only in /tmp/wget-1.10-alpha1/po: ro.gmo
Only in /tmp/wget-1.10-alpha1/po: ru.gmo
Only in /tmp/wget-1.10-alpha1/po: sk.gmo
Only in /tmp/wget-1.10-alpha1/po: sl.gmo
Only in /tmp/wget-1.10-alpha1/po: sr.gmo
Only in /tmp/wget-1.10-alpha1/po: sv.gmo
Only in /tmp/wget-1.10-alpha1/po: tr.gmo
Only in /tmp/wget-1.10-alpha1/po: uk.gmo
Only in /tmp/wget-1.10-alpha1/po: zh_CN.gmo
Only in /tmp/wget-1.10-alpha1/po: zh_TW.gmo
diff -xCVS -ur ./src/version.c /tmp/wget-1.10-alpha1/src/version.c
--- ./src/version.c Thu Mar 18 04:05:56 2004
+++ /tmp/wget-1.10-alpha1/src/version.c Tue Apr  5 12:44:10 2005
@@ -1 +1 @@
-char *version_string = "1.9+cvs-dev";
+char *version_string = "1.10-alpha1";
Only in /tmp/wget-1.10-alpha1/windows: ChangeLog~
Only in /tmp/wget-1.10-alpha1/windows: Makefile.src.bor~
Only in /tmp/wget-1.10-alpha1/windows: Makefile.src.mingw~
Only in /tmp/wget-1.10-alpha1/windows: Makefile.src~
Only in /tmp/wget-1.10-alpha1/windows: Makefile.watcom~
Only in /tmp/wget-1.10-alpha1/windows: wget.dep~


RE: Character encoding

2005-04-05 Thread Alan Hunter

The solution is to explicitly set the character encoding to utf-8. I do this
in the aspx file's head section and it works fine. 

This is kind of weird, though: with an aspx file, it seems that .NET will
always insert this charset header for you by default (you can see this by
running wget in debug mode, without setting the charset in the head
section). However, that default does not help when using wget, whereas it
does work in normal browsers, where aspx files with UTF-8 characters
obviously display fine.
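
For reference, one quick way to see the Content-Type header the server
actually sends is to run wget in debug mode and grep for it (the URL below
is only a placeholder):

wget -d http://www.example.com/page.aspx 2>&1 | grep -i 'content-type'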

Anyway, problem solved; just thought I'd let you know.


-Original Message-
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
Sent: March 31, 2005 3:19 PM
To: Alan Hunter
Cc: 'wget@sunsite.dk'
Subject: Re: Character encoding


I'm not sure what causes this problem, but I suspect it does not come
from Wget doing something wrong.  The fact that Notepad opens the file
correctly is indicative enough.

Maybe those browsers don't understand the UTF-8 (or other) encoding of
Unicode when the file is opened from disk?


Keep session cookies command line switch considered invalid

2005-04-05 Thread Oliver Cole
Wget does not recognize --keep-session-cookies as a valid command-line
switch, even though it's documented in the man page.

E.g.:

raptor$ wget --load-cookies cookies.txt --save-cookies cookies.txt
--keep-session-cookies --post-file=post.txt
https://www.memset.com/login.php
wget: unrecognized option `--keep-session-cookies'
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.

raptor$ wget --keep-session-cookies
wget: unrecognized option `--keep-session-cookies'
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.

raptor$ wget -V | head -n 1
GNU Wget 1.9.1

This problem can be reproduced on another machine, which is:

[EMAIL PROTECTED] oli]$ wget -V | head -n 1
GNU Wget 1.9+cvs-stable (Red Hat modified)
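
As a quick sanity check (purely illustrative), the cookie-related options a
given build actually supports can be listed with:

wget --help 2>&1 | grep -i cookie

If --keep-session-cookies does not show up there, that build presumably
predates the option.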

What do you think... is it a bug?

Oli



wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...

2005-04-05 Thread Jörn Nettingsmeier
hi everyone !
i'm trying to set up a website monitoring tool for a university research 
project. the idea is to use wget to archive politicians' websites once a 
week to analyse their campaigns over the last 4 weeks before the election.

i have hit a few snags, and i would welcome comments.
my wget is a binary release that was shipped with suse linux 9.2 (GNU 
Wget 1.9+cvs-dev), architecture is i386.

[1]
wget spans hosts when it shouldn't:
wget -r -l inf --convert-links -N --backup-converted 
http://www.karl-kress.de

yields
www.cdu.de
www.cdu-dormagen.de
www.cdu-grevenbroich.de
www.cdukapellen.de
www.cdu-kreisneuss.de
www.cduneukirchen.de
www.cdu-nrw.de
www.cdu-nrw-fraktion.de
www.cdu-rommerskirchen.de
www.cinelux.de
www.dormagen.de
www.grevenbroich.de
www.karl-kress.de
www.khf-zons.de
www.ngz-online.de
www.rheinischer-anzeiger.de
www.rommerskirchen.de
www.schaufenster-online.de
www.wz-newsline.de
www.zons.de
although the non-local host directories contain only the files that were 
linked to directly from the original site, not a full recursive retrieval.

still, i would rather wget stayed on the original host only, and if i 
understand correctly, that is how it is supposed to behave. i could not find 
anything unusual in the website that could trigger this behaviour...
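
as a stopgap (just a sketch, untested against this particular site, so it 
may or may not work around the behaviour above), the crawl can be pinned to 
the original domain explicitly with -D/--domains:

wget -r -l inf -D karl-kress.de --convert-links -N --backup-converted http://www.karl-kress.de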


[2]
wget seems to choke on directories that start with a dot. i guess it 
thinks they are references to external pages and does not download links 
containing such directory names.

there is a site i need to mirror that uses a funky cms that has its 
content below a /.net/ directory, and recursive download fails:

wget -r -l inf --convert-links -N --backup-converted 
http://www.albrecht-in-den-landtag.de/

--16:53:48--  http://www.albrecht-in-den-landtag.de/
   => `www.albrecht-in-den-landtag.de/index.html'
Resolving www.albrecht-in-den-landtag.de... 62.26.127.197
Connecting to www.albrecht-in-den-landtag.de|62.26.127.197|:80... connected.
HTTP request sent, awaiting response... 302 Object moved
Location: /.net/html/-1/welcome.html [following]
--16:53:48-- 
http://www.albrecht-in-den-landtag.de/.net/html/-1/welcome.html
   => `www.albrecht-in-den-landtag.de/.net/html/-1/welcome.html'
Reusing existing connection to www.albrecht-in-den-landtag.de:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

[ <=>                                ] 97,674       167.93K/s

Last-modified header missing -- time-stamps turned off.
16:53:49 (167.59 KB/s) - 
`www.albrecht-in-den-landtag.de/.net/html/-1/welcome.html' saved [97,674]

Loading robots.txt; please ignore errors.
--16:53:49--  http://www.albrecht-in-den-landtag.de/robots.txt
   => `www.albrecht-in-den-landtag.de/robots.txt'
Connecting to www.albrecht-in-den-landtag.de|62.26.127.197|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 487 [text/plain]
100%[===================================>] 487          --.--K/s

16:53:49 (23.65 KB/s) - `www.albrecht-in-den-landtag.de/robots.txt'
saved [487/487]
FINISHED --16:53:49--
Downloaded: 98,161 bytes in 2 files
Converting www.albrecht-in-den-landtag.de/.net/html/-1/welcome.html... 1-412
Converted 1 files in 0.06 seconds.
as you can see, it stops after the first page.

[3]
wget does not parse css stylesheets and consequently does not retrieve 
url() references, which leads to missing background graphics on some sites.

this is a minor issue, but since it should be simple to fix, i wonder 
whether you would accept a patch if i can find my way around the wget source...
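
in the meantime, a crude interim workaround is to pull the url() references 
out of an already-downloaded stylesheet by hand and feed them back to wget 
(a rough sketch only; the file names are illustrative and it assumes the 
references are absolute urls):

grep -o 'url([^)]*)' style.css | sed -e 's/^url(//' -e 's/)$//' -e "s/['\"]//g" > css-urls.txt
wget -x -i css-urls.txt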


any comments?
best regards,
jörn
ps: please retain the cc: list. thanks.
--
Jörn Nettingsmeier, EDV-Administrator
Institut für Politikwissenschaft
Universität Duisburg-Essen, Standort Duisburg
Mail: [EMAIL PROTECTED], Telefon: 0203/379-2736


cvs compile problem...

2005-04-05 Thread Jörn Nettingsmeier
hi everybody!
i'm new to wget, and can't compile the current cvs:
i did a cvs checkout, make -f Makefile.cvs and ./configure as usual.
make chokes:
/bin/sh ../libtool --mode=link gcc -O2 -Wall -Wno-implicit  -o wget 
cmpt.o connect.o convert.o cookies.o ftp.o ftp-basic.o ftp-ls.o 
ftp-opie.o  hash.o host.o html-parse.o html-url.o http.o init.o log.o 
main.o gen-md5.o netrc.o progress.o recur.o res.o retr.o safe-ctype.o 
snprintf.o gen_sslfunc.o url.o utils.o version.o xmalloc.o string_t.o 
-lssl -lcrypto -ldl
mkdir .libs
gcc -O2 -Wall -Wno-implicit -o wget cmpt.o connect.o convert.o cookies.o 
ftp.o ftp-basic.o ftp-ls.o ftp-opie.o hash.o host.o html-parse.o 
html-url.o http.o init.o log.o main.o gen-md5.o netrc.o progress.o 
recur.o res.o retr.o safe-ctype.o snprintf.o gen_sslfunc.o url.o utils.o 
version.o xmalloc.o string_t.o  -lssl -lcrypto -ldl
gcc: string_t.o: No such file or directory
make[1]: *** [wget] Error 1
make[1]: Leaving directory `/home/nettings/wget-cvs/wget/src'
make: *** [src] Error 2

string_t.c seems to be missing.
system is suse linux 9.2.
any hints? thanks in advance,
jörn
ps: please keep the cc: list. thx
--
Jörn Nettingsmeier, EDV-Administrator
Institut für Politikwissenschaft
Universität Duisburg-Essen, Standort Duisburg
Mail: [EMAIL PROTECTED], Telefon: 0203/379-2736


Re: cvs compile problem...

2005-04-05 Thread Jörn Nettingsmeier
Jörn Nettingsmeier wrote:
hi everybody!
i'm new to wget, and can't compile the current cvs:
i did a cvs checkout, make -f Makefile.cvs and ./configure as usual.
make chokes:
[...build log snipped...]

string_t.c seems to be missing.
oops. i just came across this message:
http://www.mail-archive.com/wget%40sunsite.dk/msg07380.html
and after removing all references to string_t from src/Makefile, it now 
compiles cleanly.
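
for anyone hitting the same thing, something along these lines should do it 
(just a sketch, assuming the string_t.o entries are the only references; 
src/Makefile is regenerated from src/Makefile.in by ./configure, so it is 
worth fixing both):

sed -i 's/string_t\.o//g' src/Makefile.in src/Makefile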

sorry for the noise.


Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...

2005-04-05 Thread Jörn Nettingsmeier
Jörn Nettingsmeier wrote:
hi everyone !
i'm trying to set up a website monitoring tool for a university research 
project. the idea is to use wget to archive politicians' websites once a 
week to analyse their campaigns over the last 4 weeks before the election.

i have hit a few snags, and i would welcome comments.
my wget is a binary release that was shipped with suse linux 9.2 (GNU 
Wget 1.9+cvs-dev), architecture is i386.
i just confirmed all three issues with latest cvs.
[1]
wget spans hosts when it shouldn't:
[2]
wget seems to choke on directories that start with a dot. i guess it 
thinks they are references to external pages and does not download links 
containing such directory names.

[3]
wget does not parse css stylesheets and consequently does not retrieve 
url() references, which leads to missing background graphics on some sites.

ps: please retain the cc: list. thanks.

regards,
jörn


Re: wget 1.10 alpha 1

2005-04-05 Thread Mauro Tortonesi
On Tuesday 05 April 2005 03:16 am, FUJISHIMA Satsuki wrote:
 Two points:
 o some junk files are included in the archive (po/*.gmo and windows/*~).

sorry. i am really spoiled by automake, which automatically excludes junk 
files from the final distribution.
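
one way to avoid stray working-copy files entirely is to roll the tarball 
from a clean cvs export instead of a working directory, e.g. (sketch only; 
the module name and directory are illustrative):

cvs export -D now -d wget-1.10-alpha1 wget
tar czf wget-1.10-alpha1.tar.gz wget-1.10-alpha1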

 o string_t remains in src/Makefile.in, so the tree does not build.
 Otherwise it looks OK.

just fixed in both cvs and tarball. thanks.

the bottom line is: i shouldn't do releases at 2:00 am. when you're so tired 
after a long day of hard work it's really too easy to screw things up. 
anyway, i have just re-released the 1.10-alpha1 tarball with the makefile 
fixes and without the junk files. please give it a second try.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.     http://www.ing.unife.it
Institute of Human & Machine Cognition    http://www.ihmc.us
Deep Space 6 - IPv6 for Linux             http://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


File rejection is not working

2005-04-05 Thread Gerald Wheeler


The "-R" option is not working in wget 1.9.1 for anything but specifically-hardcoded filenames..

patterns such as file[Nn]ames or [Tt]hese are simply ignored...
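
To be concrete, this is the kind of invocation I mean (the URL here is only 
an example; the pattern is quoted so the shell does not expand the brackets 
itself):

wget -r -R 'file[Nn]ames*' http://www.example.com/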

Please respond, and keep my email address on the CC: list, as I am not a subscriber... yet.

Thanks

Jerry