forum download, cookies?

2007-09-13 Thread Juhana Sadeharju
A forum has topics that are available only to members. How can wget download a copy of such pages in that case? How do I get the proper cookies, and how do I make wget use them correctly? I use IE on a Windows PC and wget on a Unix machine; I could use Lynx on the Unix machine if needed.
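One workable approach, assuming the forum keeps the login state in cookies: log in with the browser, export its cookies in Netscape cookie-file format (browser extensions can do this, and Lynx can log in and write a compatible cookie jar itself), move the file to the Unix machine, and point wget at it. A sketch with a hypothetical forum URL:

    wget -r -np --load-cookies cookies.txt http://forum.example.com/topics/

--load-cookies reads a Netscape-format cookie file and sends the matching cookies with each request.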

Bug in 1.10.2 vs 1.9.1

2006-12-03 Thread Juhana Sadeharju
Hello. Wget 1.10.2 has the following bug compared to version 1.9.1. First, bin/wgetdir is defined as

    wget -p -E -k --proxy=off -e robots=off --passive-ftp \
        -o zlogwget`date +%Y%m%d%H%M%S` -r -l 0 -np -U Mozilla \
        --tries=50 --waitretry=10 "$@"

The download command is wgetdir

url accept/reject? accept scripts

2006-08-04 Thread Juhana Sadeharju
Hello. How do I get wget to ignore urls containing one of the following strings? Surprisingly, --help did not reveal a suitable option: action= printable= redirect= article= returnto= title= I would also like to point out the problems with the existing options: (1) I downloaded an ftp
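No single option in that era's wget did this; later releases (1.14 and newer) added --reject-regex, which is tested against the complete URL, query string included. A sketch along those lines (URL hypothetical):

    wget -r -np \
        --reject-regex 'action=|printable=|redirect=|article=|returnto=|title=' \
        http://wiki.example.org/

The older -A/-R options match file-name patterns rather than arbitrary substrings of the URL, which is why they do not help here.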

wget server?

2006-08-04 Thread Juhana Sadeharju
Hello. The following problem occurred recently. I started downloading everything under the directory http://site.edu/projects/software/ Then, after a day, I found that the subdirectory http://site.edu/projects/software/program/manual/ had a wiki with millions of files. Because I wished that the download
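One way to fence off such a subtree during a recursive crawl is -X/--exclude-directories, using the message's own example paths:

    wget -r -np -X /projects/software/program/manual http://site.edu/projects/software/

Entries in the -X list are matched against complete directory paths (wildcards allowed), so giving them absolute from the server root is the safe form. This only helps when the trap is known in advance, which is exactly the problem described here.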

news protocol?

2006-08-04 Thread Juhana Sadeharju
Hello. The TODO lists the following: * Add more protocols (e.g. gopher and news), implementing them in a modular fashion. Do you mean the NNTP protocol? If so, I recently wrote an NNTP downloader: http://www.funet.fi/~kouhia/nntppull20060409.tar.gz I find it good for news archiving. I now

accepted and excluded?

2006-02-10 Thread Juhana Sadeharju
Hello. How would I write the -A option if I want both .pdf and .PDF files from an ftp site? -A pdf,PDF failed -- only PDF files were downloaded. How would I write the -X option if I want multiple subdirectories excluded? -X dir1,dir2 failed -- only one of the given dirs was excluded. (E.g.
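Both options take comma-separated lists, and -X entries are matched against absolute directory paths, which is a common stumbling block. A sketch (host and paths hypothetical):

    # accept both suffix spellings; quoting keeps the shell away from the list
    wget -r -np -A 'pdf,PDF' ftp://ftp.example.org/pub/docs/
    # exclude several subdirectories: name them absolute from the server root
    wget -r -np -X '/pub/docs/dir1,/pub/docs/dir2' ftp://ftp.example.org/pub/docs/

If plain suffixes still miss one spelling, explicit wildcard patterns such as -A '*.pdf,*.PDF' are worth trying.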

wget with a log database?

2005-11-30 Thread Juhana Sadeharju
Hello. I would like to have a database within wget. The database would let wget know what it has downloaded earlier. Wget could then download only new and changed files, and could continue a download without keeping the old downloads on my disk. The database would also be accessed by other
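Wget's time-stamping mode already covers a slice of this: -N compares remote timestamps and sizes against the local files and fetches only what is new or changed. It still needs the old files on disk to compare against, which is precisely what the proposed database would make unnecessary:

    wget -N -r -np http://site.example.org/data/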

wget problem

2005-04-04 Thread Juhana Sadeharju
Hello. The following document could not be downloaded at all: http://www.greyc.ensicaen.fr/~dtschump/greycstoration/ If you succeed, please tell me how. I want the whole html file and the images. Juhana
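One plausible culprit when a page fails outright is the server refusing wget's default User-Agent; masquerading as a browser is a reasonable first test (a sketch, not verified against this server):

    wget -p -E -k -U 'Mozilla/5.0' http://www.greyc.ensicaen.fr/~dtschump/greycstoration/

-p pulls in the page requisites (images, stylesheets) along with the HTML itself.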

Re: Help Needed

2004-11-02 Thread Juhana Sadeharju
Hello. Does wget have NNTP (Usenet newsgroups) support? For example, I might want to download all articles between numbers M and N. A date-based system could be useful too. We just need to agree on how these queries are represented to wget. I can dig out an old Usenet news downloader code if wget does

on tilde bug

2004-11-01 Thread Juhana Sadeharju
Hello. I traced the url given on the command line, and it looks like there is no difference whether one gives ~ or %7E. Is this true? The urls end up in url_parse(), which switches ~ (as unsafe) to %7E. If the original url is not used at all, as it appears, then there is no difference. But mysteriously
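The equivalence is easy to probe from the outside; ~ is an unreserved character, so a server should treat both spellings as the same resource (hypothetical URL):

    wget -q -O a.html 'http://example.org/~user/'
    wget -q -O b.html 'http://example.org/%7Euser/'
    cmp a.html b.html    # silent if the two responses are byte-identical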

char 5C problem

2004-11-01 Thread Juhana Sadeharju
Hello. Wget could not download the images of the page http://www.fusionindustries.com/alex/combustion/index.html The image urls have %5C (backslash \) in them. http://www.fusionindustries.com/alex/combustion/small%5C0103%20edgepoint-pressure%20small.png

Tilde bug again

2004-10-16 Thread Juhana Sadeharju
Hello. Has the ~ / %7E bug always been in wget? When was it added to wget? Who wrote the code? I would like to suggest that the person who introduced this severe bug should fix it immediately. It does not make sense for us to waste time trying to fix this bug if that person did not use any moment

Directory indices?

2004-10-16 Thread Juhana Sadeharju
Hello. Why does wget generate the following index files? Why so many index files? ftp1.sourceforge.net/gut/index.html ftp1.sourceforge.net/gut/index.html?C=M&O=A ftp1.sourceforge.net/gut/index.html?C=M&O=D ftp1.sourceforge.net/gut/index.html?C=N&O=A ftp1.sourceforge.net/gut/index.html?C=N&O=D
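Those ?C=...&O=... URLs are the column-sorting links that a web server's auto-generated directory index emits (sort by name or date, ascending or descending), and wget follows each one as a distinct page. Releases with --reject-regex (1.14 and newer) can skip them, since that option sees the query string:

    wget -r -np --reject-regex '[?]C=' http://ftp1.sourceforge.net/gut/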

img dynsrc not downloaded?

2004-10-16 Thread Juhana Sadeharju
Hello. Wget could not follow dynsrc attributes; the mpeg file was not downloaded: <p><img dynsrc="Collision.mpg" CONTROLS LOOP=1> at http://www.wideopenwest.com/~nkuzmenko7225/Collision.htm Regards, Juhana

xml files not processed?

2004-10-16 Thread Juhana Sadeharju
Hello. When the url http://zeus.fri.uni-lj.si/%7Ealeks/POIS/Kolaborativno%20delo.htm is downloaded with -np -r -l 0 etc., the file http://zeus.fri.uni-lj.si/~aleks/POIS/Kolaborativno%20delo_files/filelist.xml is downloaded correctly. However, the hrefs in the xml file are not then followed:
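wget's HTML parser does not know the Office-specific tags inside a filelist.xml, which would explain the links going unfollowed. A manual workaround is to pull the HRef targets out of the file and feed them back with -i (a sketch assuming the usual MS Office filelist format with HRef attributes; untested against this site):

    grep -o 'HRef="[^"]*"' filelist.xml | sed 's/HRef="//;s/"$//' | \
        sed 's|^|http://zeus.fri.uni-lj.si/%7Ealeks/POIS/Kolaborativno%20delo_files/|' > urls.txt
    wget -x -i urls.txt

grep -o is the GNU grep flag that prints only the matched text.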

Developers here?

2004-10-16 Thread Juhana Sadeharju
Hello. Recent mails have not been answered and the CVS may be stale. Who are the developers of wget at the moment? I just posted a couple of feature-regression reports, but my intent is not to pour the tasks onto the current developers. However, without anyone giving hints on what to look at, the features may go

wget scripting?

2004-10-04 Thread Juhana Sadeharju
Hello. I have been thinking a little about how to make wget better. We would need a scripting system so that features can be programmed more easily. One way to incorporate scripting into wget would be to rewrite wget as a data-flow system, in much the same way as OpenGL (www.opengl.org)

compressed html files?

2004-09-23 Thread Juhana Sadeharju
Hello. The file http://www.cs.utah.edu/~gooch/JOT/index.html is compressed and wget could not follow the urls in it. What can be done? Should wget uncompress the compressed *.htm and *.html files? *.asp, *.php?? Juhana
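Two hedged options, depending on the wget in use: much later releases (1.19.2 and newer) have a --compression option that decompresses gzip bodies before link extraction, and with older builds one can at least ask the server for an uncompressed representation (both sketches, untested against this server):

    # newer wget: transparently decompress gzip responses
    wget -r -np --compression=auto http://www.cs.utah.edu/~gooch/JOT/
    # older wget: request an identity (uncompressed) encoding
    wget -r -np --header='Accept-Encoding: identity' http://www.cs.utah.edu/~gooch/JOT/

Neither helps if the files are stored compressed on the server rather than compressed on the wire.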

Character coding gives problems

2004-08-20 Thread Juhana Sadeharju
Hello. The character coding of ~ causes problems in downloading. Example:

    wget -p -E -k --proxy=off -e robots=off --passive-ftp -q -r -l 0 -np \
        http://www.stanford.edu/~dattorro/

However, not everything was downloaded. The file machines.html has hrefs such as http://www.stanford.edu/%7Edattorro/images/calloph.jpg

wget problem: urls behind script

2004-04-16 Thread Juhana Sadeharju
Hello. One wget problem this time. I downloaded everything in http://www.planetunreal.com/wod/tutorials/ but most of the files were not downloaded, because the urls are in the file http://www.planetunreal.com/wod/tutorials/sidebar.js in the following format: FItem(Beginner's Guide to UnrealScript,
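wget does not execute or parse JavaScript, so links assembled by calls like FItem(...) are invisible to the crawler. A crude workaround is to scrape candidate page names out of the script and hand them back to wget; the pattern below assumes the FItem arguments contain plain .htm file names, which the truncated excerpt does not confirm:

    wget -q -O sidebar.js http://www.planetunreal.com/wod/tutorials/sidebar.js
    grep -o '[A-Za-z0-9_./-]*\.htm[l]*' sidebar.js | sort -u | \
        sed 's|^|http://www.planetunreal.com/wod/tutorials/|' > urls.txt
    wget -x -i urls.txt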

wget bug: directory overwrite

2004-04-05 Thread Juhana Sadeharju
Hello. Problem: When downloading everything in http://udn.epicgames.com/Technical/MyFirstHUD, wget overwrites the downloaded MyFirstHUD file with the MyFirstHUD directory (which comes later). GNU Wget 1.9.1:

    wget -k --proxy=off -e robots=off --passive-ftp -q -r -l 0 -np -U Mozilla "$@"

Solution: Use of -E
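The -E fix the report points at works because --html-extension makes wget save the page as MyFirstHUD.html, so the MyFirstHUD/ directory fetched later no longer collides with a file of the same name:

    wget -E -k -r -l 0 -np http://udn.epicgames.com/Technical/MyFirstHUD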

Bug report

2004-03-24 Thread Juhana Sadeharju
Hello. This is a report on some wget bugs. My wgetdir command looks like the following (wget 1.9.1):

    wget -k --proxy=off -e robots=off --passive-ftp -q -r -l 0 -np -U Mozilla "$@"

Bugs: Command: wgetdir "http://www.directfb.org". Problem: In the file www.directfb.org/index.html the hrefs of type

will mime coding make the site different?

2004-03-13 Thread Juhana Sadeharju
Hello. I downloaded http://agar.csoft.org/index.html with the -k option, but the URL http://agar.csoft.org/man.cgi?query=widget&amp;sektion=3 in the file was not converted to relative. (The local filename is man.cgi?query=widget&sektion=3.) Regards, Juhana

Re: not downloading at all, help

2004-02-12 Thread Juhana Sadeharju
--16:59:21-- http://www.maqamworld.com:80/ => `index.html' Connecting to www.maqamworld.com:80... connected! It looks like you have http_proxy=80 in your wgetrc file. I placed use_proxy = off in .wgetrc (which file I did not have earlier) and in ~/wget/etc/wgetrc (which file

not downloading at all, help

2004-02-11 Thread Juhana Sadeharju
Hello. What goes wrong in the following? (I will read replies from the list archives.)

    % wget http://www.maqamworld.com/
    --16:59:21--  http://www.maqamworld.com:80/
               => `index.html'
    Connecting to www.maqamworld.com:80... connected!
    HTTP request sent, awaiting response... 503