Re: Recursive
I don't see any obvious reason why timestamping would work in one case but not the other. One possible explanation is that the second server does not provide correct time-stamping data. Debug output (with the `-d' switch) might shed some light on this.
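For the record, a debug run can be captured along these lines (a minimal sketch; the URL and accept pattern are placeholders, not the poster's actual query):

```shell
# Hypothetical example: rerun a timestamped recursive FTP retrieval with
# -d (debug) enabled and keep the output for inspection.
wget -d --timestamping --no-directories -r \
    ftp://example.org/pub/data/ -A dbs 2>&1 | tee wget-debug.log
```

In the log, the lines to look for are the ones where Wget compares the local and remote timestamps and sizes before deciding whether to re-download a file.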
Re: Strange behavior
"chatiman" <[EMAIL PROTECTED]> writes:

> I'm trying to download a "directory" recursively with
>
>     wget -rL http://site/dir/script.php
>
> wget retrieves all the pages looking like
>
>     http://site/dir/script.php?param1=value1
>
> but not the following:
>
>     http://site/dir/script.php?param1=value1&page=pageno
>
> What's wrong?

We'll need more data to get to the bottom of this. Can you send the debug output, or at least describe in more detail what the HTML looks like and what Wget appears to be doing?

BTW, why are you using `-L' in the first place? `-np' is almost always a better choice.
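For comparison, a sketch of the `-np' variant suggested above, reusing the poster's placeholder URL:

```shell
# -L (--relative) follows only *relative* links, so absolute links on the
# page (including ones carrying extra query parameters) can be skipped.
# -np (--no-parent) instead just stops Wget from ascending above the
# starting directory, which is usually what "stay in this directory" means.
wget -r -np -d http://site/dir/script.php 2>&1 | tee wget-debug.log
```

The debug output will show, for each discovered URL, whether it was followed or why it was rejected.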
Re: skip robots
"Jens Rösner" <[EMAIL PROTECTED]> writes:

> PS: One note to the manual editor(s?): The -e switch could be
> (briefly?) mentioned also in the "wgetrc commands" paragraph. I
> think it would make sense to mention it there again without
> cluttering the manual too much. Currently it is only mentioned in
> "Basic Startup Options" (and in an example dealing with robots).
> Opinions?

Sure, why not. Have you just volunteered to write the patch? :-)
Recursive
I am using wget to retrieve large amounts of genetic sequence data from various sites. To make this more efficient I am using time-stamping, but for some reason it does not seem to work for one of my queries.

This query:

    wget --non-verbose --timestamping --no-directories -r \
        ftp://ftp.sanger.ac.uk/pub/pathogens/ -A dbs \
        -R shotgun.dbs,"*EST*","[0-9]*","*phage*","*clipped*","*CDS*","Ba1*","EhV*","*Sanger*","*CPS*","pQBR103.dbs" \
        -X /pub/pathogens/malaria2/,/pub/pathogens/ncbi/,/pub/pathogens/sdb/

works perfectly: time-stamping has it download only the new files. For some reason, this query:

    wget --non-verbose --timestamping --no-directories -r \
        ftp://ftp.jgi-psf.org/pub/JGI_data/Microbial/ -A contigs,jazz.fasta \
        -R "[A-Z]*","000317*","microbe2*","microbe1*","rhodo_*","prefin*","microbe4_fasta.screen.noplasmid.contigs","synechococcus.assem.contigs","2351364*","2661913*"

does not have time-stamping working correctly, and all of the files are downloaded again each time it is run.

I have pored over the commands numerous times and cannot figure out why time-stamping works correctly for one and not the other. It is probably some simple mistake in my syntax that I cannot see.

Please cc me with your reply, as I am not subscribed to the list.

Thanks,
Jeff

Jeffrey Rosenfeld
Department of Invertebrate Zoology
American Museum of Natural History
79th Street @ Central Park West
New York, NY 10024
(212) 313-7646
Re: skip robots
Use

    robots = on/off

in your wgetrc, or

    wget -e "robots = on/off" URL

on your command line.

Jens

PS: One note to the manual editor(s?): The -e switch could be (briefly?) mentioned also in the "wgetrc commands" paragraph. I think it would make sense to mention it there again without cluttering the manual too much. Currently it is only mentioned in "Basic Startup Options" (and in an example dealing with robots). Opinions?

> I once used the "skip robots" directive in the wgetrc file.
> But I can't find it anymore in the wget 1.9.1 documentation.
> Did it disappear from the doc or from the program?
>
> Please answer me, as I'm not subscribed to this list
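As a concrete sketch of the two forms mentioned above (the URL is a placeholder), `-e' simply takes a .wgetrc-style command on the command line:

```shell
# Persistent form: put the .wgetrc command in your startup file.
#   In ~/.wgetrc:
#     robots = off
#
# One-off form: pass the same command via -e for a single run.
wget -e "robots = off" -r http://example.org/docs/
```

With robots processing off, Wget ignores /robots.txt and the nofollow meta tag during the recursive retrieval.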
Strange behavior
I'm trying to download a "directory" recursively with

    wget -rL http://site/dir/script.php

wget retrieves all the pages looking like

    http://site/dir/script.php?param1=value1

but not the following:

    http://site/dir/script.php?param1=value1&page=pageno

What's wrong? Please reply to me directly, as I'm not on the list. Thanks
skip robots
I once used the "skip robots" directive in the wgetrc file, but I can't find it anymore in the wget 1.9.1 documentation. Did it disappear from the doc or from the program?

Please answer me, as I'm not subscribed to this list.
Re: bug in connect.c
"francois eric" <[EMAIL PROTECTED]> writes:

> After some testing:
>
> The bug shows up with: ftp, with username and password, with bind address specified.
> The bug does not show up with: http, or ftp without username and password.
>
> It looks like uninitialized memory, so I made this modification before the bind in src/connect.c:
>
>     /* Bind the client side to the requested address. */
>     wget_sockaddr bsa;
>     memset (&bsa, 0, sizeof (bsa));   /* added: zero the sockaddr first */
>     wget_sockaddr_set_address (&bsa, ip_default_family, 0, &bind_address);
>     if (bind (sock, &bsa.sa, sockaddr_len ()))
>       ...
>
> After that, all downloads succeed. I think it would be better to do the memset in wget_sockaddr_set_address, but that is for you to choose.

Interesting. Is it really necessary to zero out sockaddr/sockaddr_in before using it? I see that some sources do it and some don't. I was always under the impression that, as long as you fill in the relevant members (sin_family, sin_addr, sin_port), other initialization is not necessary. Was I mistaken, or is this something specific to FreeBSD? Do others have experience with this?