Re: Recursive

2004-02-04 Thread Hrvoje Niksic
I don't see any obvious reason why timestamping would work in one
case, but not in the other.  One possible explanation might be that
the second server does not provide correct time-stamping data.  Debug
output (with the `-d' switch) might shed some light on this.
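
For example (the URL here is only a placeholder), something along the
lines of

wget -d --timestamping -r ftp://ftp.example.org/pub/data/ -o wget-debug.log

would write the complete debug log to wget-debug.log; comparing the logs
from the two servers should show whether the second one returns usable
time-stamping information at all.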



Re: Strange behavior

2004-02-04 Thread Hrvoje Niksic
"chatiman" <[EMAIL PROTECTED]> writes:

> I'm trying to download a "directory" recursively with
> wget -rL http://site/dir/script.php
> Wget retrieves all the pages that look like
> http://site/dir/script.php?param1=value1
>
> but not the following:
> http://site/dir/script.php?param1=value1&page=pageno
>
> What's wrong?

We'll need more data to get to the bottom of this.  Can you send the
debug output, or at least describe in more detail what the HTML looks
like and what Wget appears to be doing?  BTW why are you using `-L' in
the first place?  `-np' is almost always a better choice.
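
For instance (just a sketch, using the URL from your message),

wget -r -np http://site/dir/script.php

keeps the recursion below /dir/ regardless of whether the links in the
pages are written as relative or absolute, whereas `-L' only follows
relative links and silently skips everything else.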



Re: skip robots

2004-02-04 Thread Hrvoje Niksic
"Jens Rösner" <[EMAIL PROTECTED]> writes:

> PS: One note to the manual editor(s?): The -e switch could be
> (briefly?) mentioned also in the "wgetrc commands" paragraph. I
> think it would make sense to mention it there again without
> cluttering the manual too much. Currently it is only mentioned in
> "Basic Startup Options" (and in an example dealing with robots).
> Opinions?

Sure, why not.  Have you just volunteered to write the patch?  :-)



Recursive

2004-02-04 Thread Jeffrey Rosenfeld
I am using wget to retrieve large amounts of genetic sequence data from
various sites.  To make this more efficient, I am using time-stamping,
but for some reason it does not seem to work for one of my queries.
This query:

wget --non-verbose --timestamping --no-directories -r \
  ftp://ftp.sanger.ac.uk/pub/pathogens/ -A dbs \
  -R shotgun.dbs,"*EST*","[0-9]*","*phage*","*clipped*","*CDS*","Ba1*","EhV*","*Sanger*","*CPS*","pQBR103.dbs" \
  -X /pub/pathogens/malaria2/,/pub/pathogens/ncbi/,/pub/pathogens/sdb/

works perfectly: time-stamping makes it download only the new files.
For some reason, this query:

wget --non-verbose --timestamping --no-directories -r \
  ftp://ftp.jgi-psf.org/pub/JGI_data/Microbial/ -A contigs,jazz.fasta \
  -R "[A-Z]*","000317*","microbe2*","microbe1*","rhodo_*","prefin*","microbe4_fasta.screen.noplasmid.contigs","synechococcus.assem.contigs","2351364*","2661913*"

does not time-stamp correctly: all of the files are downloaded again
each time the query is run.  I have pored over the commands numerous
times and I cannot figure out why time-stamping works correctly for one
and not for the other.  It is probably some simple mistake in my syntax
that I cannot see.

Please cc me with your reply as I am not subscribed to the list.

Thanks,
Jeff
Jeffrey Rosenfeld
Department of Invertebrate Zoology
American Museum of Natural History
79th Street @ Central Park West
New York, NY 10024
(212)313-7646


Re: skip robots

2004-02-04 Thread Jens Rösner
Use
robots = on/off in your wgetrc
or
wget -e robots=on/off URL on the command line
(with -e the whole command must be a single argument, so either drop the
spaces, as above, or quote it).
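
For example (the URL is only a placeholder), to ignore robots.txt during
a recursive download:

wget -e robots=off -r -np http://www.example.com/dir/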

Jens

PS: One note to the manual editor(s?):
The -e switch could be (briefly?) mentioned
also in the "wgetrc commands" paragraph.
I think it would make sense to mention it there again
without cluttering the manual too much.
Currently it is only mentioned in "Basic Startup Options"
(and in an example dealing with robots).
Opinions?



> I once used the "skip robots" directive in the wgetrc file.
> But I can't find it anymore in the wget 1.9.1 documentation.
> Did it disappear from the documentation or from the program?
> 
> Please answer me, as I'm not subscribed to this list
> 




Strange behavior

2004-02-04 Thread chatiman
I'm trying to download a "directory" recursively with
wget -rL http://site/dir/script.php
Wget retrieves all the pages that look like
http://site/dir/script.php?param1=value1

but not the following:
http://site/dir/script.php?param1=value1&page=pageno


What's wrong?

Please reply to me directly, as I'm not on the list.

Thanks




skip robots

2004-02-04 Thread chatiman
I once used the "skip robots" directive in the wgetrc file.
But I can't find it anymore in the wget 1.9.1 documentation.
Did it disappear from the documentation or from the program?

Please answer me, as I'm not subscribed to this list



Re: bug in connect.c

2004-02-04 Thread Hrvoje Niksic
"francois eric" <[EMAIL PROTECTED]> writes:

> After some tests:
> the bug appears with: ftp, with username and password, with a bind address specified
> the bug does not appear with: http, or ftp without username and password
> It looks like uninitialized memory, so I made a small modification before the bind:
> src/connect.c:
> --
> ...
>   /* Bind the client side to the requested address. */
>   wget_sockaddr bsa;
>   /* Added: zero the sockaddr before filling it in. */
>   memset (&bsa, 0, sizeof (bsa));
>   wget_sockaddr_set_address (&bsa, ip_default_family, 0, &bind_address);
>   if (bind (sock, &bsa.sa, sockaddr_len ()))
> ..
> --
> After that, all downloads succeed.
> I think it would be better to do the memset in wget_sockaddr_set_address,
> but that is your choice.

Interesting.  Is it really necessary to zero out sockaddr/sockaddr_in
before using it?  I see that some sources do it, and some don't.  I
was always under the impression that, as long as you fill the relevant
members (sin_family, sin_addr, sin_port), other initialization is not
necessary.  Was I mistaken, or is this something specific to FreeBSD?

Do others have experience with this?