From the user's perspective, sending bugs to [EMAIL PROTECTED] is like a
black hole. This is in contrast to other systems like the Debian bug
tracking system. No, don't move to bugzilla, or else we won't be able to
send email.
-Y is gone from the man page, except for one tiny mention, and --help
doesn't show its arguments. I didn't check Info.
One discovers that wget silently (this is not documented) throws away the
content of the response if there was an error (404, 503, etc.).
So there needs to be a --save-even-if-error switch.
Downloaded: 735,142 bytes in 24 files
looks great. But if
09:49:46 ERROR 404: WWWOFFLE Host Not Got.
flew off the screen, one will never know.
That's why you should say
Downloaded: 735,142 bytes in 24 files. 3 files not downloaded.
$ man wget
-i file
--input-file=file
The file need not be an HTML document (but no harm if it
is)---it is enough if the URLs are just listed sequentially.
Well, even with -i file.html, one still needs --force-html. So "yes,
harm if it is". GNU Wget 1.10.2
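For the record, the invocation that actually works here is presumably
$ wget --force-html -i file.html
(file.html being the example file name from above).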
I notice that with server-created directory listings, one can't recurse.
$ lynx -dump http://localhost/~jidanni/test|head
Index of /~jidanni/test
Icon [1]Name [2]Last modified [3]Size [4]Description
_
The documentation doesn't say how or why one can or cannot force wget to send
HTTP/1.1 requests instead of 1.0. Maybe it is simply not ready yet?
Man page says:
-i file
--input-file=file
Read URLs from file. If - is specified as file, URLs are read from
the standard input. (Use ./- to read from a file literally named
-.)
If this function is used, no URLs need be present on the comm
H> Maybe it should rather vary between 0.5*wait and 1.5*wait?
There you go again making assumptions about what the user wants.
H> I think it'd be a shame to spend more arguments on such a rarely-used
H> feature.
--random-wait[=a,b,c,d...] loaded with lots of downwardly compatible
arguments that are
"--random-wait causes the time between requests to vary between 0 and
2 * wait seconds, where wait was specified using the --wait option, "
So one can no longer specify a minimum wait time! The 2, and at least
the 0, should be user-configurable floating-point numbers.
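To illustrate, per the quoted description,
$ wget --wait=5 --random-wait -i url-list
(url-list is just a placeholder) pauses a random 0 to 10 seconds between
requests, with no way to guarantee a floor above 0.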
Nowadays in the documents it is very hard to tell that -Y off means
--no-proxy. You must be phasing out -Y or something. No big deal. OK.
Also reported already, I think:
For more information about the use of proxies with Wget,
And then nothing follows, on the man page. GNU Wget 1.10.2
Wishlist: support the file:/// protocol:
$ wget file:///home/jidanni/2005_safe_communities.html
In Info "3 Recursive Download", mention "-r, --recursive"!
Also don't threaten to remove -L!
What if I have a whole file of headers I want to use:
$ sed 1d /var/cache/wwwoffle/outgoing/O6PxpG00D+DBLAI8puEtOew|col|colrm 22
Host: www.hsr.gov.tw
User-Agent: Mozilla/5
Accept: text/xml,appl
Accept-Language: en-u
Accept-Encoding: gzip
Accept-Charset: Big5,
Keep-Alive: 300
Proxy-Connection: kee
R
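As far as I can tell, the closest one can get today is to repeat --header
once per line of such a file, e.g.
$ wget --header='Host: www.hsr.gov.tw' --header='Keep-Alive: 300' URL
(URL being a placeholder; the header values are the untruncated ones from
the dump above).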
Curl has this impressive looking feature:
$ man curl
--max-filesize
Specify the maximum size (in bytes) of a file to download. If the
file requested is larger than this value, the transfer will not
start and curl will return with exit code 63.
NOTE: The file size is not always known prio
-Y not mentioned fully in man and info:
$ wget --help|grep -- -Y
-Y, --proxy explicitly turn on proxy.
$ man wget|col -b|grep -- -Y
if Wget crashes while downloading wget -rl0 -kKE -t5 -Y0
$ wget -V
GNU Wget 1.10.1-beta1 ...
Originally written by Hrvoje Niksic <[EMAI
In the man page
-l depth
--level=depth
Specify recursion maximum depth level depth. The default maximum
depth is 5.
Say what levels 0 and 1 do, so one gets an idea of what depth means:
'this page only' and 'just the links on this page, and no further'.
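For what it's worth, my understanding (an assumption, not something the
quoted man page says) is that
$ wget -r -l 1 http://example.org/
fetches that page plus everything it links to directly, and no further;
example.org is just a placeholder.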
Wget needs a --print-uris or --dry-run option, to show what it would
get/do without actually doing it!
Not only could one check whether e.g., -B will do what one wants before
actually doing it, one could also use wget as a general URL extractor,
etc.
--debug is not what I'm talking about. I'm more talki
Ok, then here
-B URL
--base=URL
When used in conjunction with -F, prepends URL to relative links in
the file specified by -i.
don't mention -F!
Why must -B need -F to take effect? Why can't one do
xargs wget -B http://bla.com/ -i - <
I see I must do
wget --spider -i file -nv 2>&1|awk '!/^$|^200 OK$/'
as the only way to get just the errors. There is no flag that will
let only the errors through and silence the rest. -q silences everything.
Wget 1.9.1
P> I don't know why you say that. I see bug reports and discussion of fixes
P> flowing through here on a fairly regular basis.
All I know is my reports for the last few months didn't get the usual (any!)
cheery replies. However, I saw them on Gmane, yes.
Is it still useful to mail to [EMAIL PROTECTED]? I don't think
anybody's home. Shall the address be closed?
In the man page, show how one does this
wget --cookies=off --header "Cookie: ="
with more than one cookie.
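Presumably something like this, with the cookies joined by "; " in one
Cookie header (the names and values here are made up, and URL is a
placeholder):
$ wget --cookies=off --header 'Cookie: name1=value1; name2=value2' URL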
1. Anybody home?
2. No way to make wget not refetch the file when:
Last-modified header missing -- time-stamps turned off.
09:55:20 URL:http://bm2ddp.myweb.hinet.net/b3.htm [16087] ->
"uris.d/bm2ddp.myweb.hinet.net/b3.htm" [1]
when using wget -s -w 2 -e robots=off -P bla.d -p -t 1 -N -nv
Anybody home? This looks weird:
$ wget --spider -S -r -l 1 http://www.noise.com.tw/eia/product.htm
--09:22:13-- http://www.noise.com.tw/eia/product.htm
=> `www.noise.com.tw/eia/product.htm'
Resolving localhost... 127.0.0.1
Connecting to localhost[127.0.0.1]:8080... connected.
Proxy req
There is no way to see what
$ lynx -dump http://wapp8.taipower.com.tw/
can show me when
$ wget -O - -S -s http://wapp8.taipower.com.tw/
gives only
08:54:48 ERROR 403: Access Forbidden.
i.e., no way to see the site's error message contents.
>>>>> "D" == Derek B Noonburg <[EMAIL PROTECTED]> writes:
D> On 20 Nov, Dan Jacobson wrote:
D> Can you try the binary on my web site?
D> ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.00-linux.tar.gz)
>>
>> But my batch script to wget it doesn't
Odd,
$ ssh debian.linux.org.tw wget -e robots=off --spider -t 1 -i - < a.2
No URLs found in -.
Or is this wget just too old?
P.S., no cheery responses received recently.
On the man page:
For more information about the use of proxies with Wget,
and then it jumps straight to
-Q quota
To Info node "Time-Stamping Usage" add a clarification about what
happens when -N and -p are used together: are e.g., all the included
images also checked, or just the main page?
Mention that -p turns on or implies -x in both the -p and -x parts of
both the man and info pages.
Perhaps a useful option would be to have files use a temporary name
until the download is complete, then move them to the permanent name.
BTW, because wget 1.9.1 has no way to save "session cookies" yet, that
example will often fail. Hopefully the user will soon be able to
control which cookies are saved, no matter what the cookies themselves say.
$ man wget
When running Wget without -N, -nc, or -r, downloading the same file in the
same directory will result in the original copy of file being preserved and
the second copy being named file.1.
$ wget -x http://static.howstuffworks.com/flash/toilet.swf
$ wget
$ man wget
This example shows how to log to a server using POST and then proceed to
download the desired pages, presumably only accessible to authorized users:
# Log in to the server. This can be done only once.
You mean "we only do this once".
H> I suppose forking would not be too hard, but dealing with output from
H> forked processes might be tricky. Also, people would expect `-r' to
H> "parallelize" as well, which would be harder yet.
OK, maybe add a section to the manual, showing that you have
considered parallel fetching, but the c
Phil> How about
Phil> $ wget URI1 & wget URI2
Mmm, OK, but unwieldy if there are many. I guess I'm thinking about e.g.,
$ wget --max-parallel-fetches=11 -i url-list
(hmm, with default=1 meaning not parallel, but sequential.)
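In the meantime one can fake it with standard tools, e.g. (a sketch, not a
wget feature; url-list is the same file of URLs):
$ xargs -n 1 -P 11 wget < url-list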
Man page: When running Wget with -N, with or without -r, the decision as to
whether or not to download a newer copy of a file depends on the local and
remote timestamp and size of the file.
I have an application where I want it only to depend on the timestamp.
Too bad ther
Maybe add an option so e.g.,
$ wget --parallel URI1 URI2 ...
would get them at the same time instead of in turn.
Wget should have a --print-uris option, to tell us what it is planning
to get, so we can adjust things without committing to anything yet.
Perhaps useful with -i or -r...
The man page doesn't say what will happen if one specifies
--random-wait when no --wait has been given.
Perhaps just say under --wait that it defaults to 0 if not set.
In Info where you mention:
Most of these commands have command-line equivalents (*note
Invoking::), though some of the more obscure or rarely used ones do not.
You should also mention -e in the same paragraph.
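For the record, -e is how one gives any .wgetrc command on the command
line, e.g. (the URL is a placeholder):
$ wget -e robots=off http://example.org/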
The docs should mention the return value... In fact it should be an item
in the Info Concept Index.
I.e. how to depend on
$ wget ... && bla || mla
So say under what circumstances wget will return non-zero.
$ info
The time-stamping in GNU Wget is turned on using `--timestamping'
(`-N') option, or through `timestamping = on' directive in `.wgetrc'.
With this option, for each file it intends to download, Wget will check
whether a local file of the same name exists. If it does, and the
remo
H> Do you really need an option to also save expired cookies?
You should allow the user power over all aspects...
Wishlist: giving a way to save the types of cookies you say you won't in:
`--save-cookies FILE'
Save cookies to FILE at the end of session. Cookies whose expiry
time is not specified, or those that have already expired, are not
saved.
so we can carry state between wget invocations,
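A sketch of the round trip one would like to count on (URLs and the cookie
file name are placeholders):
$ wget --save-cookies cookies.txt http://example.org/login
$ wget --load-cookies cookies.txt http://example.org/members/
As quoted above, the first step currently drops session cookies and
already-expired ones.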
On the man page the interaction between -O and -nc is not mentioned!
Nor perhaps -O vs. -N.
Indeed, why not produce an error when both -O and -nc are found,
if you don't intend to allow -nc to work with -O, which would
actually be best?
True, the man page doesn't say --spider will tell me the size of a
file without fetching it; I already got used to that for http. But
for ftp,
wget --spider -Y off -S ftp://gmt.soest.hawaii.edu/pub/gmt/4/GMT_high.tar.bz2
just gives some messages, ending in
227 Entering Passive Mode (128,171,159,1
It seems one cannot use the wget .exe without the DLLs, even if one
only wants to connect to plain http sites, not any https sites.
So one cannot just click on the wget .exe from inside Unzip's file list.
H> For getting Wget you might want to link directly to
H> ftp://ftp.sunsite.dk/projects/wget/windows/wget-1.9.1b-complete.zip,
OK, but too bad there's no stable second link .../latest.zip so I
don't have to update my web page to follow the link.
Furthermore, they don't need SSL, but I don't see an
Normally, if I want to check out how big a page is before committing
to download it, I use
wget -S --spider URL
You might give this as a tip in the docs.
However, for FTP it doesn't show the file size. At least not for
wget -S --spider ftp://ftp.sunsite.dk/projects/wget/windows/wget-1.9.1b-complet
I suppose Windows users don't have a way to get more than one file at
once; hence, to have a Windows user download 500 files and burn them
onto a CD, as in
http://jidanni.org/comp/apt-offline/index_en.html
one needs wget? Any tips on the concept in my web page? I don't have
Windows to try it. C
> "Hrvoje" == Hrvoje Niksic <[EMAIL PROTECTED]> writes:
Hrvoje> Please send bug reports to [EMAIL PROTECTED], or at least make sure
Hrvoje> that they don't go only to me.
Yes, but needing a confirmation message over and over has driven me nuts.
--spider
...it will not download the pages...
$ wget -Y off --spider ftp://alpha.gnu.org/gnu/coreutils/coreutils-5.0.91.tar.bz2
--12:13:37-- ftp://alpha.gnu.org/gnu/coreutils/coreutils-5.0.91.tar.bz2
=> `coreutils-5.0.91.tar.bz2'
Resolving alpha.gnu.org... done.
Conne
H> It's not very hard to fix `--header' to replace Wget-generated
H> values.
H> Is there consensus that this is a good replacement for
H> `--connect-address'?
I don't want to tamper with headers.
I want to be able to do experiments leaving all variables alone except
for the IP address. Thus --connec
>> And stop making me have to confirm each and every mail to this list.
Hrvoje> Currently the only way to avoid confirmations is to subscribe to the
Hrvoje> list. I'll try to contact the list owners to see if the mechanism can
Hrvoje> be improved.
subscribe me with the "nomail" option, if it can
> "P" == Post, Mark K <[EMAIL PROTECTED]> writes:
P> You can do this now:
P> wget http://216.46.192.85/
P> Using DNS is just a convenience after all, not a requirement.
but then one doesn't get the HTTP Host field set to what one wants.
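(Presumably one could try supplying the Host field by hand, e.g.
$ wget --header='Host: www.example.com' http://216.46.192.85/
where www.example.com is a placeholder, assuming wget lets a user-supplied
Host header take effect.)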
By the way, I did edit /etc/hosts to do one experiment
http://groups.google.com/groups?threadm=vrf7007pbg2136%40corp.supernews.com
i.e. <[EMAIL PROTECTED]>
to test an IP/name combination, without waiting for the DNS to update.
Good thing I was root, so I could do it.
I sure hope this will be fixed:
$ wget --spider BAD_URL GOOD_URL; echo $?
0
$ wget --spider GOOD_URL BAD_URL; echo $?
1
I say they both should be 1.
If anything bad happens, return 1 or some other non-zero value.
By BAD, I mean a producer of e.g.,
ERROR 503: Service Unavailable.
--spider or not, too.
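In the meantime the only way I see to get a per-URL verdict is to run wget
once per URL, e.g. (url-list is a placeholder file of URLs):
$ while read u; do wget --spider "$u" || echo "FAILED: $u"; done < url-list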
And stop making me have to
I see there is
--bind-address=ADDRESS
When making client TCP/IP connections, "bind()" to ADDRESS on the local
machine. ADDRESS may be specified as a hostname or IP address. This
option can be useful if your machine is bound to multiple IPs.
But I want a
Man says:
-T seconds ... The default timeout is 900 seconds
Ok, then why does this take only 3 minutes to give up?:
--07:58:54--
http://linux.csie.nctu.edu.tw/OS/Linux/distributions/debian/dists/sid/main/binary-i386/Packages.gz
=> `Packages.gz'
Resolving linux.csie
-q and -S are incompatible and should perhaps produce an error, and this
should be noted in the docs.
BTW, there seems to be no way to get the -S output but no progress
indicator. -nv and -q kill them both.
P.S. one shouldn't have to confirm each bug submission. Once should be enough.
The man page says
To prevent the passwords from being seen, store them in
.wgetrc or .netrc,
The problem is that if you just happen to have a .netrc entry for a
certain machine, but you don't want wget to notice it, then what do you
do?
Can you believe
$ wget --http-user=
> You can view the map at:
> http://home.sara.nl/~bram/debchart.jpeg
< WARNING: this image is ENORMOUS.
OK, so I will use
wget -O --spider -Y off http://home.sara.nl/~bram/debchart.jpeg
to see how big it was before biting with my modem, I thought. But I mistyped
-O for -S and ended up getting the whole
"--15:33:01--" is not adequate for beyond 24 hours. Wish there was a
way to put more date info into this message, like syslog does, without
stepping outside wget.
I was hoping to separate the usual news,
$ wget http://abc.iis.sinica.edu.tw/
--09:26:00-- http://abc.iis.sinica.edu.tw/
=> `index.html'
Resolving localhost... done.
Connecting to localhost[127.0.0.1]:8080... connected.
from the bad news,
Proxy request sent, awaiting response... 503
As we wwwoffle users all might know, wget has
-c, --continue resume getting a partially-downloaded file.
Quite handy when a large download got interrupted. However
No documentation found on what wget's return codes are, e.g. for a
reasonable wish like:
$ wget -N URL && echo got it|mail john
Please add to the docs what the policy is, even if it is 'none at present'.
I'm thinking wget could have a status option, like bash's
$ set -o
allexport off
braceexpand on
errexit off...
Perhaps a plain
$ wget -d
might be a good place.