1. Some web sites serve pages with .php or other extensions (.asp) instead of .htm or .html. They all deserve a cluestick, but for the moment wget -nc does not cooperate with them; perhaps the file type should be determined from the Content-Type header instead of the URL suffix.

2. When running wget -r -H and fetching https://www.cybersitter.com/robots.txt, wget frequently (or always) reports a 404 and then just hangs. kill -CONT had no effect; kill -ALRM killed it. Perhaps a 404 on robots.txt should be cached anyway, so it is not re-requested.

This is GNU Wget 1.9.1.
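The Content-Type idea in item 1 could look roughly like this. A minimal sketch, not wget's actual code: the TYPE_EXT table and the local_name helper are assumptions, standing in for wherever wget decides the local file name that -nc later compares against.

```python
from urllib.parse import urlparse
import posixpath

# Assumed mapping from Content-Type to the extension -nc should match on;
# a real implementation would need the full table, this is illustrative.
TYPE_EXT = {
    "text/html": ".html",
    "text/plain": ".txt",
}

def local_name(url, content_type):
    """Pick the local file name from the Content-Type header rather than
    the URL suffix, so a .php or .asp page still gets a name that a
    second wget -nc run can recognize (hypothetical helper)."""
    name = posixpath.basename(urlparse(url).path) or "index"
    # Strip any "; charset=..." parameter before the lookup.
    ext = TYPE_EXT.get(content_type.split(";")[0].strip())
    if ext and not name.endswith(ext):
        name += ext
    return name
```

With this, http://example.com/page.php served as text/html would be saved as page.php.html, so -nc has a stable name to check on the next run.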
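And the caching suggested in item 2 could be sketched like this. Again purely illustrative, not wget internals: robots_allows and the fetch callback are invented names standing in for wget's HTTP layer, but the point is that a 404 on robots.txt is remembered per host instead of being re-requested (and hung on) during the recursive crawl.

```python
import urllib.robotparser

# host -> RobotFileParser, or None when the robots.txt fetch 404'd.
_robots_cache = {}

def robots_allows(host, path, fetch):
    """Return whether path on host may be crawled.

    fetch(host) returns the robots.txt text, or raises FileNotFoundError
    on a 404 (hypothetical interface). A 404 is cached as None so the
    host is asked for robots.txt at most once per run.
    """
    if host not in _robots_cache:
        try:
            text = fetch(host)
        except FileNotFoundError:
            _robots_cache[host] = None   # cache the 404: treat as "no robots.txt"
        else:
            rp = urllib.robotparser.RobotFileParser()
            rp.parse(text.splitlines())
            _robots_cache[host] = rp
    rp = _robots_cache[host]
    return True if rp is None else rp.can_fetch("*", path)
```

A missing robots.txt conventionally means everything is allowed, so caching the 404 changes nothing about which pages are fetched; it only avoids hammering (or hanging on) the same URL.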
best regards