1. Some web sites serve pages with .php or other extensions (.asp) instead of .htm or .html. They all deserve a cluestick, but for the moment wget -nc does not cooperate with them; perhaps the file type should be determined from the Content-Type header instead of the URL suffix.

2. When running wget -r -H and fetching https://www.cybersitter.com/robots.txt, wget frequently (or always) reports a 404 and then just hangs. kill -CONT had no effect; kill -ALRM killed it. Perhaps a 404 on robots.txt should be cached anyway, so it is not re-requested.

This is GNU Wget 1.9.1.
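The Content-Type idea in item 1 could look roughly like this. A minimal sketch, not wget's actual code: the TYPE_EXT table and the local_name helper are assumptions, standing in for wherever wget decides the local file name that -nc later compares against.

```python
from urllib.parse import urlparse
import posixpath

# Assumed mapping from Content-Type to the extension -nc should match on;
# a real implementation would need the full table, this is illustrative.
TYPE_EXT = {
    "text/html": ".html",
    "text/plain": ".txt",
}

def local_name(url, content_type):
    """Pick the local file name from the Content-Type header rather than
    the URL suffix, so a .php or .asp page still gets a name that a
    second wget -nc run can recognize (hypothetical helper)."""
    name = posixpath.basename(urlparse(url).path) or "index"
    # Strip any "; charset=..." parameter before the lookup.
    ext = TYPE_EXT.get(content_type.split(";")[0].strip())
    if ext and not name.endswith(ext):
        name += ext
    return name
```

With this, http://example.com/page.php served as text/html would be saved as page.php.html, so -nc has a stable name to check on the next run.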
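And the caching suggested in item 2 could be sketched like this. Again purely illustrative, not wget internals: robots_allows and the fetch callback are invented names standing in for wget's HTTP layer, but the point is that a 404 on robots.txt is remembered per host instead of being re-requested (and hung on) during the recursive crawl.

```python
import urllib.robotparser

# host -> RobotFileParser, or None when the robots.txt fetch 404'd.
_robots_cache = {}

def robots_allows(host, path, fetch):
    """Return whether path on host may be crawled.

    fetch(host) returns the robots.txt text, or raises FileNotFoundError
    on a 404 (hypothetical interface). A 404 is cached as None so the
    host is asked for robots.txt at most once per run.
    """
    if host not in _robots_cache:
        try:
            text = fetch(host)
        except FileNotFoundError:
            _robots_cache[host] = None   # cache the 404: treat as "no robots.txt"
        else:
            rp = urllib.robotparser.RobotFileParser()
            rp.parse(text.splitlines())
            _robots_cache[host] = rp
    rp = _robots_cache[host]
    return True if rp is None else rp.can_fetch("*", path)
```

A missing robots.txt conventionally means everything is allowed, so caching the 404 changes nothing about which pages are fetched; it only avoids hammering (or hanging on) the same URL.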
best regards