Re: wget problem

Tony Lewis Thu, 03 Jul 2003 08:16:24 -0700

Rajesh wrote:

> Wget is not mirroring the web site properly. E.g. it is not copying symbolic
> links from the main web server. The target directories do exist on the mirror
> server.


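wget can only mirror what can be seen from the web. Over HTTP it never sees a
symbolic link as such: the web server follows the link and returns the target's
content, so wget stores it as an ordinary file or directory (assuming that some
web page points to it).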

If you cannot get there from http://www.sl.nsw.gov.au/ via your browser,
wget won't get the page.

Also, some servers change their behavior depending on the client. You may
need to use a user agent that looks like a browser to mirror some sites. For
example:

wget --mirror --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" http://www.sl.nsw.gov.au/

will make it look like wget is really Internet Explorer running on Windows
XP.

> Another problem is some of the files are different on the mirror web server.
> her you again. For eg: compare these 2 attached files.....
>
> penrith1.cfm is the file after wget copied from the main server.
> penrith1.cfm.org is the actual file sitting on the main server.

wget is storing what the web server returned, which may or may not be the
precise file stored on your system.

In particular, I notice that penrith1.cfm contains "<!--Requested: 17:30:40
Thursday 3 July 2003 -->". That implies that all or part of the output is
generated programmatically.
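
One rough way to check this for any page is to fetch it twice, a couple of seconds apart, and compare the two copies. If they differ even though nothing changed on the server, the output is being generated per request. What follows is only a sketch under a few assumptions: a POSIX-style shell with mktemp and cmp available, and a function name (fetch_twice_differs) and variable names that I made up for illustration. It uses plain global variables, since POSIX sh has no local.

```shell
# Hypothetical helper, not part of wget.
# Usage: fetch_twice_differs URL [DELAY_SECONDS]
# Returns 0 if the two downloads differ (page looks generated per request),
#         1 if they are byte-for-byte identical,
#         2 if a download or temp-file creation failed.
fetch_twice_differs() {
    url=$1
    delay=${2:-2}    # wait long enough for a per-second timestamp to change

    first=$(mktemp) || return 2
    second=$(mktemp) || { rm -f "$first"; return 2; }

    # -q: quiet; -O FILE: write the response body to FILE
    if wget -q -O "$first" "$url" && sleep "$delay" && wget -q -O "$second" "$url"; then
        if cmp -s "$first" "$second"; then
            status=1    # identical copies
        else
            status=0    # copies differ
        fi
    else
        status=2        # a fetch failed
    fi

    rm -f "$first" "$second"
    return "$status"
}

# Example call (commented out so that sourcing this file does no network I/O):
# fetch_twice_differs "http://www.sl.nsw.gov.au/" && echo "output varies per request"
```

Identical copies do not prove the page is static; they only mean that nothing in the response changed between those two requests.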

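If the site is also reachable over FTP, you might try using wget to replicate
that instead. FTP hands back the files as they are stored rather than the
output the web server generates from them.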

Then again, perhaps wget is the wrong tool for your task. Have you
considered using secure copy (scp) instead?

HTH,

Tony
