-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Mark Fisher wrote:
> I'm trying to use wget to get the static content of http://www.mkdj7.org.uk
> This site is a bit of a mess and will only render properly in IE [not my
> doing - honest].  In firefox it looks like the entire site is just a few
> images.  It appears that wget also thinks the site is just a few images.
> The site .htm files are authored in MSO and appear to contain a lot of VML.
> Can anyone help me identify the correct command line switches to use to get
> a copy of this site?  All help appreciated.

Wow. Yeah, that's a crappy setup.

By "just a few images", I gather you mean that the sidebar links are
treated as plain images, without links. Wget does not see the links
because they are embedded within HTML comments(!), which IE and other
MS-only products apparently interpret as "conditional comments"
(Microsoft's <!--[if ...]> ... <![endif]--> extension).
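Python's standard-library html.parser follows the same comment rules, so it gives a quick way to see the effect. The sample markup below is a hypothetical, simplified stand-in for what Office emits (it is not taken from the actual site). The point is only that an <a> tag inside <!-- ... --> is reported as comment text, never as a tag.

```python
from html.parser import HTMLParser

# Hypothetical markup, loosely modeled on Office output: the real link lives
# inside a conditional comment, and only a bare image sits outside it.
SAMPLE = """
<!--[if gte vml 1]><v:shape><a href="page2.htm">Page 2</a></v:shape><![endif]-->
<img src="button.gif">
"""


class TagLogger(HTMLParser):
    """Records which start tags and comments a standards-following parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []      # names of start tags the parser actually sees
        self.comments = []  # raw text of each comment, markup and all

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_comment(self, data):
        self.comments.append(data)


parser = TagLogger()
parser.feed(SAMPLE)
parser.close()
print(parser.tags)      # ['img']: the <a> is never reported as a tag
print(parser.comments)  # the whole <a href=...> survives only as comment text
```

A link-following crawler built on a conforming parser therefore has nothing to follow except the image.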

I don't think there's really much you can do here. Wget (and Firefox)
are interpreting the HTML properly: links within comments aren't links,
and shouldn't be treated as such. AFAIK Wget doesn't have a way to parse
links out of comments, so I don't really know what to suggest, other
than to cull the links out of there and feed them to Wget manually
(say, with -i / --input-file). :\
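If the pages are saved locally, the culling step can be scripted. Below is a minimal Python sketch (not a Wget feature): it uses the standard-library html.parser to pick out href values that occur only inside comments, resolves them against a base URL you supply, and prints one absolute URL per line, which is the format -i reads. The script name comment_links.py and the regex-based href matching are assumptions; links that Office encodes some other way will be missed.

```python
import re
import sys
from html import unescape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches href="...", href='...' or a bare href=value inside comment text.
HREF_RE = re.compile(r"""href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""", re.IGNORECASE)


class CommentLinkExtractor(HTMLParser):
    """Collects href values that appear inside HTML comments (real tags are ignored)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_comment(self, data):
        for match in HREF_RE.finditer(data):
            # Exactly one of the three alternation groups matched.
            href = next(g for g in match.groups() if g is not None)
            # Entities inside comments are not decoded by the parser, so do it here.
            url = urljoin(self.base_url, unescape(href))
            if url not in self.urls:  # keep the first occurrence, drop repeats
                self.urls.append(url)


def hrefs_in_comments(markup, base_url):
    """Return absolute URLs of links hidden inside comments in `markup`."""
    parser = CommentLinkExtractor(base_url)
    parser.feed(markup)
    parser.close()
    return parser.urls


def main(args):
    """args: BASE_URL followed by one or more saved .htm files."""
    if len(args) < 2:
        print("usage: comment_links.py BASE_URL FILE.htm [FILE.htm ...]", file=sys.stderr)
        return
    base_url, paths = args[0], args[1:]
    seen = set()
    for path in paths:
        # latin-1 decodes any byte sequence; the URLs themselves are usually ASCII.
        with open(path, encoding="latin-1") as f:
            for url in hrefs_in_comments(f.read(), base_url):
                if url not in seen:
                    seen.add(url)
                    print(url)


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example: python comment_links.py http://www.mkdj7.org.uk/ index.htm > urls.txt, then wget -i urls.txt. Pages fetched this way may hide further links in comments of their own, so the extract-and-fetch step may need repeating until no new URLs turn up.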

Not only will non-IE browsers have trouble with this site, but I'll bet
even slightly older versions of MSIE won't handle it either. Shame on
Microsoft for making Office generate such non-standard "HTML",
parseable by no one but themselves.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHQeHu7M8hyUobTrERCBrwAJ9m8CxSzIuXkzDAhcIdtJhAqO1gKQCgipZj
OytIcOGk1cHgr1V0oCJDMvo=
=ry8e
-----END PGP SIGNATURE-----
