On 16/01/2002 19:31:26 "Ian Abbott" wrote:
>I came across this extract from a table on a website:
>
><td ALIGN=CENTER VALIGN=CENTER WIDTH="120" HEIGHT="120"><a
>href="66B27885.htm" "msover1('Pic1','thumbnails/MO66B27885.jpg');"
>onMouseOut="msout1('Pic1','thumbnails/66B27885.jpg');"><img
>SRC="thumbnails/66B27885.jpg" NAME="Pic1" BORDER=0 ></a></td>
>
>Note the string beginning "msover1(", which seems to be an
>attribute value without a name, so that makes it illegal HTML.
>

That sounds like they wanted onMouseOver="msover1(...)"
It's also likely that msover1 is a Javascript function :-(

>I haven't traced what Wget is actually doing when it encounters
>this, but it doesn't treat "66B27885.htm" as a URL to be
>downloaded.
>

in map_html_tags()

    /* Establish bounds of attribute name. */
    attr_name_begin = p;        /* <foo bar ...> */
                                /*      ^        */
    while (NAME_CHAR_P (*p))
      ADVANCE (p);
    attr_name_end = p;          /* <foo bar ...> */
                                /*         ^     */
    if (attr_name_begin == attr_name_end)
      goto backout_tag;

When it sees "msover1(..." it doesn't ADVANCE (because NAME_CHAR_P(")
is false). Hence attr_name_begin == attr_name_end, and it backs out:

  backout_tag:
  #ifdef STANDALONE
    ++tag_backout_count;
  #endif
    /* The tag wasn't really a tag.  Treat its contents as ordinary
       data characters. */

>I can't call this a bug, but is Wget doing the right thing by
>ignoring the href altogether?
>

Until there's an ESP package that can guess what the author intended,
I doubt wget has any choice but to ignore the defective tag.

In addition, wget should send an email to webmaster@<offending domain>,
complaining about the invalid HTML :-)

-- 
Csaba Ráduly, Software Engineer    Sophos Anti-Virus
email: [EMAIL PROTECTED]           http://www.sophos.com
US Support: +1 888 SOPHOS 9        UK Support: +44 1235 559933
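
For illustration, the back-out behaviour described above can be reproduced in a small standalone sketch. This is not wget's code: name_char_p() and count_attrs_or_backout() are hypothetical names, and the set of characters accepted in an attribute name is an assumption, not a copy of NAME_CHAR_P. The point is only that a stray quoted string where a name is expected yields an empty name, and the whole tag is then rejected.

```c
#include <ctype.h>

/* Hypothetical stand-in for wget's NAME_CHAR_P. The accepted character
   set here is an assumption made for this sketch. */
static int name_char_p(unsigned char c)
{
    return isalnum(c) || c == '-' || c == '_' || c == '.' || c == ':';
}

/* Scan the attribute list of one tag, starting just after the tag name.
   Returns the number of attributes found, or -1 if the tag has to be
   backed out (empty attribute name, unterminated quote, or missing '>'). */
int count_attrs_or_backout(const char *p)
{
    int count = 0;

    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '>')
            return count;               /* clean end of tag */
        if (*p == '\0')
            return -1;                  /* tag never closed */

        /* Establish bounds of attribute name. */
        const char *name_begin = p;
        while (name_char_p((unsigned char)*p))
            p++;
        if (p == name_begin)
            return -1;                  /* empty name: back out */

        while (isspace((unsigned char)*p))
            p++;
        if (*p == '=') {
            p++;
            while (isspace((unsigned char)*p))
                p++;
            if (*p == '"' || *p == '\'') {
                char quote = *p++;      /* quoted value */
                while (*p != '\0' && *p != quote)
                    p++;
                if (*p != quote)
                    return -1;          /* unterminated quote */
                p++;
            } else {
                /* unquoted value runs to whitespace or '>' */
                while (*p != '\0' && !isspace((unsigned char)*p) && *p != '>')
                    p++;
            }
        }
        count++;
    }
}
```

Fed the attribute list of the <a> tag quoted above, the scan accepts href="66B27885.htm", then meets the double quote that opens "msover1(...", finds an empty name, and returns -1. The href that was already parsed is discarded along with the rest of the tag, which matches the observation that the link is never followed.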