On 16/01/2002 19:31:26 "Ian Abbott" wrote:

>I came across this extract from a table on a website:
>
><td ALIGN=CENTER VALIGN=CENTER WIDTH="120" HEIGHT="120"><a
>href="66B27885.htm" "msover1('Pic1','thumbnails/MO66B27885.jpg');"
>onMouseOut="msout1('Pic1','thumbnails/66B27885.jpg');"><img
>SRC="thumbnails/66B27885.jpg" NAME="Pic1" BORDER=0 ></a></td>
>
>Note the string beginning "msover1(", which seems to be an
>attribute value without a name, so that makes it illegal HTML.
>

That looks like they meant onMouseOver="msover1(...)" and dropped the
attribute name. It's also likely that msover1 is a JavaScript function :-(

>I haven't traced what Wget is actually doing when it encounters
>this, but it doesn't treat "66B27885.htm" as a URL to be
>downloaded.
>

This is the relevant code in map_html_tags():
     /* Establish bounds of attribute name. */
     attr_name_begin = p;     /* <foo bar ...> */
                              /*      ^        */
     while (NAME_CHAR_P (*p))
       ADVANCE (p);
     attr_name_end = p;       /* <foo bar ...> */
                              /*         ^     */
     if (attr_name_begin == attr_name_end)
       goto backout_tag;

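When the parser reaches "msover1(..." it never calls ADVANCE,
because NAME_CHAR_P ('"') is false for the opening quote.
Hence attr_name_begin == attr_name_end, and it backs out of the
whole tag, so the href that came before it is discarded as well: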

  backout_tag:
#ifdef STANDALONE
    ++tag_backout_count;
#endif
    /* The tag wasn't really a tag.  Treat its contents as ordinary
       data characters. */
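
To make the check above concrete, here is a minimal standalone sketch of the
same test. It is not Wget's code: name_char_p() is an assumed stand-in for
NAME_CHAR_P (I am guessing it accepts letters, digits, '.', '-' and '_'; the
real macro may differ), and attr_name_len() and check_examples() are
hypothetical helpers invented for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed stand-in for Wget's NAME_CHAR_P: letters, digits, '.', '-', '_'.
   The real macro may accept a different set of characters. */
int name_char_p(unsigned char c)
{
    return isalnum(c) || c == '.' || c == '-' || c == '_';
}

/* Hypothetical helper: length of the attribute name starting at p.
   A result of 0 corresponds to attr_name_begin == attr_name_end,
   i.e. the case where the parser jumps to backout_tag. */
size_t attr_name_len(const char *p)
{
    const char *begin = p;
    while (name_char_p((unsigned char)*p))
        ++p;
    return (size_t)(p - begin);
}

/* Hypothetical usage: the three attribute-like chunks from the sample tag. */
void check_examples(void)
{
    /* A normal attribute: the name "href" is 4 characters long. */
    assert(attr_name_len("href=\"66B27885.htm\"") == 4);

    /* The nameless value starts with a quote, so the name is empty. */
    assert(attr_name_len("\"msover1('Pic1','thumbnails/MO66B27885.jpg');\"") == 0);

    /* The well-formed handler that follows would have parsed fine. */
    assert(attr_name_len("onMouseOut=\"msout1(...)\"") == 10);
}
```

The zero-length result for the second chunk is the condition that sends the
parser to backout_tag.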


>I can't call this a bug, but is Wget doing the right thing by
>ignoring the href altogether?
>

Until there's an ESP package that can guess what the author intended,
I doubt wget has any choice but to ignore the defective tag. In addition,
wget should send an email to webmaster@<offending domain>,
complaining about the invalid HTML :-)


--
Csaba Ráduly, Software Engineer                           Sophos Anti-Virus
email: [EMAIL PROTECTED]                        http://www.sophos.com
US Support: +1 888 SOPHOS 9                     UK Support: +44 1235 559933
