Walter Underwood wrote:
Extracting links using a regular HTML parser works fine, and isn't
that much work. One of the major issues in an HTML parser is
dealing with all the illegal HTML on the web.
It really depends on what you are looking for and how tolerant of
errors you are. For most of what I do, I use an HTML parser, but I have
also used simple expression matching to pull out links. That approach
tends to overcount links (for example, it picks up references inside
comments) and often yields fragments that cannot actually be followed,
but it is at least an option.
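
To make the difference concrete, here is a minimal Python sketch, not
anything from an actual crawler. It assumes the standard-library
html.parser and re modules; the names HREF_RE, LinkCollector,
links_by_regex, links_by_parser and the SAMPLE page are all made up for
illustration. The pattern is deliberately naive, so it also matches the
link inside the comment.

```python
import re
from html.parser import HTMLParser

# Naive pattern (illustrative assumption): any quoted href value, anywhere.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE)


def links_by_regex(html):
    """Return every quoted href value found by plain pattern matching."""
    return HREF_RE.findall(html)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags only."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser reports comments separately, so commented-out
        # markup never reaches this handler.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_by_parser(html):
    """Return href values of <a> tags as seen by the HTML parser."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical sample page with one live link, one commented-out link,
# and one same-page fragment reference.
SAMPLE = """<html><body>
<a href="http://example.com/a">A</a>
<!-- <a href="http://example.com/old">old</a> -->
<a href="#top">top</a>
</body></html>"""

if __name__ == "__main__":
    print("regex: ", links_by_regex(SAMPLE))
    print("parser:", links_by_parser(SAMPLE))
```

On this sample the pattern match returns three links, including the
commented-out one, while the parser returns two. Both still return
"#top", so fragment-only references have to be filtered out separately
in either case.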
_______________________________________________
Robots mailing list
http://www.mccmedia.com/mailman/listinfo/robots