> The list of valid HTML tags is finite. You could try something like:
>
> rawbody BOGUS_HTML_TAG /<\/(?<!(list|of|valid|tags|...)>)[a-z]+>/i
>
> N.B.: I'm still having trouble wrapping my brain around zero-length
> assertions - does the above look right?
I think the above would do it for HTML tags. However, someone else pointed
out that there are a whole lot of other SGML-formatted things that can
appear in mail, and the list of tags for such things is essentially
infinite, so a test built on a fixed list of tags might not be manageable.
That is why I was trying to match an end tag to a missing begin tag. In my
rather limited
understanding of things SGML-like, I don't think it is valid to have an end
tag without a corresponding (properly nested) begin tag. If that assumption
is correct, a check that some begin tag exists for a given end tag (without
bothering with nesting and true balance checks) would be a big step
forward.
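
To make the idea concrete, here is a rough sketch of that check in Python
rather than as a rule. It is only an illustration of the logic, not
something SpamAssassin can load as-is; the function name and the tag
pattern are my own assumptions. It also assumes the begin tag must appear
somewhere earlier in the body, and it ignores nesting and balance
entirely.

```python
import re

# Matches a begin tag (<name ...>) or an end tag (</name>).
# Comments, doctype declarations and the like do not match and are skipped.
# The set of characters allowed in a tag name is an assumption.
TAG_RE = re.compile(r"<(/?)([a-z][a-z0-9:_-]*)[^<>]*>", re.IGNORECASE)


def unmatched_end_tags(body):
    """Return the names of end tags with no earlier begin tag of that name.

    Hypothetical helper: no nesting or balance checking is done.
    """
    seen_begin = set()
    orphans = []
    for match in TAG_RE.finditer(body):
        is_end = match.group(1) == "/"
        name = match.group(2).lower()  # tag names compared case-insensitively
        if is_end:
            if name not in seen_begin:
                orphans.append(name)
        else:
            seen_begin.add(name)
    return orphans


if __name__ == "__main__":
    print(unmatched_end_tags("<p>hello</p> buy now</xyzzy>"))
```

A message would then be suspicious whenever the returned list is
non-empty, with no list of valid tags needed.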
Loren