Reformatted excerpts from Bart Schaefer's message of 2009-04-29:
> We found that in most cases this failed at (1) or succeeded very
> quickly at (6a).  Only obscure cases proceed to (7), but if you're
> dealing with anything like old USENET news archives or folders written
> by '80s-era mail clients you need either step (4) or step (7) to get
> past the cruft.
> 
> Note that the key is finding "From ... DATE" rather than "From ADDRESS
> ..." if you really want to distinguish message separators from stuff
> people type in a message body.  I'm not sure you can do this with a
> regular expression.

Thanks! This is really helpful. I am a little worried about the current
fix: nothing requires a local user's email address to contain an @
sign, so requiring one will produce false negatives, and there's also a
non-trivial potential for false positives.

If we go this route (which wouldn't require a big changeset), I may
punt on parsing the date myself and just rely on Time.parse. Speed
shouldn't really be affected (except in weird pathological cases), since
the date parsing will be a second step that only runs on lines that
already look like separators. I like it.
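
To make the two-step idea concrete, here is a rough sketch of what I have
in mind. The method name mbox_separator? is made up for illustration and
is not existing sup code, and the regex is only my guess at a reasonable
first-pass filter. It assumes the date is everything after the second
field of the "From " line.

```ruby
require 'time'  # Time.parse is provided by the 'time' standard library

# Hypothetical helper, not sup's actual code.
def mbox_separator?(line)
  # Step 1: cheap structural check for "From <something> <rest>".
  # The sender is any run of non-space characters, so local
  # addresses without an @ sign still qualify.
  m = /\AFrom\s+\S+\s+(.+)/.match(line)
  return false unless m

  # Step 2: only lines that pass step 1 pay for date parsing.
  Time.parse(m[1])
  true
rescue ArgumentError
  # Time.parse raises ArgumentError when it finds no date or time.
  false
end
```

Time.parse is fairly lenient, so a body line that happens to contain
date-like text could probably still slip through; if that turns out to
matter, the second step could be swapped for a stricter format check.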
-- 
William <[email protected]>
_______________________________________________
sup-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/sup-talk
