Reformatted excerpts from Bart Schaefer's message of 2009-04-29:
> We found that in most cases this failed at (1) or succeeded very
> quickly at (6a). Only obscure cases proceed to (7), but if you're
> dealing with anything like old USENET news archives or folders written
> by '80s-era mail clients you need either step (4) or step (7) to get
> past the cruft.
>
> Note that the key is finding "From ... DATE" rather than "From ADDRESS
> ..." if you really want to distinguish message separators from stuff
> people type in a message body. I'm not sure you can do this with a
> regular expression.

Thanks! This is really helpful. I'm a little worried about the current
fix. Local users' addresses aren't required to contain an @ sign, so the
fix will produce false negatives, and there's also a non-trivial
potential for false positives.

If we go this route (which wouldn't require a big changeset), I may punt
on parsing the date myself and just rely on Time.parse. Speed shouldn't
really be affected (except in weird pathological cases), since the date
parsing will be a second step that only runs on lines that already look
like separators.

I like it.
-- 
William <[email protected]>
_______________________________________________
sup-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/sup-talk
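
For illustration, the two-step idea discussed above (a cheap check for
the "From " prefix, then a date parse as a second step) might look
roughly like the sketch below. This is not sup's actual code: the method
name mbox_separator? is hypothetical, and the sketch assumes the sender
is a single whitespace-delimited token with the date following it. Note
that Time.parse is lenient, so a body line that happens to contain a
month name or clock time after "From word" can still slip through.

```ruby
require "time"

# Hypothetical helper (not part of sup): guess whether a line is an mbox
# message separator by looking for "From <sender> <date>".
def mbox_separator?(line)
  # Step 1: cheap prefix test; the vast majority of lines fail here.
  return false unless line.start_with?("From ")

  # Step 2: skip the sender token, which need not contain an "@",
  # and try to parse the remainder as a date.
  _sender, rest = line[5..-1].strip.split(/\s+/, 2)
  return false if rest.nil?

  begin
    Time.parse(rest)
    true
  rescue ArgumentError
    # Time.parse found no usable date or time information.
    false
  end
end
```

Because the date parse only runs on lines that pass the prefix test,
ordinary body lines cost a single string comparison.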
