On Fri, Dec 31, 2010 at 11:46 PM, Steve Freegard <st...@stevefreegard.com> wrote:
>> I notice that there is no Bugzilla ticket for this plugin. Do you intend
>> on submitting it for inclusion in future spamassassin upstream?
>
> I hadn't really thought about it TBH and wasn't sure what the procedure
> was for this.
>
> It's been working well for me and for others based on some feedback that
> people have sent me - however it could do with being tested in the
> network mass-checks to see how effective it actually is compared to the
> other rules.
>
> But I'd also feel a bit more comfortable if one of the core devs looked
> over the code and made sure I haven't done anything obviously stupid.

I'll help you start the process with a Bugzilla ticket. I also hope you
can get it into some sort of public source control mechanism soon so we
can see the changes that go into it before inclusion in upstream. I feel
uncomfortable using something that is only available from a URL without
being able to see its change history. Do you know how to use git?
github.com is pretty good for something small like this.

>> Would a DoS happen if the scanned e-mail contains 10,000 short URLs and
>> your mail server is hit by many such mails? (Either SpamAssassin
>> becomes very slow, or you piss off the short URL provider by hitting
>> them too quickly and often.)
>
> No - it's got a hard-coded limit of 10 short URLs that will be checked
> at maximum; anything after the limit of 10 is skipped. You can also
> optionally enable a cache (requires DBD::SQLite) to prevent multiple
> messages with the same short link from generating additional queries.
>
> On reflection whilst typing this - I could probably handle this a bit
> better; currently the short URLs are stored in a Perl hash (to
> effectively de-dup them); I should possibly turn the hash into an array,
> randomize it and take the first 10 entries from it so it's not so
> predictable.

Sounds like a good plan. I'll see how it works in practice.
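For the record, the de-dup-and-randomize scheme described above could look
something like the following. The plugin itself is Perl; this is only an
illustrative Python sketch of the logic, using the hard-coded limit of 10
mentioned above:

```python
import random

def pick_urls_to_check(short_urls, limit=10):
    """De-duplicate the extracted short URLs, then pick up to `limit`
    of them at random, so a spammer padding a message with many short
    URLs cannot predict which ones will be skipped."""
    unique = list(set(short_urls))   # de-dup, like the Perl hash keys
    random.shuffle(unique)           # randomize the order
    return unique[:limit]            # check at most `limit` URLs
```

The randomization matters because a fixed first-10 selection would let a
spammer front-load ten harmless short URLs and hide the real one at
position eleven.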
>> Could the plugin detect when there are intentionally too many short
>> URLs? If so, what should it do in such cases? Are there ever legit
>> reasons for an e-mail to have a large number of short URLs?
>
> For now - I guess I could add an additional rule (e.g. scored at 0.001
> to see how many times it hits the current limit); but the age-old issue
> is 'how many is too many?'.
>
> I'll see about pushing out a new version with the updated list of
> shorteners and those changes shortly.
>
> Kind regards,
> Steve.

More questions:

1) Is it really necessary to follow a chain deeper than 2? My thinking is
that a chain of 2 is never legitimate, and it consumes time and resources
to query further.

2) How widespread is URL shortening abuse now? I can figure this out very
easily by adding a non-network URI rule to the nightly masscheck. Could
you please send me your updated list of shorteners privately so that I
may write such a rule?

3) SHORT_URL_NOTSHORT
If the expanded address is not much longer than the original address,
then the sender is likely obfuscating with ill intent. What should the
threshold be? Exact length? Original length + 4 characters? This should
be a good new rule.

4) url_shortener_log /tmp/DecodeShortURLs.txt
   url_shortener_cache /tmp/DecodeShortURLs.sq3
Is there a variable for the SpamAssassin home directory's path so you
don't need to hardcode an absolute path in the default config?

5) Do you currently make any distinction between reputable and
non-reputable shortening services? The questions below are related to
this.

6) If your plugin expands http://example.com/foobar to
http://somethingsafe.com, is http://example.com hidden from URIBL
lookups? This might matter if a shortening service goes rogue.

7) How fast are typical URL shortening responses? What is the timeout?
We want to avoid degrading SpamAssassin's scan time and delivery
performance, but in a way that cannot be abused by the spammer to evade
detection.
This could be a problem with your huge list of shortening services. If
you blindly include all possible shortening services, spammers could
purposefully use only the slowest in order to time out SpamAssassin. Web
browsers are more forgiving with timeouts, so a slow redirector is the
ideal way to evade your plugin.

You may want to include only the most reputable shortening services by
default, because you don't know what will happen during the multiple
years of your plugin being deployed on arbitrary servers. Less reputable
shortening services might be hijacked, change domain ownership, or simply
be neglected and become slow. Such services may need to be blacklisted
entirely. The non-default shortening services may only be safe to include
if the list can be updated via sa-update.

8) What User-Agent is used in the HTTP request? If the service can easily
detect that the request is not from a real browser, then spammers can
avoid detection by serving a safe-looking fake response, while
browser-based redirects go to the intended spam target.

Warren Togami
war...@togami.com
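P.S. Regarding the timeout concern in question 7: one way to keep a slow
redirector from stalling the whole scan would be a total time budget
across all lookups for a message, in addition to a per-request timeout.
A rough Python sketch of that idea - the budget values are assumptions
for illustration, and `fetch` is a stand-in for the actual HTTP lookup,
not the plugin's real code:

```python
import time

TOTAL_BUDGET = 5.0     # assumed: max seconds spent on all lookups per message
PER_URL_TIMEOUT = 2.0  # assumed: max seconds for any single lookup

def lookup_with_budget(urls, fetch, budget=TOTAL_BUDGET):
    """Resolve short URLs until the total time budget is spent, so a
    spammer mixing in slow redirectors cannot multiply per-URL timeouts
    into an unbounded scan delay. `fetch(url, timeout)` performs one
    HTTP lookup and returns the redirect target."""
    results = {}
    deadline = time.monotonic() + budget
    for url in urls:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # budget exhausted; skip the remaining URLs
        results[url] = fetch(url, timeout=min(PER_URL_TIMEOUT, remaining))
    return results
```

This bounds the worst case at the total budget rather than at
(number of URLs) x (per-URL timeout), which is the abuse scenario
described above.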