On Fri, Dec 31, 2010 at 11:46 PM, Steve Freegard <st...@stevefreegard.com> wrote:
>> I notice that there is no Bugzilla ticket for this plugin. Do you intend
>> on submitting it for inclusion in future spamassassin upstream?
>
> I hadn't really thought about it TBH and wasn't sure what the procedure
> was for this.
>
> It's been working well for me and for others based on some feedback that
> people have sent me - however it could do with being tested in the
> network mass-checks to see how effective it actually is compared to the
> other rules.
>
> But I'd also feel a bit more comfortable if one of the core devs looked
> over the code and made sure I haven't done anything obviously stupid.

I'll help you start the process with a Bugzilla ticket. I also hope you
can get it into some sort of public source control mechanism soon so we
can see the changes that go into it before inclusion in upstream. I feel
uncomfortable using something that is only available from a URL without
being able to see its change history. Do you know how to use git?
github.com is pretty good for something small like this.

>> Would a DoS happen if the scanned e-mail contains 10,000 short URLs and
>> your mail server is hit by many such mails? (Either SpamAssassin
>> becomes very slow, or you piss off the short URL provider by hitting
>> them too quickly and often.)
>
> No - it's got a hard-coded limit of 10 short URLs that will be checked
> at maximum; anything after the limit of 10 is skipped. You can also
> optionally enable a cache (requires DBD::SQLite) to prevent multiple
> messages with the same short link from generating additional queries.
>
> On reflection whilst typing this - I could probably handle this a bit
> better; currently the short URLs are stored in a Perl hash (to
> effectively de-dup them); I should possibly turn the hash into an array,
> randomize it and take the first 10 entries from it so it's not so
> predictable.

Sounds like a good plan. I'll see how it works in practice.
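For the record, the de-dup-and-randomize scheme described above could look
something like the following. The plugin itself is Perl; this is only an
illustrative Python sketch of the logic, using the hard-coded limit of 10
mentioned above:

```python
import random

def pick_urls_to_check(short_urls, limit=10):
    """De-duplicate the extracted short URLs, then pick up to `limit`
    of them at random, so a spammer padding a message with many short
    URLs cannot predict which ones will be skipped."""
    unique = list(set(short_urls))   # de-dup, like the Perl hash keys
    random.shuffle(unique)           # randomize the order
    return unique[:limit]            # check at most `limit` URLs
```

The randomization matters because a fixed first-10 selection would let a
spammer front-load ten harmless short URLs and hide the real one at
position eleven.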
>> Could the plugin detect when there are intentionally too many short
>> URLs? If so, what should it do in such cases? Are there ever legit
>> reasons for an e-mail to have a large number of short URLs?
>
> For now - I guess I could add an additional rule (e.g. scored at 0.001
> to see how many times it hits the current limit); but the age-old issue
> is 'how many is too many?'.
>
> I'll see about pushing out a new version with the updated list of
> shorteners and those changes shortly.
>
> Kind regards,
> Steve.

More questions:

1) Is it really necessary to follow a chain deeper than 2? My thinking is
that a chain of 2 is never legitimate, and it consumes time and resources
to query further.

2) How widespread is URL shortening abuse now? I can figure this out very
easily by adding a non-network URI rule to the nightly masscheck. Could
you please send me your updated list of shorteners privately so that I
may write such a rule?

3) SHORT_URL_NOTSHORT
If the expanded address is not much longer than the original address,
then the sender is likely obfuscating with ill intent. What should the
threshold be? Exact length? Original length + 4 characters? This should
be a good new rule.

4) url_shortener_log /tmp/DecodeShortURLs.txt
   url_shortener_cache /tmp/DecodeShortURLs.sq3
Is there a variable for the SpamAssassin home directory's path so you
don't need to hardcode an absolute path in the default config?

5) Do you currently make any distinction between reputable and
non-reputable shortening services? The questions below are related to
this.

6) If your plugin expands http://example.com/foobar to
http://somethingsafe.com, is http://example.com hidden from URIBL
lookups? This might matter if a shortening service goes rogue.

7) How fast are typical URL shortening responses? What is the timeout?
We want to avoid degrading SpamAssassin's scan time and delivery
performance, but in a way that cannot be abused by the spammer to evade
detection.
This could be a problem with your huge list of shortening services. If
you blindly include all possible shortening services, spammers could
purposefully use only the slowest in order to time out SpamAssassin. Web
browsers are more forgiving with timeouts, so a slow redirector is the
ideal way to evade your plugin.

You may want to include only the most reputable shortening services by
default, because you don't know what will happen during the multiple
years of your plugin being deployed on arbitrary servers. Less reputable
shortening services might be hijacked, change domain ownership, or simply
be neglected and become slow. Such services may need to be blacklisted
entirely. The non-default shortening services may only be safe to include
if the list can be updated via sa-update.

8) What User-Agent is used in the HTTP request? If the service can easily
detect that the request is not from a real browser, then spammers can
avoid detection by serving a safe-looking fake response, while
browser-based redirects go to the intended spam target.

Warren Togami
war...@togami.com
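P.S. Regarding the timeout concern in question 7: one way to keep a slow
redirector from stalling the whole scan would be a total time budget
across all lookups for a message, in addition to a per-request timeout.
A rough Python sketch of that idea - the budget values are assumptions
for illustration, and `fetch` is a stand-in for the actual HTTP lookup,
not the plugin's real code:

```python
import time

TOTAL_BUDGET = 5.0     # assumed: max seconds spent on all lookups per message
PER_URL_TIMEOUT = 2.0  # assumed: max seconds for any single lookup

def lookup_with_budget(urls, fetch, budget=TOTAL_BUDGET):
    """Resolve short URLs until the total time budget is spent, so a
    spammer mixing in slow redirectors cannot multiply per-URL timeouts
    into an unbounded scan delay. `fetch(url, timeout)` performs one
    HTTP lookup and returns the redirect target."""
    results = {}
    deadline = time.monotonic() + budget
    for url in urls:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # budget exhausted; skip the remaining URLs
        results[url] = fetch(url, timeout=min(PER_URL_TIMEOUT, remaining))
    return results
```

This bounds the worst case at the total budget rather than at
(number of URLs) x (per-URL timeout), which is the abuse scenario
described above.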