On 01/01/11 11:51, Warren Togami Jr. wrote:
I'll help you start the process with a Bugzilla ticket. I also hope
you could get it into some sort of public source control mechanism
soon so we can see the changes that go into it before inclusion in
upstream. I feel uncomfortable using something that is only available
from a URL without being able to see its change history.
Know how to use git? github.com is pretty good for something small
like this.
Sure. No problem.
More questions:
1) Is it really necessary to follow a chain deeper than 2? My mind
thinks that a chain of 2 is never legitimate, and it consumes time and
resources to query further.
No - I'll make this configurable and default it to 2, after a bit of
testing to make sure there are no obvious issues doing this.
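
A minimal sketch of what a configurable chain limit could look like,
written in Python rather than taken from the plugin. It assumes that
"depth" means the number of redirects followed, and the `resolve`
callable and `max_depth` parameter are hypothetical names used only for
illustration. The lookup is injected so the example runs without any
network access.

```python
from typing import Callable, List, Optional


def expand_chain(url: str,
                 resolve: Callable[[str], Optional[str]],
                 max_depth: int = 2) -> List[str]:
    """Follow shortener redirects, giving up after max_depth hops.

    resolve(url) returns the redirect target, or None if the URL
    does not redirect. Both names are assumptions for this sketch.
    """
    chain = [url]
    seen = {url}
    for _ in range(max_depth):
        target = resolve(chain[-1])
        # Stop on a non-redirect or on a redirect loop.
        if target is None or target in seen:
            break
        chain.append(target)
        seen.add(target)
    return chain


# Hypothetical redirect table standing in for real HTTP lookups.
redirects = {
    "http://a.example/1": "http://b.example/2",
    "http://b.example/2": "http://c.example/3",
    "http://c.example/3": "http://final.example/",
}
chain = expand_chain("http://a.example/1", redirects.get)
```

With the default of 2, the third redirect above is never queried, so
the cost per short URI is bounded regardless of how long the chain is.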
2) How widespread is URL shortening abuse now? I can figure this out
very easily by adding a non-network URI rule to the nightly
masscheck. Could you please send me privately your updated list of
shorteners so that I may write such a rule?
Based on the reports I get, it is quite prevalent at times. When
shorteners are used, it's effectively a free pass through the URIBL
plug-in, which often results in a false negative.
As soon as I've sorted out the list - I'll send it to you.
3) SHORT_URL_NOTSHORT
If the expanded address is not much longer than the original address,
then they are likely obfuscating with ill intent. What should the
threshold be? Exact length? Original length + 4 characters? This
should be a good new rule.
Hmmm; not sure on that - I'll see if I can add this.
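
One way the length comparison behind SHORT_URL_NOTSHORT might be
expressed, as a rough Python sketch. The threshold is still an open
question above, so the `margin` of 4 characters is only one of the
suggested options, and the function name is hypothetical.

```python
def is_notshort(short_url: str, expanded_url: str, margin: int = 4) -> bool:
    """Return True when expansion barely lengthens the URL.

    margin is an assumed threshold: the expanded URL counts as
    "not short" if it is at most margin characters longer than
    the shortened one.
    """
    return len(expanded_url) <= len(short_url) + margin


# 17-character short URL expanding to a 20-character target.
suspicious = is_notshort("http://sho.rt/abc", "http://evil.example/")
# The same short URL expanding to a genuinely long target.
ordinary = is_notshort(
    "http://sho.rt/abc",
    "http://www.example.com/a/long/article/path.html",
)
```

Setting `margin=0` gives the "exact length" variant; real mail from
masscheck would be needed to pick a value that avoids false positives.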
4)
url_shortener_log /tmp/DecodeShortURLs.txt
url_shortener_cache /tmp/DecodeShortURLs.sq3
Is there a variable for the spamassassin homedir's path so you don't
need to hardcode an absolute path in the default config?
Not that I know of.
5) Do you currently make any distinction between reputable and
non-reputable shortening services? Questions below are related to this.
No - I have no information on this. However the URIBLs do and often
blacklist the rogues if they are used for a lot of abuse.
6) If your plugin expands http://example.com/foobar to
http://somethingsafe.com, is http://example.com hidden from URIBL
lookups? This might matter if a shortening service goes rogue.
No - the expanded URIs are added to the list gathered by SA; it doesn't
overwrite them.
7) How fast are typical URL shortening responses? What is the
timeout? We want to avoid degrading the scan time and delivery
performance of spamassassin, but in a way that cannot be abused by the
spammer to evade detection.
This could be a problem with your huge list of shortening services.
If you blindly include all possible shortening services, spammers
could purposefully use only the slowest in order to time out
SpamAssassin.
Web browsers are more forgiving in timeouts, so a slow redirector is
the ideal way to evade your plugin.
It is possible that you may want to include only the most reputable
shortening services by default, because you don't know what will
happen during the multiple years of your plugin being deployed on
arbitrary servers. Other less reputable shortening services might be
hijacked, domain ownership changed, or simply neglected and become
slow. Such services may need to be blacklisted entirely. Including
the non-default shortening services may be safe only if the list can
be updated via sa-update.
The timeout is set to 5 seconds, and with a default of 10 short URIs
scanned it could take up to 50 seconds (10 x 5) in the worst case
before all the lookups timed out.
Thinking about it, I could possibly mitigate this by tracking timeouts
by shortener domain; so if the 1st lookup to that shortener service
timed out then it wouldn't attempt the rest.
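
The per-domain tracking idea could be sketched roughly as follows, in
Python rather than the plugin's own code. The `fetch` callable, the
`max_urls` parameter and the host names are hypothetical; `fetch` is
assumed to raise `TimeoutError` when a shortener does not answer in
time.

```python
from typing import Callable, Dict, Iterable, Set
from urllib.parse import urlsplit


def expand_all(urls: Iterable[str],
               fetch: Callable[[str], str],
               max_urls: int = 10) -> Dict[str, str]:
    """Expand up to max_urls short URIs, skipping hosts that timed out."""
    timed_out: Set[str] = set()
    results: Dict[str, str] = {}
    for url in list(urls)[:max_urls]:
        host = urlsplit(url).hostname or ""
        if host in timed_out:
            # An earlier lookup to this shortener timed out; skip it.
            continue
        try:
            results[url] = fetch(url)
        except TimeoutError:
            timed_out.add(host)
    return results


calls = []


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP lookup: slow.example always times out."""
    calls.append(url)
    if urlsplit(url).hostname == "slow.example":
        raise TimeoutError(url)
    return "http://target.example/"


expanded = expand_all(
    ["http://slow.example/1", "http://slow.example/2", "http://fast.example/1"],
    fake_fetch,
)
```

Under these assumptions the worst case drops from one timeout per short
URI to one timeout per distinct shortener domain in the message.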
8) What UserAgent is used in the HTTP request? If they can easily
detect that the request is not a real browser, then they can avoid
detection by using a safe looking fake response, while browser-based
redirects go to the intended spam target.
Currently it's the default used by the LWP module. I could easily set
it to a string identical to the one sent by Firefox or IE.
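
For illustration only, this is how a browser-like User-Agent might be
attached to a request using Python's standard `urllib.request`, which
is swapped in here in place of LWP. The `BROWSER_UA` string is a
placeholder in the style of a Firefox header, not a verified value, and
the HEAD method is an assumption about how the lookup would be made.
Nothing is sent over the network.

```python
from urllib.request import Request

# Placeholder User-Agent in the style of a desktop browser (assumption).
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0"


def build_request(url: str, user_agent: str = BROWSER_UA) -> Request:
    """Build (but do not send) a HEAD request with a browser-like UA."""
    return Request(url, headers={"User-Agent": user_agent}, method="HEAD")


req = build_request("http://sho.rt/abc")
```

A shortener could still fingerprint the client on other headers, so
matching the User-Agent alone only removes the most obvious giveaway.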
Regards,
Steve.