On 1/29/2014 11:53 AM, Andy Jezierski wrote:
I've been noticing a lot of spam getting through with the same trait:
a bunch of random words in angle brackets. They all seem to come after
the </body> or the </html> tag. Would anyone more knowledgeable than
me care to assist with a rule to detect them?
Thanks
Andy
Example:
</html>
</body>
<style>
<geehrter>
<convaincre>
<eingerichtet>
<piuttosto>
<meny>
<Aufl>
<quilting>
<surveymonkey>
<update>
<Benoit>
<problemi>
<ese>
<telstra>
<checking>
<aglow>
<insegna>
<doorgeven>
I've been seeing that too. They all seem to begin with <style>,
presumably to keep that crap from being rendered by mail client HTML
parsers.
You can probably exploit the fact that nobody is ever going to write a
style block that doesn't match /[{}]/, but I haven't been able to
experiment yet with any rules.
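
A minimal sketch of that idea in Python, just to make the heuristic concrete. This is not a SpamAssassin rule and has not been run against real mail. It assumes the decoded HTML part is already available as a string, and the function name has_braceless_style_block is made up for illustration. An unclosed <style> (as in the sample above) is treated as running to the end of the message.

```python
import re

# Matches an opening <style> tag and captures everything up to the closing
# </style>, or up to the end of the input if the block is never closed.
STYLE_BLOCK = re.compile(
    r"<style\b[^>]*>(.*?)(?:</style\s*>|\Z)",
    re.IGNORECASE | re.DOTALL,
)


def has_braceless_style_block(html: str) -> bool:
    """Return True if any non-empty <style> block contains neither '{' nor '}'.

    Hypothetical helper: flags style blocks that hold no CSS rule at all.
    """
    for match in STYLE_BLOCK.finditer(html):
        content = match.group(1)
        # Skip empty blocks; flag blocks whose content never matches /[{}]/.
        if content.strip() and not re.search(r"[{}]", content):
            return True
    return False


if __name__ == "__main__":
    spam = "</html>\n</body>\n<style>\n<geehrter>\n<convaincre>\n<meny>\n"
    legit = "<html><head><style>p { color: red; }</style></head></html>"
    print(has_braceless_style_block(spam))
    print(has_braceless_style_block(legit))
```

One caveat with this sketch: a style block containing only an @import statement has no braces either, so it would be flagged too. Whether that matters in practice would need checking against a ham corpus.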
I wouldn't recommend going the more general route of counting invalid
HTML tags, simply because maintaining such a rule over time would be
an enormous effort.