>... Mouss,
>List Mail User wrote:
>> updated.by - check http://www.tld.by/cgi-bin/registry.cgi
>>
>> You'll see that update.by is a registered domain! Therefore
>> "updated.by" is indeed a URI. QED
>
>the question is: if foo.example-DEMUNGED is listed in uribl/surbl, does
>that make it a bad string in mail?
>
>If it appears as http://somethin.foo.example-DEMUNGED, or even as a
>textual www.foo.example-DEMUNGED, we may consider it "risky"
>
>But if it appears as:
> telnet smtp.foo.example-DEMUNGED
>or
> Dec 26 23:41:53 bobo postfix/smtpd[7560]: connect from foo.example-DEMUNGED[192.0.2.56]
>...
>
>then checking *BLs is questionable. There are more chances to block
>someone reporting a spammy session or asking for help than seeing a
>spammer advertize his site via a log line...
>
>I believe this is the most important issue that uribl encounters: is the
>URI used to advertize or is it an example/report/...? if we solve this,
>we'll feel very happy.
>
There are several parts to the answer, but the first and most important part can be phrased as: "barring a special case", yes, a spam domain in mail is bad (period).

Now, there are more than a few special cases. One immediate case is that no abuse@ email account should be doing content filtering. Another obvious case is that any person or mailing list that discusses spam needs to be whitelisted, set up to avoid filtering, or otherwise configured not to trip spam filters. The case you listed of an "example/report" would/should always fall under these situations, but there are still others.

If you file a complaint with any party about an abuse situation, you should be prepared to have your own message quoted back to you (this has to include organizations like ICANN, the InterNIC, ARIN, RIPE, etc.). If you discuss spam or abuse with another person or on a list, again you should be prepared to be answered the same way (I have been guilty of forgetting this case more than once).

There are still more cases that can be hard to anticipate. For example, I recently got an email from a hosting service that I have locally BL'd; it was addressed to their customers (I am *not* one), but I was copied on it (I have spoken by telephone and email with the business's managers and staff on a few occasions). Fortunately they sent it to an account which is only used for certain types of complaints and communication, and which bypasses the BLs at the MTA level (it still passes through SA). Also, there are some companies/newsletters which may be on quite a few BLs but are solicited mail at my site, so they *must* be whitelisted (at the MTA, in SA, in DCC, etc.).

If you accept, or even just allow, requests for help with abuse issues, you should either use a dedicated account or be prepared for false positives (FPs) on those emails. (Yes, I know not everyone "controls" one or more domains, so not everyone can create special-purpose accounts trivially.)
Even the simplest case, a bare domain name, is clearly a problem. How can you distinguish (without building/writing a natural-language parser) between saying "I got spam from example.com for ..." and "Copy example.com into your browser to see our specials..."? The second form is fairly common in spam. You could try to score a bare name differently, but then what if it is embedded in a scripting language or HTML, or obfuscated with character translations (e.g. %45xample%2Eco%4D or similar)? Such styles can still be "dangerous".

There are many examples of non-distributed rules (i.e. not part of SA distributions) which conflict with common styles of email writing and quoting (e.g. the SARE chicken-pox rules firing on large chunks of quoted source code is a common example). Most of the "standard" SA rules are "safe" under normal conditions, but if some automated tool creates text containing a string which happens to be formatted the same as a spam domain, there will be a conflict (e.g. if "updated.by" were spammy, or even if a local rule penalized non-"RFC compliant" TLDs, since ".by" doesn't have a whois server; any string ending in ".by", ".my", ".de", ".mx", etc. could cause problems).

I don't think you could find any way to tell whether something is actually advertising even if you did have a natural-language parser. Consider mail that contains an image of a watch, pills or a scantily clad young woman, random text (not random words, but "literary" chaff) and a bare domain name. To a human it may be obvious what is happening, but you'd need a very complex recognizer to get a computer to "know" it was advertising. It could be a picture of your cousin with the poem she won a prize for and the domain it is "published" at, sent by a relative (an example from mail I've actually received), or it could be an advertisement for child pornography. How can you tell (especially when it comes from a DUL host via a cable ISP)?
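To illustrate why a filter can't tell these apart, here is a minimal Python sketch. The regex and the `extract_domains` helper are hypothetical, a deliberately naive stand-in and *not* how SpamAssassin or the URIBL/SURBL lookups actually parse URIs. It percent-decodes the text (so %45xample%2Eco%4D becomes "Example.coM") and then pulls out anything shaped like a domain name, from a complaint, an obfuscated ad and a Postfix-style log line.

```python
import re
from urllib.parse import unquote

# Hypothetical, deliberately naive pattern for a bare "name.tld" token.
# Not the SpamAssassin/URIBL parser; just enough to show the ambiguity.
BARE_DOMAIN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def extract_domains(text: str) -> list[str]:
    """Percent-decode the text, then return every bare-domain match, lowercased."""
    decoded = unquote(text)  # "%45" -> "E", "%2E" -> ".", "%4D" -> "M"
    return [m.group(0).lower() for m in BARE_DOMAIN.finditer(decoded)]


complaint = "I got spam from example.com for three days."
advert = "Copy %45xample%2Eco%4D into your browser to see our specials..."
log_line = ("Dec 26 23:41:53 bobo postfix/smtpd[7560]: "
            "connect from smtp.example.com[192.0.2.56]")

print(extract_domains(complaint))  # ['example.com']
print(extract_domains(advert))     # ['example.com']
print(extract_domains(log_line))   # ['smtp.example.com']
```

All three inputs yield a token that would be looked up the same way; nothing in the extracted string says whether it came from a complaint, an advertisement or a log line, which is exactly the problem.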
Not an easy case, and not one I expect to be solved in my lifetime.

Paul Shupak
[EMAIL PROTECTED]