On 14 Dec 2015, at 21:42, Alex wrote:
[...]

I also don't think it's a DNS problem here, as it doesn't happen on
every message. There are also no other indications of problems with
DNS.

SPF records tend to push the limits of normal DNS, especially in record size, and can bring out edge-case flaws in resolvers and in firewalls that massage DNS, flaws which are otherwise hard to see.

Many times the domain actually has something wrong with SPF, but other
times openspf.org/why and kittermans say there's nothing wrong with
the domain.

Try https://dmarcian.com/spf-survey/ as well. It's very good at showing the details of SPF problems.

What you MAY be seeing are entirely transient problems. Complex SPF setups often come with redirects and includes that cross administrative boundaries and a variety of short TTLs. That combination can result in an SPF resolution that is "broken" only because of stale cached records, which expire a few minutes later and are replaced by fresh, corrected records.

Another source of persistent SPF errors is demonstrated in the cintas.com record. They've outsourced various aspects of their email ops to 3 different providers and put 4 includes of provider SPF records in their base record, one of which then re-includes one of the top-level includes. The result is the need to do 15 resolutions to expand that record, which is over the limit of 10 DNS-querying terms that the SPF spec (RFC 7208) allows: so it is out of spec and SHOULD return a PERMERROR. With that degree of outsourcing I doubt that there's anyone at Cintas who can even understand what an SPF record is, so the problem is almost certain to be persistent. That sort of problem is widespread in large organizations. If the last person who understood SPF makes the mistake of including outlook.com instead of (or in addition to) spf.protection.outlook.com on their way out the door, they effectively sabotage the ability to add another include for other purposes.
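
To make the lookup arithmetic concrete, here is a rough Python sketch that counts the DNS-querying terms (include, a, mx, ptr, exists, redirect) needed to expand a record completely. The records are invented placeholders under example names, not the real cintas.com data, and the function name is my own. It does no DNS itself, it assumes there are no include loops, and it counts the worst case where every term is evaluated; a real checker stops at the first match.

```python
# Hypothetical, hand-written records for illustration only; not real DNS data.
RECORDS = {
    "example.com": "v=spf1 a mx include:_spf.a.example include:_spf.b.example ~all",
    "_spf.a.example": "v=spf1 ip4:192.0.2.0/24 include:_spf.b.example -all",
    "_spf.b.example": "v=spf1 a mx exists:%{i}.bl.example -all",
}

LOOKUP_LIMIT = 10  # cap on DNS-querying terms per check (RFC 7208, section 4.6.4)


def count_lookups(domain, records):
    """Count DNS-querying terms needed to fully expand one domain's record.

    Assumes every referenced domain is present in `records` and that there
    are no include loops (a loop would recurse until Python gives up).
    """
    total = 0
    for term in records[domain].split()[1:]:  # skip the "v=spf1" version tag
        term = term.lstrip("+-~?")            # drop any qualifier
        if term.lower().startswith("redirect="):
            total += 1 + count_lookups(term.split("=", 1)[1], records)
            continue
        name, _, arg = term.partition(":")
        name = name.split("/")[0].lower()     # "a/24" counts as "a"
        if name == "include":
            # One lookup for the include itself, plus whatever it contains.
            total += 1 + count_lookups(arg, records)
        elif name in ("a", "mx", "ptr", "exists"):
            total += 1
        # ip4, ip6, all and other modifiers need no extra DNS query
    return total


if __name__ == "__main__":
    needed = count_lookups("example.com", RECORDS)
    verdict = "over the limit" if needed > LOOKUP_LIMIT else "within the limit"
    print(f"example.com needs {needed} lookups: {verdict}")
```

Note that _spf.b.example is counted twice, once for the top-level include and once for the re-include inside _spf.a.example. That is the same pattern that pushes a record with only a handful of visible includes past the limit.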

Other domains that fail, such as gmail.com and wellsfargo.com, report
softfail on kitterman when testing due to a redirect, apparently.
Would that be enough to cause this rule to legitimately report
temperror?

No. Softfail is a specifically defined non-error SPF result, caused by a "~all" default ("tail") component of the record. It is NOT an error, it is a chosen result.
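
As a small illustration of "chosen result", this Python sketch maps the qualifier on the "all" term to the result it yields. The function name and the sample records are made up for the example; it only looks at the "all" term and ignores everything else in the record, including redirects.

```python
# Qualifier-to-result mapping defined by the SPF spec; a bare term implies "+".
QUALIFIER_RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def default_result(record):
    """Return the result the record's 'all' term yields, or None if it has none."""
    for term in record.split()[1:]:  # skip the "v=spf1" version tag
        if term[0] in QUALIFIER_RESULTS:
            qualifier, name = term[0], term[1:]
        else:
            qualifier, name = "+", term
        if name.lower() == "all":
            return QUALIFIER_RESULTS[qualifier]
    return None  # no "all" term; this sketch does not follow redirects


if __name__ == "__main__":
    # Sample records are placeholders, not any real domain's policy.
    print(default_result("v=spf1 ip4:192.0.2.0/24 ~all"))
    print(default_result("v=spf1 ip4:192.0.2.0/24 -all"))
```

None of the four outcomes is TEMPERROR or PERMERROR. Those come from DNS failures or broken records, not from anything the domain owner selects with a qualifier.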

Is there any way to easily retest the mails that have failed? I
suppose it's more difficult to test for T_SPF_HELO_TEMPERROR, but
every one of those seem to be legitimate failures anyway.

If you don't have a time machine and don't capture all of your DNS traffic, it is not possible to retest perfectly or know why either flavor of error occurred (although TEMPERROR is almost always the result of DNS timeouts). However, there is the somewhat lame 'spfquery' script that is included in the Perl Mail::SPF module (which is what does SPF checking for SA), so you can always check locally rather than relying on a test on a distant website using different DNS servers.
