Possible reasons for "lost connection after DATA"

2014-09-10 Thread Sean Durkin
Hello, some of my users were complaining about losing incoming mail, namely Amazon shipping notifications, newsletters and such things that they were absolutely sure were sent out, but never reached their inbox. After doing some digging, increasing log verbosity and such, I found a lot of this:

Re: Possible reasons for "lost connection after DATA"

2014-09-10 Thread Sean Durkin
Hi Robert,
On 10.09.2014 at 10:11, Robert Schetterer wrote:
> On 10.09.2014 at 09:56, Sean Durkin wrote:
>> The first question is:
>> Can I rule out it's my fault?
>
> have you changed anything last days/month upgrades/updates software
> hardware ?
Hardware is

Re: Possible reasons for "lost connection after DATA"

2014-09-10 Thread Sean Durkin
Hi Viktor,
On 10.09.2014 at 16:19, Viktor Dukhovni wrote:
> Have you tried disabling TCP window scaling? It might be confusing
> some middle-box (firewall, NAT device, ...) on path between the
> remote systems and your MTA.
I would not have thought of that... I've tried that now, but it does n
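A minimal sketch of how one might confirm whether TCP window scaling is currently enabled before and after changing it. It assumes a Linux host, which the thread does not state; the `/proc` path is Linux-specific and the function name is made up for illustration.

```python
from pathlib import Path

# Assumption: Linux. This sysctl file does not exist on other systems.
WINDOW_SCALING = Path("/proc/sys/net/ipv4/tcp_window_scaling")


def window_scaling_enabled(path: Path = WINDOW_SCALING) -> bool:
    """Return True unless the setting stored at `path` reads as "0"."""
    return path.read_text().strip() != "0"


if __name__ == "__main__":
    if WINDOW_SCALING.exists():
        print("TCP window scaling enabled:", window_scaling_enabled())
    else:
        print("sysctl file not found; probably not a Linux host")
```

On Linux the setting itself is usually changed with `sysctl -w net.ipv4.tcp_window_scaling=0`, and it only affects connections opened after the change.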

Re: Possible reasons for "lost connection after DATA"

2014-09-11 Thread Sean Durkin
Hello Wietse,
On 10.09.2014 at 21:52, Wietse Venema wrote:
> Slow performance is typical for TCP window scaling problems. Have
> you tried to turn it off in your kernel?
Yes, Viktor suggested that also and I tried it. It does not make a difference, the problem persists.
Regards, Sean

Re: Possible reasons for "lost connection after DATA"

2014-09-11 Thread Sean Durkin
Hi Viktor,
On 10.09.2014 at 23:03, Viktor Dukhovni wrote:
> This trace has an insane level of debugging turned on, to the point
> that syslogd is overwhelmed and is losing messages. PLEASE DISABLE
> ALL VERBOSE logging. NO "-v" options in master.cf, NO debug_peer_list,
> ...
Yes, sorry, I crank

Re: Possible reasons for "lost connection after DATA"

2014-09-11 Thread Sean Durkin
Hi Wietse,
On 11.09.2014 at 13:49, Wietse Venema wrote:
> What is the distribution of DATA sizes before failure? In your
> example I see numbers around 3kB, 9kB, 12kB.
At the moment, I see these sizes:
- always exactly 17511 bytes from smtp-out-127-*.amazon.com (today, seems to be only 3 diffe
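A rough sketch of how the distribution of DATA sizes before failure could be tallied from the mail log. It assumes the Postfix smtpd log wording `lost connection after DATA (N bytes) from host[address]`; the sample lines, host names, and addresses are invented for illustration and are not taken from the poster's logs.

```python
import re
from collections import Counter

# Assumed log wording; adjust the pattern if your Postfix version differs.
PATTERN = re.compile(r"lost connection after DATA \((\d+) bytes\) from (\S+?)\[")


def data_size_distribution(lines):
    """Count (client host, bytes received) pairs for dropped DATA transfers."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group(2), int(match.group(1)))] += 1
    return counts


# Hypothetical sample input, for illustration only.
sample = [
    "postfix/smtpd[101]: lost connection after DATA (17511 bytes) from mail.example.com[192.0.2.10]",
    "postfix/smtpd[102]: lost connection after DATA (17511 bytes) from mail.example.com[192.0.2.10]",
    "postfix/smtpd[103]: lost connection after DATA (3012 bytes) from news.example.net[198.51.100.7]",
    "postfix/smtpd[104]: disconnect from mail.example.com[192.0.2.10]",
]

if __name__ == "__main__":
    for (host, size), n in data_size_distribution(sample).most_common():
        print(f"{n:4d}  {size:8d} bytes  {host}")
```

A byte count that is always identical for one sender, as in the 17511-byte case reported here, stands out immediately in such a tally.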

Re: Possible reasons for "lost connection after DATA"

2014-09-12 Thread Sean Durkin
Hi Viktor,
On 11.09.2014 at 16:04, Viktor Dukhovni wrote:
> Your PCAP files should demonstrate repeated retransmission of data,
> are the ACKs you're sending confirming receipt of packets that are
> sent repeatedly? In that case your ACKs are getting lost? Is
> there a sequence number gap in t
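The two checks Viktor asks about (repeated segments and sequence number gaps) can be sketched as follows. This is an illustrative simplification: it assumes the `(sequence number, payload length)` pairs for one direction of one connection have already been extracted from the capture, and it ignores sequence number wraparound and partially overlapping segments. The function name and the sample data are hypothetical.

```python
def analyze_segments(segments):
    """Find retransmitted segments and sequence gaps.

    `segments` is a list of (seq, length) tuples in capture order.
    Returns (retransmits, gaps), where each gap is (expected_seq, actual_seq).
    """
    seen = set()
    retransmits = []
    gaps = []
    next_expected = None
    for seq, length in segments:
        if (seq, length) in seen:
            # Same segment seen again: the sender did not get an ACK for it.
            retransmits.append((seq, length))
            continue
        seen.add((seq, length))
        if next_expected is not None and seq > next_expected:
            # Bytes between next_expected and seq never showed up in the capture.
            gaps.append((next_expected, seq))
        end = seq + length
        if next_expected is None or end > next_expected:
            next_expected = end
    return retransmits, gaps


if __name__ == "__main__":
    # Hypothetical data: one missing segment, then one segment sent three times.
    capture = [(1, 100), (101, 100), (301, 100), (301, 100), (301, 100)]
    print(analyze_segments(capture))
```

A gap followed by repeated copies of later segments would suggest that a packet is being dropped on the way in, while repeats without any gap would suggest that the receiver's ACKs are not reaching the sender.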

Re: Possible reasons for "lost connection after DATA"

2014-09-12 Thread Sean Durkin
Hi Wietse,
On 11.09.2014 at 17:10, Wietse Venema wrote:
> That increases my suspicion of a data-dependent error - some marginal
> cable/switch/router, perhaps some middle box with a memory bit error
> that requires a power cycle to clear the problem. If the problem is
> caused by crosstalk defec

Re: Possible reasons for "lost connection after DATA"

2014-09-12 Thread Sean Durkin
Hi Hannes,
On 11.09.2014 at 20:48, Hannes Erven wrote:
> I remember a possibly similar situation back in 2008... the culprit was a
> not-fully-up-to-date Cisco ASA firewall that corrupted TCP SACK fields and
> hence had the remote site send RSET.
> Anyways on our end the connection seemed to st

Re: Possible reasons for "lost connection after DATA"

2014-09-12 Thread Sean Durkin
Hi Mark,
On 11.09.2014 at 22:59, L. Mark Stone wrote:
> Any chance there is a UTM device in the email stream?
Possible, but I wouldn't know. This is a rented rootserver in some data center. I don't know their topology, and they probably wouldn't tell me even if I asked.
> We see lots of these e