Re: Possible reasons for "lost connection after DATA"

2014-09-12 Thread Viktor Dukhovni
On Fri, Sep 12, 2014 at 10:36:51AM +0200, Sean Durkin wrote:
> If it's a middle box somewhere along the way, that's even worse.
> Even more different people potentially involved...
I would rent a backup MX server (deploy identical anti-spam policies, and lists of valid recipients, ...) at a diffe

Re: Possible reasons for "lost connection after DATA"

2014-09-12 Thread Sean Durkin
Hi Mark,
On 11.09.2014 at 22:59, L. Mark Stone wrote:
> Any chance there is a UTM device in the email stream?
Possible, but I wouldn't know. This is a rented rootserver in some data center. I don't know their topology, and they probably wouldn't tell me even if I asked.
> We see lots of these e

Re: Possible reasons for "lost connection after DATA"

2014-09-12 Thread Sean Durkin
Hi Hannes,
On 11.09.2014 at 20:48, Hannes Erven wrote:
> I remember a possibly similar situation back in 2008... the culprit was a
> not-fully-up-to-date Cisco ASA firewall that corrupted TCP SACK fields and
> hence had the remote site send RSET.
> Anyways on our end the connection seemed to st

Re: Possible reasons for "lost connection after DATA"

2014-09-12 Thread Sean Durkin
Hi Wietse,
On 11.09.2014 at 17:10, Wietse Venema wrote:
> That increases my suspicion of a data-dependent error - some marginal
> cable/switch/router, perhaps some middle box with a memory bit error
> that requires a power cycle to clear the problem. If the problem is
> caused by crosstalk defec

Re: Possible reasons for "lost connection after DATA"

2014-09-12 Thread Sean Durkin
Hi Viktor,
On 11.09.2014 at 16:04, Viktor Dukhovni wrote:
> Your PCAP files should demonstrate repeated retransmission of data,
> are the ACKs you're sending confirming receipt of packets that are
> sent repeatedly? In that case your ACKs are getting lost? Is
> there a sequence number gap in t

Re: Possible reasons for "lost connection after DATA"

2014-09-11 Thread L. Mark Stone
Any chance there is a UTM device in the email stream? We see lots of these errors when our SonicWalls do an RBL lookup, don't like the data in the email stream etc. The SonicWalls then just drop the connection and Postfix logs the drop. Hope that helps, Mark

Re: Possible reasons for "lost connection after DATA"

2014-09-11 Thread Wietse Venema
Sean Durkin:
> Meanwhile, I've managed to record a tcpdump of such a failed
> session. What exactly am I looking for there?
- The receiving host's window announcement in the tcp handshake and in subsequent ACKs.
- Whether there is a "gap" in the sender packet sequence numbers as seen by the r
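The second check in the list above, a gap in the sender's sequence numbers, can be illustrated with a minimal sketch. It assumes the capture has already been reduced to (relative sequence number, payload length) pairs for one direction of one connection. The function name `find_seq_gaps` and the sample values are hypothetical and not taken from the thread, and 32-bit sequence wraparound is ignored.

```python
from typing import List, Tuple


def find_seq_gaps(segments: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges, end exclusive, that no segment covers.

    `segments` holds (relative sequence number, payload length) pairs for
    one direction of one TCP connection. Retransmissions and out-of-order
    arrival are tolerated; sequence-number wraparound is not handled.
    """
    gaps = []
    expected = None  # next byte expected if the stream is contiguous
    for seq, length in sorted(segments):
        if length == 0:
            continue  # pure ACKs carry no payload
        if expected is not None and seq > expected:
            gaps.append((expected, seq))
        end = seq + length
        expected = end if expected is None else max(expected, end)
    return gaps


# Hypothetical values: one retransmission, and one segment that never arrived.
sample = [(1, 1448), (1449, 1448), (4345, 1448), (1449, 1448)]
print(find_seq_gaps(sample))
```

A non-empty result on the receiver's capture, combined with repeated retransmissions on the sender's side, would point at loss somewhere on the path rather than at the MTA.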

Re: Possible reasons for "lost connection after DATA"

2014-09-11 Thread Hannes Erven
Hi Sean,
> Meanwhile, I've managed to record a tcpdump of such a failed session.
> What exactly am I looking for there?
I remember a possibly similar situation back in 2008... the culprit was a not-fully-up-to-date Cisco ASA firewall that corrupted TCP SACK fields and hence had the remote sit

Re: Possible reasons for "lost connection after DATA"

2014-09-11 Thread Wietse Venema
Sean Durkin:
> Hi Wietse,
>
> On 11.09.2014 at 13:49, Wietse Venema wrote:
> > What is the distribution of DATA sizes before failure? In your
> > example I see numbers around 3kB, 9kB, 12kB.
>
> At the moment, I see these sizes:
>
> - always exactly 17511 bytes from smtp-out-127-*.amazon.com (t

Re: Possible reasons for "lost connection after DATA"

2014-09-11 Thread Viktor Dukhovni
On Thu, Sep 11, 2014 at 03:25:57PM +0200, Sean Durkin wrote:
> I can contact support, but they of course charge you for
> everything they do, and as long as I haven't ruled out that the
> reason is just some stupid configuration mistake on my part (or a
> routing/filtering issue at my hosting prov

Re: Possible reasons for "lost connection after DATA"

2014-09-11 Thread Sean Durkin
Hi Wietse,
On 11.09.2014 at 13:49, Wietse Venema wrote:
> What is the distribution of DATA sizes before failure? In your
> example I see numbers around 3kB, 9kB, 12kB.
At the moment, I see these sizes:
- always exactly 17511 bytes from smtp-out-127-*.amazon.com (today, seems to be only 3 diffe
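The question about the distribution of DATA sizes can be answered by tallying the byte counts that smtpd logs with each failure. The sketch below is a rough illustration, assuming log lines of the shape `lost connection after DATA (N bytes) from host[ip]`. The function name `size_distribution` and the sample lines are hypothetical and not taken from the thread's logs.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape: "... lost connection after DATA (N bytes) from host[ip]"
LOST_RE = re.compile(r"lost connection after DATA \((\d+) bytes\) from (\S+?)\[")


def size_distribution(lines: Iterable[str]) -> Counter:
    """Count how often each (client host, byte count) pair was logged."""
    counts: Counter = Counter()
    for line in lines:
        match = LOST_RE.search(line)
        if match:
            counts[(match.group(2), int(match.group(1)))] += 1
    return counts


# Hypothetical log lines for illustration only.
sample_log = [
    "postfix/smtpd[1001]: lost connection after DATA (17511 bytes) from mx1.example.com[192.0.2.10]",
    "postfix/smtpd[1002]: connect from mx2.example.net[192.0.2.20]",
    "postfix/smtpd[1002]: lost connection after DATA (9120 bytes) from mx2.example.net[192.0.2.20]",
    "postfix/smtpd[1003]: lost connection after DATA (17511 bytes) from mx1.example.com[192.0.2.10]",
]

for (host, size), n in size_distribution(sample_log).most_common():
    print(f"{n:3d}  {size:6d} bytes  {host}")
```

A sender that always fails at exactly the same byte count, as reported here for the Amazon hosts, stands out immediately in such a tally and supports the suspicion of a data-dependent error.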

Re: Possible reasons for "lost connection after DATA"

2014-09-11 Thread Viktor Dukhovni
On Thu, Sep 11, 2014 at 02:36:51PM +0200, Sean Durkin wrote:
> > PLEASE DISABLE
> > ALL VERBOSE logging. NO "-v" options in master.cf, NO debug_peer_list,
>
> Yes, sorry, I cranked up the debug level, since normal logging looks like
> this:
>
> Sep 11 09:43:31 mail postfix/smtpd[25170]: connect

Re: Possible reasons for "lost connection after DATA"

2014-09-11 Thread Sean Durkin
Hi Viktor,
On 10.09.2014 at 23:03, Viktor Dukhovni wrote:
> This trace has an insane level of debugging turned on, to the point
> that syslogd is overwhelmed and is losing messages. PLEASE DISABLE
> ALL VERBOSE logging. NO "-v" options in master.cf, NO debug_peer_list,
> ...
Yes, sorry, I crank

Re: Possible reasons for "lost connection after DATA"

2014-09-11 Thread Wietse Venema
Sean Durkin:
> Hello Wietse,
>
> On 10.09.2014 at 21:52, Wietse Venema wrote:
>
> > Slow performance is typical for TCP window scaling problems. Have
> > you tried to turn it off in your kernel?
>
> Yes, Viktor suggested that also and I tried it. It does not make
> a difference, the problem per

Re: Possible reasons for "lost connection after DATA"

2014-09-11 Thread Sean Durkin
Hello Wietse,
On 10.09.2014 at 21:52, Wietse Venema wrote:
> Slow performance is typical for TCP window scaling problems. Have
> you tried to turn it off in your kernel?
Yes, Viktor suggested that also and I tried it. It does not make a difference, the problem persists.
Regards, Sean

Re: Possible reasons for "lost connection after DATA"

2014-09-10 Thread Viktor Dukhovni
On Wed, Sep 10, 2014 at 09:19:58PM +0200, Sean Durkin wrote:
> > For at least one such session, post all related messages from the
> > "postfix/smtpd[pid]" that occur between "connect from" and
> > "disconnect from".
> Here's one: http://pastebin.com/twb3Z8Eg
This trace has an insane level of d

Re: Possible reasons for "lost connection after DATA"

2014-09-10 Thread Wietse Venema
Sean Durkin:
[ Charset windows-1252 converted... ]
> Hi Viktor,
>
> On 10.09.2014 at 16:19, Viktor Dukhovni wrote:
> > Have you tried disabling TCP window scaling? It might be confusing
> > some middle-box (firewall, NAT device, ...) on path between the
> > remote systems and your MTA.
> I wou

Re: Possible reasons for "lost connection after DATA"

2014-09-10 Thread Sean Durkin
Hi Viktor,
On 10.09.2014 at 16:19, Viktor Dukhovni wrote:
> Have you tried disabling TCP window scaling? It might be confusing
> some middle-box (firewall, NAT device, ...) on path between the
> remote systems and your MTA.
I would not have thought of that... I've tried that now, but it does n

Re: Possible reasons for "lost connection after DATA"

2014-09-10 Thread Sean Durkin
Hi Robert,
On 10.09.2014 at 10:11, Robert Schetterer wrote:
> On 10.09.2014 at 09:56, Sean Durkin wrote:
>> The first question is:
>> Can I rule out it's my fault?
>
> have you changed anything last days/month upgrades/updates software
> hardware ?
Hardware is unchanged. The Ubuntu postfix pa

Re: Possible reasons for "lost connection after DATA"

2014-09-10 Thread Viktor Dukhovni
On Wed, Sep 10, 2014 at 09:56:48AM +0200, Sean Durkin wrote:
> Some of my users were complaining about losing incoming mail,
> namely Amazon shipping notifications, newsletters and such things
> that they were absolutely sure were sent out, but never reached
> their inbox. After doing some digging

Re: Possible reasons for "lost connection after DATA"

2014-09-10 Thread Robert Schetterer
On 10.09.2014 at 09:56, Sean Durkin wrote:
> The first question is:
> Can I rule out it's my fault?
Have you changed anything in the last days/months (upgrades/updates, software, hardware)? Please send your Postfix config, and search the list archive for "lost connection after DATA".
Best Regards, Robert Schetter

Possible reasons for "lost connection after DATA"

2014-09-10 Thread Sean Durkin
Hello, some of my users were complaining about losing incoming mail, namely Amazon shipping notifications, newsletters and such things that they were absolutely sure were sent out, but never reached their inbox. After doing some digging, increasing log verbosity and such, I found a lot of this: