Thank you, Daniel, for your response.

The OpenSSL version on the webserver running Apache 2.4.12 is 'OpenSSL 
1.0.1e-fips 11 Feb 2013', and on the one running 2.4.18 it is 'OpenSSL 1.0.2g  
1 Mar 2016'.

I'm not sure whether we have any tcpdumps from the client side, but I'll look 
into that. Otherwise, getting new captures from the customer's end is going to 
be tricky, given their reluctance to help out.

Are there any obvious changes between 2.4.12 and 2.4.18 that would, or 
potentially could, have introduced a scenario like the one I'm describing?

Thanks again.


-----Original Message-----
From: Daniel Ferradal <dferra...@apache.org>
Reply-To: users@httpd.apache.org
To: users@httpd.apache.org
Subject: Re: [users@httpd] Apache random traffic outage for specific customer
Date: Wed, 09 Oct 2019 22:08:30 +0200

Perhaps you can add the OpenSSL version to the puzzle, given those SSL errors 
you caught. Did it change with the upgrade? Although, without looking, I would 
really tend not to associate a timeout with SSL issues at all.

I'd also try tcpdump on the client side instead of the server.

On Wed, 9 Oct 2019 at 21:33, Franck Fallateuf 
<franck.fallat...@plansource.com> wrote:
Hello everyone,

We upgraded from Apache 2.4.12 to 2.4.18 on a public-facing webserver which 
proxies requests to backend servers. Initially, when we cut over to the 
webserver running the newer version (2.4.18), all traffic seemed to flow 
normally. A few days later, however, we received a report from one of our 
customers that they were experiencing random outages. The outage would manifest 
itself as a browser error page: "This site can't be reached", 
"ERR_CONNECTION_TIMED_OUT". As far as we are aware, this is the only customer 
experiencing, or at least reporting, this issue. After looking through all 
available logs, for Apache and otherwise, we could not identify what was 
causing this or where it was occurring. So we decided to set up some packet 
capturing (tcpdumps) from both ends between us and this customer. What we 
observed was the following:

Packet captures on the border firewall showed the SSL handshake failing during 
ECDH negotiation, after the server hello message was received by the client. 
The return packet was a 'bad_record_mac' alert message, alert code 20.
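
For anyone reading a capture like this by hand, here is a minimal sketch of how 
such an alert record can be decoded. It assumes the alert is sent unencrypted 
(which is the case for alerts sent before ChangeCipherSpec) and follows the 
record layout and alert numbers in RFC 5246. The function name 
decode_tls_alert and the shortened description table are illustrative only and 
do not come from any library.

```python
import struct

# Subset of TLS alert descriptions from RFC 5246, section 7.2.
ALERT_DESCRIPTIONS = {
    0: "close_notify",
    10: "unexpected_message",
    20: "bad_record_mac",
    40: "handshake_failure",
    47: "illegal_parameter",
    70: "protocol_version",
    80: "internal_error",
}
ALERT_LEVELS = {1: "warning", 2: "fatal"}
CONTENT_TYPE_ALERT = 21


def decode_tls_alert(record: bytes):
    """Decode an unencrypted TLS alert record into (version, level, description)."""
    if len(record) < 7:
        raise ValueError("record too short for a plaintext alert")
    # Record header: content type (1), version major/minor (2), length (2).
    content_type, major, minor, length = struct.unpack("!BBBH", record[:5])
    if content_type != CONTENT_TYPE_ALERT:
        raise ValueError(f"not an alert record (content type {content_type})")
    if length != 2:
        # Encrypted alerts are longer than 2 bytes and cannot be read this way.
        raise ValueError("alert body is not 2 bytes; probably encrypted")
    level, description = record[5], record[6]
    return (
        (major, minor),
        ALERT_LEVELS.get(level, f"unknown({level})"),
        ALERT_DESCRIPTIONS.get(description, f"unknown({description})"),
    )


# Hand-built example: a fatal bad_record_mac alert in a record with version 3.3.
sample = bytes([21, 3, 3, 0, 2, 2, 20])
print(decode_tls_alert(sample))
```

The sample bytes are constructed for illustration and were not taken from the 
captures discussed here.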

Because of this, we decided to make the following changes:

During troubleshooting, the TIME_WAIT value was increased on the firewall to 
allow enough time for a response; this did not resolve the issue. The firewall 
was then configured for TCP bypass for the IP addresses having the 
communication issues; this did not resolve the issue either. The firewall is a 
Cisco ASA 5545 running version 9.8(3)29.

While comparing the Apache setups running 2.4.12 and 2.4.18, we found that we 
were running the "event" MPM on 2.4.18 versus the "worker" MPM on 2.4.12. 
Reading up on the differences between these two MPM types, we immediately 
thought this could have played a part because of how sockets are handled. We 
reverted the MPM to "worker" on the newer Apache version and tested again, but 
this customer still experienced the same random issues.
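
As an aside, a difference like this can be spotted by comparing the "Server 
MPM:" line printed by `httpd -V` on each host. The following is only a sketch: 
the helper name parse_server_mpm is made up for illustration, and the two 
sample strings are illustrative excerpts, not output captured from the servers 
in this thread.

```python
import re
from typing import Optional


def parse_server_mpm(httpd_v_output: str) -> Optional[str]:
    """Return the MPM name from `httpd -V` output, or None if the line is absent."""
    match = re.search(r"^Server MPM:\s+(\S+)", httpd_v_output, re.MULTILINE)
    return match.group(1).lower() if match else None


# Illustrative excerpts only (assumed shape of the relevant lines).
old_server = "Server version: Apache/2.4.12 (Unix)\nServer MPM:     worker\n"
new_server = "Server version: Apache/2.4.18 (Unix)\nServer MPM:     event\n"

old_mpm = parse_server_mpm(old_server)
new_mpm = parse_server_mpm(new_server)
if old_mpm != new_mpm:
    print(f"MPM differs: {old_mpm} -> {new_mpm}")
```

On a live host the text could come from running the server binary with -V, 
for example via subprocess.run; the binary may be called httpd, apache2 or 
apachectl depending on the distribution.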

Additional information:
  - The customer uses one single source IP address, from which all of their 
employees' requests to our application originate.
  - There seems to be a correlation between this customer's peak traffic times 
and the occurrence of these events. As stated, all traffic comes from that one 
IP address, and there could be 200+ users on our system at such a time.
  - The customer reports fewer occurrences of this issue outside of their peak 
traffic times.
  - We've tuned ListenBacklog to 99999 with no noticeable impact on this 
issue, although we believe it could have played a part in a separate issue not 
within this scope.
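
To put numbers on the peak-traffic correlation, one option is to bucket the 
access log by minute for that single client IP. The sketch below assumes the 
log uses the Common or Combined Log Format with the client IP as the first 
field; the function name requests_per_minute and the sample lines (using 
documentation addresses) are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the start of a Common/Combined Log Format line:
# client IP, two more fields, then [day/Mon/year:HH:MM:SS zone].
LOG_LINE = re.compile(r"^(\S+) \S+ \S+ \[([^\]]+)\]")


def requests_per_minute(lines, client_ip):
    """Count logged requests from client_ip in each one-minute bucket."""
    buckets = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group(1) != client_ip:
            continue
        stamp = datetime.strptime(match.group(2), "%d/%b/%Y:%H:%M:%S %z")
        buckets[stamp.replace(second=0)] += 1
    return buckets


# Made-up sample lines; 203.0.113.10 and 198.51.100.7 are documentation addresses.
sample = [
    '203.0.113.10 - - [09/Oct/2019:14:05:01 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.10 - - [09/Oct/2019:14:05:40 +0000] "GET /a HTTP/1.1" 200 128',
    '198.51.100.7 - - [09/Oct/2019:14:05:41 +0000] "GET /b HTTP/1.1" 200 64',
    '203.0.113.10 - - [09/Oct/2019:14:06:02 +0000] "GET /c HTTP/1.1" 200 256',
]
counts = requests_per_minute(sample, "203.0.113.10")
for minute, count in sorted(counts.items()):
    print(minute.isoformat(), count)
```

Note that connections which fail during the handshake never reach the access 
log, so this only shows the load from that address around the time of a 
reported outage, not the failures themselves.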

Any help would be greatly appreciated, as we are out of ideas and this customer 
has not been very friendly in helping us help them with this issue. We've had 
to revert to running Apache 2.4.12, which we would like to upgrade from.

Thank you,
Franck

This email may contain confidential or protected material for the sole use of 
the intended recipient(s). Any review, use, distribution or disclosure by 
others is strictly prohibited. If you are not the intended recipient (or 
authorized to receive for the recipient), please contact the sender by reply 
email and delete all copies of this message.
