Hey,

I noticed, that I am experiencing a strange issue with https requests (both on 
http/1.1 and http/2):

It seems like around 1 of 500 / 1 of 1000 requests gets delayed by around 60 to 
90  Seconds between the Client and HAProxy.
All other requests work fine and are blazingly fast.

What the client application logs:
17:59:20.488006 Starting REST request
17:59:40.492774 REST request: PUT 
https://mydomain.com:443/super_fancy_url/_doc/1313409683%3A58505246%3A2023-05-30T15%3A59%3A20Z?refresh=false
 returned 0 and took 19999.01ms(name_lookup_time: 0.09ms, connect_time: 0.09ms)

This error happens, because the client does not receive a http response within 
the 20 seconds configured timeout.

When looking through the HAProxy logs, I find a log line for this request, but 
for whatever reason, the time logged there does not match the request time:

May 30 18:00:40 haproxy[458192]: 10.152.40.11:42054 [30/May/2023:18:00:40.964] 
HTTP_MYDOMAIN~ HTTP_MYDOMAIN/super-server.mydomain.com 0/0/0/9/9 201 494 - - 
---- 32852/116/0/0/0 0/0 "PUT 
/super_fancy_url/_doc/1313409683%3A58505246%3A2023-05-30T15%3A59%3A20Z?refresh=false
  HTTP/1.1"

I double-checked the system clocks on the client and the HAProxy node and can 
ensure, that they are in sync.

As the traffic is SSL encrypted, I don’t think, I can do a useful tcp-dump here.

Interesting is, that the application claims to be able to connect on the 
TCP-Level to HAProxy in less than a millisecond. (That might be, as we have 
sub-millisecond latency in our network), but it seems to really be the 
HTTP-Request that is delayed. But HAProxy reports, that the connection to / 
response from the backend server took only 9ms.
To me, that does not explain an exceeded timeout of >20 seconds.

When looking at examples (same source and destination), where the request was 
not delayed / timing out, then the times logged by HAproxy were correct too.

The hopefully important bits from my config:

global
  # Send logs to syslog
  log 10.12.244.22:1514 local0 notice
  log 10.12.244.22:1514 local1 info

  maxconn 100000
  ulimit-n 655360
  nbthread 64

  external-check
  spread-checks 10
  insecure-fork-wanted

  master-worker

  # Hard stop old workers after some time to prevent 
https://github.com/haproxy/haproxy/issues/1701
  hard-stop-after 576h # 24 days

  stats socket /var/run/haproxy/admin.sock mode 666 expose-fd listeners level 
admin
  stats socket /var/run/haproxy/haproxy.sock expose-fd listeners mode 666
  stats socket /var/run/haproxy.sock expose-fd listeners mode 666 level admin
  stats timeout 30s
  user haproxy
  group haproxy
  daemon

  # Generated by 
https://ssl-config.mozilla.org/#server=haproxy&server-version=1.8.8&config=intermediate
  ssl-default-bind-ciphers 
ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
  ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets

  ssl-default-server-ciphers 
ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
  ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets

  tune.ssl.default-dh-param 2048
  tune.bufsize 524288

defaults
  log global
  option  dontlognull
  option  redispatch
  option  log-health-checks
  timeout connect 1s
  timeout client  600s
  timeout server  600s
  timeout client-fin 5s
  timeout server-fin 5s
  timeout check   250
  errorfile 400 /etc/haproxy/errors/400.http
  errorfile 403 /etc/haproxy/errors/403.http
  errorfile 408 /etc/haproxy/errors/408.http
  errorfile 500 /etc/haproxy/errors/500.http
  errorfile 502 /etc/haproxy/errors/502.http
  errorfile 503 /etc/haproxy/errors/503.http
  errorfile 504 /etc/haproxy/errors/504.http
  default-server maxconn 100000
  option allbackups

listen HTTP_MYDOMAIN
  mode http

  bind 10.200.4.198:80
  bind 10.200.4.198:443 ssl crt /etc/ssl/haproxy/
  option httpchk GET /_cluster/health?local=true
  option  httplog
  option forwardfor header X-Real-Ip

  http-request replace-value Upgrade (.*) websocket # 
https://bishopfox.com/blog/h2c-smuggling-request
  http-request del-header X-Forwarded-For
  http-request set-header X-Forwarded-Port 443 if { ssl_fc }
  http-request set-header X-Forwarded-Port 80 if !{ ssl_fc }
  http-request del-header X-Forwarded-Proto
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }


  timeout check 1500ms
  default-server inter 2s

  http-response set-header LB-FQDN "haproxy-1.mydomain.com"
  http-response set-header LB-Backend-Server %s
  # https://serverfault.com/questions/650588/haproxy-timing-connection-diagram
  http-response set-header LB-Times "connect: %Th ms; queue: %Tw ms; 
be-connect: %Tc ms; be-response: %Tr ms"

  http-request redirect scheme https code 301 unless { ssl_fc }


  server super-server.mydomain.com 10.16.32.134:9200 check
  server super-server-2.mydomain.com 10.16.32.135:9200 check
  server super-server-3.mydomain.com 10.16.32.136:9200 check




and my haproxy version:

HAProxy version 2.6.13-1ppa1~focal 2023/05/06 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2027.
Known bugs: http://www.haproxy.org/bugs/bugs-2.6.13.html
Running on: Linux 5.4.0-94-generic #106-Ubuntu SMP Thu Jan 6 23:58:14 UTC 2022 
x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -O2 
-fdebug-prefix-map=/build/haproxy-miXGSs/haproxy-2.6.13=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -Wall -Wextra -Wundef -Wdeclaration-after-statement 
-Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 
-Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member 
-Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered 
-Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int 
-Wno-atomic-alignment
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 
USE_SYSTEMD=1 USE_PROMEX=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H 
-DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE +LIBCRYPT 
+LINUX_SPLICE +LINUX_TPROXY +LUA -MEMORY_PROFILING +NETFILTER +NS 
-OBSOLETE_LINKER +OPENSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL 
-PROCCTL +PROMEX -QUIC +RT +SLZ -STATIC_PCRE -STATIC_PCRE2 +SYSTEMD +TFO 
+THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=64).
Built with OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
Running on OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), 
raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT 
IP_FREEBIND
Built with PCRE2 version : 10.34 2019-11-21
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 9.4.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
       [CACHE] cache
       [COMP] compression
       [FCGI] fcgi-app
       [SPOE] spoe
       [TRACE] trace


I don’t see any errors being logged into systemd. Also no failed health checks 
(where I would understand broken connections).

What I already tried / where I initially come from:
Initially, the listener looked like this: “ssl alpn h2,http/1.1 crt 
/etc/ssl/haproxy/” and I was running HAProxy version “2.6.8-2ppa1~focal”.
There I first noticed this behavior. In order to debug, I then disabled http/2 
and upgraded to a newer HAProxy version. But this sadly did not have any impact.

Ah an in case you are wondering: I am running HAProxy on 4 Bare-Metal servers 
(that all show this behavior). They all run Ubuntu 20 and openssl version 
1.1.1f.

I ran “show sess” in an endless loop and it looks like, I only see sessions 
when a timeout happens. (Might be, because the connections are closed after 
every request):

0x7f8b38499130: proto=tcpv4 src=10.136.119.140:56940 fe=HTTP_MYDOMAIN 
be=HTTP_MYDOMAIN srv=super-server.mydomain.com ts=00 epoch=0x177 age=4s calls=1 
rate=0 cpu=0 lat=0 rq[f=49c40080h,i=0,an=8000h,rx=,wx=,ax=] 
rp[f=80400000h,i=0,an=1400000h,rx=9m59s,wx=,ax=] scf=[8,80h,fd=3824] 
scb=[8,11h,fd=4361] exp=9m59s rc=0 c_exp=

But I never see any session being listed for more than 1-2 seconds, which 
confuses me, as I would expect the

Any ideas? Do you need any dumps from my side for more/better insights?

Cheers
Marno

Reply via email to