Hello,
I'm a happy user of HAProxy and so far have been able to resolve any
issues I've had by reading the docs and following the answers given on
this mailing list or online tutorials. But now I've run into a problem I
cannot resolve myself and hope someone could help me figure out what
might be wrong.
The setup I have has been running fine for many months now. HAProxy
terminates TLS for HTTPS requests and forwards these requests to a
couple of backends. The backend servers are HTTP/1.1 only. As long as
the frontend is limited to HTTP/1.1, the special backend I use for file
uploads is operating exactly as intended. After enabling HTTP/2 on the
frontend, however, the file upload backends are not working as before.
The requests will not be processed properly and run into timeouts.
These file upload backends in my HAProxy configuration are somewhat
special. They try to "serialize" certain HTTP file upload requests made
by browser clients via AJAX calls (i.e. drag-and-drop of several files
at once into the browser window). These files need to be processed one
after the other per client (not in parallel). So the HAProxy backend in
question uses a number of servers with a "maxconn 1" setting each, which
will process the first request immediately but queue subsequent HTTP
requests coming in at the same time until the previous request is
finished. This approach certainly is not perfect in design, but has been
working for me when using a somewhat high arbitrary number of
pseudo-servers to realize it, so that each client making these file
upload requests will be served by one "server" exclusively. This is what
the backend definition looks like:
backend upload_ananas
option forwardfor if-none header X-Client-IP # except 127.0.0.0/8
stick-table type string len 32 size 10k expire 2m
stick match req.cook(mycookie),url_dec
stick store-request req.cook(mycookie),url_dec
timeout server 10m
timeout queue 20s
balance hdr(Cookie)
default-server no-check maxconn 1 maxqueue 20 send-proxy-v2 track
xy/ananas source "ipv6@[${BACKENDUSEIPV6}]"
server-template a 32 "${ANANAS}":9876
Once I switched the HTTP frontend to use HTTP/2 (using "alpn
h2,http/1.1"), this special backend is not working as expected anymore.
All is fine as long as there is only one request present for a certain
server at any given time. However, when there are two or more requests
at the same time, i.e. as soon as the queueing mechanism is supposed to
kick in, the setup is not working the way it does with a HTTP/1.1
frontend. The parallel requests aren't properly put into the queue (or
taken out of the queue) in this case. From what I can see in the log
file, the requests seem to be blocking one another, and nothing is
happening until the timeout set by "timeout queue" is reached. At that
point, 1 or 2 out of 4 requests in an example call may succeed, but the
others will fail.
Clearly, this kind of setup is quite the opposite of what most people
will be using. In my case, I'm deliberately trying to stuff requests
into a queue, whereas normally, one would try to move requests to a
server that has got slots open for processing. So I think that my use
case is hitting different code paths than most other setups.
I've read in previous emails on the mailing list that the "maxconn"
setting nowadays does not limit the number of TCP sessions to the
backend server, but the number of parallel HTTP requests. This made we
wonder if the trouble I'm seeing might have to do with the way
multiplexed HTTP/2 requests are mapped to HTTP/1.1 backends. Could it be
that when the backend server finishes processing the first request,
this isn't generating a proper event in HAProxy's backend logic, so that
the next request is not being processed when it could? Or maybe there is
something special about the number of "0" free slots in the server
definition in this case, once the first slot has been taken?
Trying to work around the problem, I've switched on and off quite a few
settings that may influence the way processing takes place, but still
haven't been able to come up with a working configuration in the HTTP/2
case. Some of the settings I tried were:
* default-server max-reuse 0
* http-reuse never
* option http-server-close
* option httpclose
* option http-buffer-request
* retry-on conn-failure 408 503
* http-request wait-for-body time 15s at-least 16k if METH_POST
With HTTP/2 active, there will be log entries like this (I re-ordered
them to be in order of begin of processing).
Mar 18 17:30:31 localhost haproxy[478250]: fd90:1234::21a:52738
[18/Mar/2022:17:29:51.411] Loadbalancer~ zyx_ananas/a24
12/0/1/40112/40125 200 1217 mycookie=3BRoK6tyijmqndBJRzLyT9Lq7dsiPmeT -
---- 356/356/1/0/0 0/0 {serialize.example.com} "POST
https://serialize.example.com/services/ajax.php/file/upload HTTP/2.0"
Mar 18 17:30:11 localhost haproxy[478250]: fd90:1234::21a:52738
[18/Mar/2022:17:29:51.436] Loadbalancer~ zyx_ananas/a24
26/20015/-1/-1/20041 503 9228 mycookie=3BRoK6tyijmqndBJRzLyT9Lq7dsiPmeT
- sQ-- 326/326/1/1/0 1/0 {serialize.example.com} "POST
https://serialize.example.com/services/ajax.php/file/upload HTTP/2.0"
Mar 18 17:30:31 localhost haproxy[478250]: fd90:1234::21a:52738
[18/Mar/2022:17:30:11.476] Loadbalancer~ zyx_ananas/a24
0/20002/-1/-1/20002 503 9228 mycookie=3BRoK6tyijmqndBJRzLyT9Lq7dsiPmeT -
sQ-- 355/355/1/1/0 1/0 {serialize.example.com} "POST
https://serialize.example.com/services/ajax.php/file/upload HTTP/2.0"
Mar 18 17:30:26 localhost haproxy[478250]: fd90:1234::21a:52738
[18/Mar/2022:17:30:11.477] Loadbalancer~ zyx_ananas/<NOSRV>
-1/-1/-1/-1/15001 408 0 - - cD-- 356/356/2/0/3 0/0
{serialize.example.com} "POST
https://serialize.example.com/services/ajax.php/file/upload HTTP/2.0"
Mar 18 17:30:31 localhost haproxy[478250]: fd90:1234::21a:52738
[18/Mar/2022:17:30:31.478] Loadbalancer~ zyx_ananas/a24 0/58/1/44/103
200 1218 mycookie=3BRoK6tyijmqndBJRzLyT9Lq7dsiPmeT - ---- 356/356/0/0/0
1/0 {serialize.example.com} "POST
https://serialize.example.com/services/ajax.php/file/upload HTTP/2.0"
Note how the log line with code 408 does not contain the captured
session cookie, like the other lines do. That request is most likely
hitting the "wait-for-body" 15 second limit. The 0 byte answer on that
line is due to a configuration line reading "errorfile 408 /dev/null"
(as suggested by
<https://www.haproxy.com/blog/haproxy-and-http-errors-408-in-chrome/>).
Also note how most of the request durations is aligned to certain round
numbers (15 seconds, 20 seconds, 40 seconds) that represent
configuration timeout settings.
On a successful upload, that is when the server's frontend is limited to
HTTP/1.1, the log entries look like this, showing how queueing takes
place correctly:
Mar 18 20:41:12 localhost haproxy[488918]: fd90:1234::21a:56502
[18/Mar/2022:20:41:09.253] Loadbalancer~ zyx_ananas/a24 0/0/1/2868/2869
200 1214 mycookie=39Lq7zLRmBoK6tyTjyidsnmedBiPqTJR - ---- 81/81/3/0/0
0/0 {deppmail.uni-koeln.de} "POST /services/ajax.php/file/upload HTTP/1.1"
Mar 18 20:41:14 localhost haproxy[488918]: fd90:1234::21a:56503
[18/Mar/2022:20:41:09.291] Loadbalancer~ zyx_ananas/a24
0/2831/0/2119/4950 200 1215 mycookie=39Lq7zLRmBoK6tyTjyidsnmedBiPqTJR -
---- 85/85/2/0/0 1/0 {deppmail.uni-koeln.de} "POST
/services/ajax.php/file/upload HTTP/1.1"
Mar 18 20:41:16 localhost haproxy[488918]: fd90:1234::21a:56498
[18/Mar/2022:20:41:09.805] Loadbalancer~ zyx_ananas/a24
0/4435/1/2475/6911 200 1216 mycookie=39Lq7zLRmBoK6tyTjyidsnmedBiPqTJR -
---- 79/79/1/1/0 2/0 {deppmail.uni-koeln.de} "POST
/services/ajax.php/file/upload HTTP/1.1"
Mar 18 20:41:19 localhost haproxy[488918]: fd90:1234::21a:56501
[18/Mar/2022:20:41:09.807] Loadbalancer~ zyx_ananas/a24
0/6910/0/2581/9491 200 1216 mycookie=39Lq7zLRmBoK6tyTjyidsnmedBiPqTJR -
---- 76/76/0/0/0 3/0 {deppmail.uni-koeln.de} "POST
/services/ajax.php/file/upload HTTP/1.1"
Note how the client's port numbers are different on each request in the
HTTP/1.1 example, showing that a TLS handshake took place for each
upload request, resulting in a few milliseconds delay. This is in
contrast to the HTTP/2 example, where only one HTTP client connection is
used (as should be).
So, does anyone have an idea what could be wrong in the HTTP/2 case and
how I could fix it?
I'm running the latest 2.4 version of HAProxy, 2.4.15, but would be
willing to try a newer version if that helps to rule out certain bugs
fixed recently. Of course I can also provide further configuration
information if needed. I didn't want to include all of it right here, as
this email has grown rather lengthy already, as I tried to explain all
the subtleties of the HAProxy setup in use.
Thanks,
Jens