Hello

We still have an issue with connections to guacd, even with the patch we suggested:
https://github.com/apache/guacamole-server/commit/9134780e347e4a75a0759a7970aaddfb0e8fa7de

I have a stuck guacd daemon, and gdb gives me this:

Thread 4 is the parent and waits for thread 5:

(gdb) thread 4
[Switching to thread 4 (Thread 0x7f2db2fd5700 (LWP 40612))]
#0  0x00007f2e5c016495 in __pthread_timedjoin_ex () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f2e5c016495 in __pthread_timedjoin_ex () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000055bcffd16fad in guacd_connection_io_thread (data=0x7f2e2c00b6e0) at connection.c:150
#2  0x00007f2e5c014fa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f2e5beb2eff in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) frame 1
#1  0x000055bcffd16fad in guacd_connection_io_thread (data=0x7f2e2c00b6e0) at connection.c:150
print *(guacd_connection_io_thread_params* )data
$5 = {parser = 0x7f2e2c002930, socket = 0x7f2e2c00b000, fd = 87}


Thread 5 seems to be stuck in a read() call.

thread 5
[Switching to thread 5 (Thread 0x7f2db27d4700 (LWP 40613))]
#0  0x00007f2e5c01e544 in read () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f2e5c01e544 in read () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f2e5c03ecab in ?? () from /lib/x86_64-linux-gnu/libguac.so.23
#2  0x000055bcffd16ec8 in guacd_connection_write_thread (data=0x7f2e2c00b6e0) at connection.c:123
#3  0x00007f2e5c014fa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4  0x00007f2e5beb2eff in clone () from /lib/x86_64-linux-gnu/libc.so.6
print *(guacd_connection_io_thread_params* )data
$6 = {parser = 0x7f2e2c002930, socket = 0x7f2e2c00b000, fd = 87}


Looking at the code at https://github.com/apache/guacamole-server/blob/3782339031bdc47e3c67c5630e42f1f2fd9493a0/src/guacd/connection.c#L143, I wonder how the code behaves if the read() fails immediately (i.e. we never push anything in with guac_socket_write()).

Maybe something is missing to unblock the writing thread. Even better would be to use select() and merge the read and write operations into a single thread.


Regards

On 2024/02/09 09:23:27 michael böhm wrote:
> Hi everyone,
>
>
>
> I proceeded as Antoine proposed and set "ARG ALPINE_BASE_IMAGE=3.18" in
> staging/1.5.5 Dockerfile.
>
>
>
> The build worked and I was able to start the guacd container from this image.
> I tried more than 100 consecutive reconnects to an RDP session without the
> issue appearing.
>
>
>
> So, it looks good to me. Can anyone confirm?
>
>
>
> Infos on my docker-host:
>
>
>
> NAME="Ubuntu"
> VERSION_ID="22.04"
> VERSION="22.04.3 LTS (Jammy Jellyfish)"
> VERSION_CODENAME=jammy
> ID=ubuntu
> ID_LIKE=debian
> HOME_URL="https://www.ubuntu.com/"
> SUPPORT_URL="https://help.ubuntu.com/"
> BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
> PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
>
>
>
> Kernel 5.15.0-92-generic
>
>
>
> Docker version 25.0.3, build 4debf41
>
>
>
> Thanks to everyone working on this.
>
>
>
> Best wishes
>
>
>
> Michael
>
>
>
> **Sent:** Friday, 9 February 2024 at 09:00
> **From:** "Antoine Besnier" <be...@yahoo.fr.INVALID>
> **To:** "user@guacamole.apache.org" <us...@guacamole.apache.org>
> **Subject:** Re: Aw: Re: Major bug message log in guacd 1.5.4
>
>
>
> Hi,
>
>
>
> On Alpine, openssl1.1-compat-dev is available for 3.17, 3.18 and Edge, but not
> 3.19 (which is the version for the 'latest' tag). You could try changing
> the version of Alpine.
>
>
>
> Cheers
>
> Antoine
>
>
>
>
>
> On Friday, 9 February 2024 at 07:35:42 UTC+1, michael bohm
> <ks...@gmx.net.invalid> wrote:
>
>
>
>
>
> Hi everyone
>
>
>
> I'd gladly test in our environment. However, the docker build does not work
> for me:
>
>
>
> /tmp/guacamole-server ‹staging/1.5.5› » git checkout staging/1.5.5
> 1 ↵
> Switched to branch 'staging/1.5.5'
> Your branch is up to date with 'origin/staging/1.5.5'.
> /tmp/guacamole-server ‹staging/1.5.5› » docker build -t guac_test .
> [+] Building 0.9s (6/13)                                      docker:default
> => [internal] load build definition from Dockerfile                   0.0s
> => => transferring dockerfile: 6.10kB                                 0.0s
> => [internal] load metadata for docker.io/library/alpine:latest       0.0s
> => [internal] load .dockerignore                                      0.0s
> => => transferring context: 681B                                      0.0s
> => CACHED [builder 1/5] FROM docker.io/library/alpine:latest          0.0s
> => [internal] load build context                                      0.0s
> => => transferring context: 28.84kB                                   0.0s
> => ERROR [builder 2/5] RUN apk add --no-cache autoconf automake build-base cairo-dev cmake git  0.8s
> ------
> > [builder 2/5] RUN apk add --no-cache autoconf automake build-base
> cairo-dev cmake git grep libjpeg-turbo-dev libpng-dev libtool libwebp-dev
> make openssl1.1-compat-dev pango-dev pulseaudio-dev util-linux-dev:
> 0.285 fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/main/x86_64/APKINDEX.tar.gz
> 0.475 fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/community/x86_64/APKINDEX.tar.gz
> 0.718 ERROR: unable to select packages:
> 0.718 openssl1.1-compat-dev (no such package):
> 0.718 required by: world[openssl1.1-compat-dev]
> ------
> Dockerfile:29
> --------------------
> 28 | # Install build dependencies
> 29 | >>> RUN apk add --no-cache \
> 30 | >>> autoconf \
> 31 | >>> automake \
> 32 | >>> build-base \
> 33 | >>> cairo-dev \
> 34 | >>> cmake \
> 35 | >>> git \
> 36 | >>> grep \
> 37 | >>> libjpeg-turbo-dev \
> 38 | >>> libpng-dev \
> 39 | >>> libtool \
> 40 | >>> libwebp-dev \
> 41 | >>> make \
> 42 | >>> openssl1.1-compat-dev \
> 43 | >>> pango-dev \
> 44 | >>> pulseaudio-dev \
> 45 | >>> util-linux-dev
> 46 |
> --------------------
> ERROR: failed to solve: process "/bin/sh -c apk add --no-cache autoconf
> automake build-base cairo-dev cmake git grep libjpeg-turbo-dev libpng-dev
> libtool libwebp-dev make openssl1.1-compat-dev pango-dev pulseaudio-dev
> util-linux-dev" did not complete successfully: exit code: 1
>
>
>
> It seems that "openssl1.1-compat-dev" is not present in base image
> alpine:latest's repositories. Am I doing something wrong? On the master branch
> I can build the image just fine.
>
>
>
> Thanks and best wishes
>
>
>
> Michael
>
>
>
> **Sent:** Friday, 9 February 2024 at 02:59
> **From:** "Michael Jumper" <mj...@apache.org>
> **To:** user@guacamole.apache.org
> **Subject:** Re: Major bug message log in guacd 1.5.4
>
> On 2/7/24 05:24, Nick Couchman wrote:
> > On Wed, Feb 7, 2024 at 6:40 AM Yannick Martin
> > <yannick.mar...@ovhcloud.com <ma...@ovhcloud.com>> wrote:
> >
> > Hello
> >
> > > About pthread_keys leak, I wonder if
> > > <https://github.com/apache/guacamole-server/blob/master/src/libguac/client.c#L299>
> > > and L300 is not a duplicate call of those done in:
> > > guac_rwlock_init(&(client->__users_lock));
> > > guac_rwlock_init(&(client->__pending_users_lock));
> > >
> > > which call pthread_key_create too =>
> > > <https://github.com/apache/guacamole-server/blob/master/src/libguac/rwlock.c#L52>
> >
> >
> > Two issues with this:
> > * I'm not sure that duplicating a call to pthread_key_create()
> > would/should result in the behavior we're seeing - where TLS-based
> > connections fail after a certain, relatively well-defined number (58-60).
> > * This also would not explain why this only occurs in certain
> > situations, on certain platforms - that is, the same exact libguac code
> > running on EL7 (RHEL, CentOS, etc.) does not result in the resource
> > leak, whereas it does on some other set of platforms (Debian, Alpine,
> > EulerOS). Unless the pthread library has been changed substantially
> > between those versions to not clean up after itself?
> >
>
> This *might* now be fixed on master and staging/1.5.5, if folks want to
> take a look. The issue does appear to have been the duplicate
> pthread_key_create() calls noted above (pending confirmation with testing).
>
> If this is the cause, it's still unclear why the behavior varies between
> platforms. It might be a matter of varying implementations, different
> resource limits, or something else platform-specific.
>
> - Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@guacamole.apache.org
> For additional commands, e-mail: user-h...@guacamole.apache.org
>
>
>
>

--
Regards,

Yannick Martin
Digital Core - Business Critical Infrastructure - IaaS
