On Mon, Aug 11, 2025 at 1:00 PM Dilip Modi <[email protected]> wrote:
>
> Hello Guacamole Dev Team,
>
> I am writing to report a persistent crash issue we are experiencing
> with guacd under load. We have been working to debug this for a while and
> have applied several fixes that have improved stability, but we are still
> seeing one final, intermittent crash.
>
> *Summary of the Issue*
>
> guacd crashes with a SIGABRT signal, originating
> from __pthread_kill_implementation(), when handling a high volume of
> concurrent RDP sessions (around 300). The crash occurs in a generic FreeRDP
> worker thread, which strongly suggests heap corruption caused by a race
> condition or memory bug elsewhere in the application.
>
> We are using 16 Core, 128 GB system.
>
> *Environment*
>
>    - *Guacamole Server Version:* 1.6.0
>    - *FreeRDP Version:* 2.11.0
>    - *Operating System:* RHEL 9 on x86_64
>    - *Build:* Custom build using GCC 12.
>
> *Latest Crash Backtrace*
>
> Here is the backtrace from the most recent crash. The crash location has
> moved from the RDP disconnect logic to a generic worker thread after our
> previous fixes.
>
> Program terminated with signal SIGABRT, Aborted.
> #0  0x00007f67e988bedc in __pthread_kill_implementation () from /usr/lib64/libc.so.6
> [Current thread is 1 (Thread 0x7f646c598640 (LWP 1496945))]
>
> === bt ===
>
> #0  0x00007f67e988bedc in __pthread_kill_implementation () from /usr/lib64/libc.so.6
> #1  0x00007f67e983eb46 in raise () from /usr/lib64/libc.so.6
> #2  0x00007f67e9828833 in abort () from /usr/lib64/libc.so.6
> #3  0x00007f67e9829172 in __libc_message.cold () from /usr/lib64/libc.so.6
> #4  0x00007f67e9895f87 in malloc_printerr () from /usr/lib64/libc.so.6
> #5  0x00007f67e9897c70 in _int_free () from /usr/lib64/libc.so.6
> #6  0x00007f67e989a2c5 in free () from /usr/lib64/libc.so.6
> #7  0x00007f67e0465507 in BufferPool_Clear () from /opt/zscaler/lib64/libwinpr2.so.2
> #8  0x00007f67e04656f6 in BufferPool_Free () from /opt/zscaler/lib64/libwinpr2.so.2
> #9  0x00007f67e06bf71f in rfx_context_free () from /opt/zscaler/lib64/libfreerdp2.so.2
> #10 0x00007f67e0640003 in codecs_free () from /opt/zscaler/lib64/libfreerdp2.so.2
> #11 0x00007f67e0648c3d in rdp_client_disconnect () from /opt/zscaler/lib64/libfreerdp2.so.2
> #12 0x00007f67e0639207 in freerdp_disconnect () from /opt/zscaler/lib64/libfreerdp2.so.2
> #13 0x00007f67e07be54e in guac_rdp_handle_connection (client=0x7f67d4005870) at rdp.c:676
> #14 guac_rdp_client_thread (data=0x7f67d4005870) at rdp.c:944
> #15 0x00007f67e988a19a in start_thread () from /usr/lib64/libc.so.6
> #16 0x00007f67e990f210 in clone3 () from /usr/lib64/libc.so.6
>
> *Analysis and Troubleshooting Steps Taken*
>
> Our investigation points towards a memory corruption issue, likely a race
> condition exposed by the high rate of connection setup and teardown. The
> logs around the time of the crash show many "Handshake failed, 'connect'
> instruction was not received" errors, indicating this high churn.
>
> We have progressively identified and fixed several bugs:
>
>    1. *Incorrect Cleanup Order:* Initially, we found
>    that freerdp_disconnect() was called before gdi_free(), which we corrected.
>
> *Request for Help*
>
> We would greatly appreciate it if the community could review our analysis
> and the suspected root cause.
>
>    - Does the analysis of the race condition in print-job.c seem correct?
>    - Are there any other known issues or areas of the code we should
>    investigate that could cause this type of heap corruption under heavy load?
>
> We are happy to provide more detailed logs, code snippets, or run further
> tests as needed.
>
> Thank you for your time and assistance.
>

It'd be great if you could submit a Jira ticket for this (you'll need to
request a Jira account, first, which you can do at the main Jira page), and
then create a pull request to fix this.

https://issues.apache.org/jira/browse/GUACAMOLE
https://guacamole.apache.org/open-source/

-Nick
