GitHub user metegenez created a discussion: JVM crashes loading Go-based ADBC
drivers under sustained load
## Context
I've been working on an ADBC catalog connector for StarRocks (using
`JniDriverFactory` on the JVM side; not yet upstream). This report is from
running it under sustained TPC-H load against an Arrow Flight/MySQL target.
## TL;DR
Loading a Go-based ADBC driver (FlightSQL in my case) into a long-lived JVM via
`JniDriverFactory` and running sustained workloads crashes the JVM with one of
three signatures from what looks like the same bug class. I found a deployment
configuration that holds at >800 sequential ADBC ops, but I haven't touched the
JNI shim itself. Posting this as a field report and to ask whether anyone has
hit it / has guidance.
## Setup
- Host: StarRocks FE (long-lived JVM, server workload, hundreds of threads)
- Driver: `libadbc_driver_flightsql.so` loaded via
`org.apache.arrow.adbc.driver.jni.JniDriverFactory`
- Workload: TPC-H SF1 over Arrow Flight, 88 queries per benchmark run
- Behavior: crashes within tens of ADBC operations on a stock build; stable
past 800 ops with the workarounds below
## Three crash signatures, looks like one bug class
1. `fatal error: found pointer to free object` in `runtime.bgsweep` — GC sweep
finds zombie heap entries
2. `SIGSEGV at addr=0x118` in `runtime.(*unwinder).next` — stack walker derefs
null when scanning a goroutine mid-cgo-callback
3. `runtime.memmove` in `bytes.(*Buffer).Write` from gRPC
`hpack.(*Encoder).WriteField` — gRPC `loopyWriter` writes into a freed Go
buffer while encoding HTTP/2 headers for a new stream
All three look like the Go runtime catching freed-but-still-referenced state.
Variation seems timing-driven: whichever goroutine happens to be running when
the corruption surfaces.
## What works (deployment hygiene only)
I have not modified the JNI shim. I pinned a build of the FlightSQL `.so` and
added two env vars at JVM launch:
```bash
LD_PRELOAD=$JAVA_HOME/lib/libjsig.so # JDK-shipped signal chaining
GODEBUG=asyncpreemptoff=1 # disable Go SIGURG-based preemption
```
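Since a mistyped or missing `GODEBUG` value silently re-enables async preemption, it may be worth failing fast at JVM startup. A minimal sketch of such a check (the check itself is my addition, not part of the shim; only the variable name and value come from the snippet above):

```java
import java.util.Map;

public final class GoRuntimeEnvCheck {
    /** Returns true iff the given environment's GODEBUG disables async preemption. */
    public static boolean asyncPreemptOff(Map<String, String> env) {
        // GODEBUG is a comma-separated key=value list; look for asyncpreemptoff=1.
        String godebug = env.getOrDefault("GODEBUG", "");
        for (String kv : godebug.split(",")) {
            if (kv.trim().equals("asyncpreemptoff=1")) {
                return true;
            }
        }
        return false;
    }
}
```

Calling `asyncPreemptOff(System.getenv())` early in FE startup (and logging or aborting when it returns false) would catch a misconfigured launch before any Go driver is loaded.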
I also added a caller-side connection pool so the JVM doesn't churn
`JniConnection` wrappers. With this combination I can run 10 sustained TPC-H
benchmark cycles (≈880 ADBC ops) cleanly; without it, the JVM crashes within
tens of ops.
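For reference, the caller-side pool is nothing exotic. A minimal sketch of the shape I use, generic over the pooled type since the real `JniConnection` wrapper isn't upstream (capacity and creation policy here are illustrative, not what StarRocks would ship):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Bounded pool that reuses connections instead of churning JNI wrappers. */
public final class ConnectionPool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;

    public ConnectionPool(int capacity, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
    }

    /** Borrow an idle connection, creating a fresh one only if the pool is empty. */
    public T acquire() {
        T conn = idle.poll();
        return (conn != null) ? conn : factory.get();
    }

    /** Return a connection to the pool; false means the pool is full and the
     *  caller should close the connection itself. */
    public boolean release(T conn) {
        return idle.offer(conn);
    }
}
```

The point of bounding the pool is that excess connections get closed promptly by the caller rather than piling up as long-lived native handles.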
## Why I think upstream hasn't seen this
`JniDriverTest.java` covers happy-path open/close cycles only. No
sustained-load run, no `System.gc()` pressure between iterations. A bug at this
layer wouldn't surface there.
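The kind of loop that does surface the bug is trivial to write. A hypothetical harness (the per-op body is a stand-in; a real test would open a JNI database/connection, run a statement, and close it inside `oneOp`):

```java
/** Sustained-load harness: repeats an open/use/close cycle with periodic GC pressure. */
public final class SustainedLoadHarness {
    /** Runs {@code ops} cycles, forcing a GC every {@code gcEvery} ops;
     *  returns the number of completed cycles. */
    public static int run(int ops, int gcEvery, Runnable oneOp) {
        int completed = 0;
        for (int i = 0; i < ops; i++) {
            oneOp.run();         // open -> query -> close in a real test
            completed++;
            if (gcEvery > 0 && i % gcEvery == 0) {
                System.gc();     // pressure Cleaners/finalizers between iterations
            }
        }
        return completed;
    }
}
```

Running a few hundred iterations of this against a Go-based driver is what separates the happy-path coverage above from the crash signatures in this report.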
## Initial Performance on StarRocks Integration
We need to improve ADBC connection handling and increase the scale factor to
put more load on the data-read path, since most of the queries touch
relatively little data.
```
═══════════════════════════════════════════════════════════════════════════════════════════
MySQL JDBC vs ADBC Benchmark
Scale: sf1 | Queries: 22 | Runs: 10 (+1 warmup) | Timeout: 60s
═══════════════════════════════════════════════════════════════════════════════════════════
+-------+-----------------+-----------------+-------------+-----------------+-----------------+-------------+
| Query | JDBC total (ms) | ADBC total (ms) | Total ratio | JDBC scan (ms)  | ADBC scan (ms)  | Scan ratio  |
+-------+-----------------+-----------------+-------------+-----------------+-----------------+-------------+
| Q01   |         11445.0 |         14509.0 |        0.79 |         11458.2 |         14522.3 |        0.79 |
| Q02   |           473.2 |           796.1 |        0.59 |            89.9 |           159.2 |        0.58 |
| Q03   |          4576.9 |          4577.8 |        1.00 |          1857.2 |          1675.0 |        1.47 |
| Q04   |          7998.1 |          2770.9 |        2.89 |          3992.0 |          1377.8 |        2.13 |
| Q05   |          4010.3 |          7571.8 |        0.53 |           722.0 |          1314.0 |        0.76 |
| Q06   |          1151.9 |          1141.3 |        1.01 |          1139.8 |          1126.7 |        1.01 |
| Q07   |          2867.8 |          2776.1 |        1.03 |           568.5 |           544.4 |        0.86 |
| Q08   |          4543.1 |          7932.7 |        0.57 |           872.3 |          1410.9 |        0.75 |
| Q09   |          6200.2 |         11194.1 |        0.55 |          3854.1 |           461.3 |       23.81 |
| Q10   |          1448.4 |          2224.0 |        0.65 |           485.7 |           712.8 |        0.73 |
| Q11   |           539.0 |           867.1 |        0.62 |           154.1 |           268.1 |        0.56 |
| Q12   |          1530.3 |          1545.0 |        0.99 |          1051.1 |          1036.3 |        1.03 |
| Q13   |           848.4 |           789.9 |        1.07 |           423.6 |           392.6 |        1.08 |
| Q14   |          1120.5 |          1134.2 |        0.99 |           577.8 |           584.2 |        0.98 |
| Q15   |          1016.4 |          1116.3 |        0.91 |           663.3 |           728.8 |        0.89 |
| Q16   |           309.2 |           242.6 |        1.27 |           111.0 |            90.4 |        1.04 |
| Q17   |          3845.5 |          7834.3 |        0.49 |          1939.5 |          6397.4 |        0.25 |
| Q18   |          2744.4 |          5168.1 |        0.53 |          1964.1 |          3091.5 |        0.76 |
| Q19   |          1623.0 |          1677.1 |        0.97 |            63.1 |          1633.0 |        0.04 |
| Q20   |          1401.0 |          1366.4 |        1.03 |           356.9 |           344.4 |        0.85 |
| Q21   |          8226.9 |          2873.0 |        2.86 |          3152.3 |          1335.1 |        1.59 |
| Q22   |           391.2 |           379.3 |        1.03 |            80.8 |           165.7 |        0.48 |
+-------+-----------------+-----------------+-------------+-----------------+-----------------+-------------+
| AVG   |          3105.0 |          3658.5 |        1.02 |          1617.1 |          1789.6 |        1.93 |
| GEOM  |          1935.5 |          2160.0 |        0.90 |           698.4 |           838.3 |        0.86 |
+-------+-----------------+-----------------+-------------+-----------------+-----------------+-------------+
```
```
═══════════════════════════════════════════════════════════════════════════════════════════
StarRocks to StarRocks JDBC (MySQL) vs ADBC (FlightSQL) Benchmark
Scale: sf1 | Queries: 22 | Runs: 10 (+1 warmup) | Timeout: 120s
═══════════════════════════════════════════════════════════════════════════════════════════
+-------+-----------------+-----------------+-------------+-----------------+-----------------+-------------+
| Query | JDBC total (ms) | ADBC total (ms) | Total ratio | JDBC scan (ms)  | ADBC scan (ms)  | Scan ratio  |
+-------+-----------------+-----------------+-------------+-----------------+-----------------+-------------+
| Q01   |         11338.1 |           821.7 |       13.80 |         11305.0 |           810.6 |       13.95 |
| Q02   |           827.2 |           413.2 |        2.00 |           102.4 |           108.2 |        0.71 |
| Q03   |          4700.3 |           429.0 |       10.96 |          1849.1 |           189.1 |        7.01 |
| Q04   |          7311.6 |           354.7 |       20.61 |          3606.7 |           155.6 |       15.69 |
| Q05   |          4112.2 |           823.4 |        4.99 |           695.5 |           167.4 |        1.79 |
| Q06   |           261.5 |           127.6 |        2.05 |           197.4 |            85.5 |        2.31 |
| Q07   |          3045.2 |           482.5 |        6.31 |           568.3 |           122.8 |        2.71 |
| Q08   |          4728.1 |           772.3 |        6.12 |           851.5 |           163.9 |        2.47 |
| Q09   |          6121.0 |          1093.9 |        5.60 |          1224.6 |           132.6 |        9.23 |
| Q10   |          1255.9 |           367.4 |        3.42 |           340.5 |           146.6 |        1.75 |
| Q11   |           705.4 |           327.8 |        2.15 |           149.6 |            88.6 |        1.43 |
| Q12   |           650.2 |           196.9 |        3.30 |           319.3 |           102.2 |        2.83 |
| Q13   |           853.3 |           279.2 |        3.06 |           388.8 |           152.6 |        2.05 |
| Q14   |           245.4 |           155.1 |        1.58 |           100.0 |            79.2 |        1.32 |
| Q15   |           455.0 |           188.6 |        2.41 |           234.6 |           109.8 |        1.95 |
| Q16   |           418.5 |           221.3 |        1.89 |            98.1 |           103.9 |        0.83 |
| Q17   |          3884.5 |           649.4 |        5.98 |          1891.5 |           524.8 |        3.21 |
| Q18   |          2833.8 |           635.3 |        4.46 |          1942.6 |           348.2 |        5.04 |
| Q19   |           265.3 |           176.8 |        1.50 |            15.5 |            86.2 |        0.18 |
| Q20   |          1471.8 |           380.2 |        3.87 |           326.2 |           108.4 |        2.51 |
| Q21   |          8133.5 |           658.9 |       12.34 |          3015.9 |           283.3 |        7.12 |
| Q22   |           472.8 |           186.8 |        2.53 |            83.6 |            94.6 |        0.88 |
+-------+-----------------+-----------------+-------------+-----------------+-----------------+-------------+
| AVG   |          2913.2 |           442.8 |        5.50 |          1332.1 |           189.3 |        3.95 |
| GEOM  |          1530.2 |           370.4 |        4.13 |           462.5 |           150.7 |        2.46 |
+-------+-----------------+-----------------+-------------+-----------------+-----------------+-------------+
```
> I am not sure why reads are slower with the ADBC MySQL driver; if there is a
> benchmark on the arrow-adbc side, I can happily check against it.
GitHub link: https://github.com/apache/arrow-adbc/discussions/4294