Bug#1104002: mariadb: FTBFS on sparc64: new LLM embedding / vector functions crash on sparc64
On 10/12/2025 08:05, Otto Kekäläinen wrote:
> I ran a detailed stack trace on stadler.debian.net and posted it in upstream Jira at https://jira.mariadb.org/browse/MDEV-36670 in hopes upstream could gain valuable insights from reading it. I also posted there my notes on the exact steps I ran on stadler to build MariaDB and run mariadb-test-run in case someone wants to repeat it. I can also post the same thing here if you think it is valuable for Debian bug tracking.

I'm clueless about the initial stack trace with the SIGUSR1s, but the SIGILL one you posted further below seems very odd and somewhat unrelated, though still serious.

_sparcv9_random is from OpenSSL, and while I can't find the exact version, I think it matches this piece of code:

https://github.com/openssl/openssl/blob/0e9725bcb90770d967351b977407b174bbd91869/crypto/sparccpuid.S#L347-L353

I'm not sure whether there's an issue with the hardcoded opcode, or perhaps an incorrectly identified CPU type, but the .size line seems off. Shouldn't it read ".size _sparcv9_random,.-_sparcv9_random" instead?
Bug#1104002: mariadb: FTBFS on sparc64: new LLM embedding / vector functions crash on sparc64
I ran a detailed stack trace on stadler.debian.net and posted it in upstream Jira at https://jira.mariadb.org/browse/MDEV-36670 in hopes upstream could gain valuable insights from reading it. I also posted there my notes on the exact steps I ran on stadler to build MariaDB and run mariadb-test-run in case someone wants to repeat it. I can also post the same thing here if you think it is valuable for Debian bug tracking.
Bug#1104002: mariadb: FTBFS on sparc64: new LLM embedding / vector functions crash on sparc64
X-Debbugs-CC: [email protected]
User: [email protected]
Usertags: sparc64

Looking at the most recent MariaDB builds at https://buildd.debian.org/status/package.php?p=mariadb, it seems x32 is failing on these same vector functions. In the sparc64 build these tests are skipped, but the post-build MTR run fails because other tests crash the server:

    ...
    main.delete_use_source_engines w13 [ pass ] 32203
    main.partition_innodb w3 [ pass ] 77352
    main.update_use_source w21 [ pass ] 52894
    main.update_innodb w30 [ pass ] 35943
    main.parser_bug21114_innodb w16 [ pass ] 133529
    main.long_unique_innodb w17 [ retry-fail ]
            Test ended at 2025-11-08 21:44:11

    CURRENT_TEST: main.long_unique_innodb
    mysqltest: At line 165: query 'update ignore t1 set f = 'x'' failed with wrong errno (2013): 'Lost connection to server during query', instead of ER_NOT_SUPPORTED_YET (1235)...
    ...
    Completed: Failed 6/1135 tests, 99.47% were successful.

    Failing test(s): main.long_unique_innodb main.alter_table_lock main.unsafe_binlog_innodb ...

This bug report, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104002, describes the steps I used to test sparc64 builds and MTR on stadler.debian.net myself, in case somebody has the bandwidth to reproduce them and try to make MariaDB run well on sparc64.
Bug#1104002: mariadb: FTBFS on sparc64: new LLM embedding / vector functions crash on sparc64
Maybe the USR1 signal comes from the test runner killing the process after it has been stuck for a long time. I observed that the first tests ran quickly, but then the run ground to a halt. However, it could also simply be the result of hitting the bug that traps and stops (or crashes) the server without returning control. I didn't install all the debug symbols, as I don't have the skills to read the full code path / stack trace and write a patch for sparc64 compatibility myself. I am not debugging this further right now; I need to debug upgrade issues that affect all platforms.
Bug#1104002: mariadb: FTBFS on sparc64: new LLM embedding / vector functions crash on sparc64
    Core was generated by `/build/reproducible-path/mariadb-11.8.1/builddir/sql/mariadbd --defaults-group-suffix=.1 --defaults-file=/build/reproducible-path/mariadb-11.8.1/builddir/mysql-test/var/3/my.cnf --log-output=file --core-file --loose-debug-sync-timeout=300'.
    Program terminated with signal SIGUSR1, User defined signal 1.
    #0  0xfff8000102b951b0 in ?? () from /lib/sparc64-linux-gnu/libc.so.6
    [Current thread is 1 (Thread 0xfff800010d9428e0 (LWP 2910561))]
    #0  0xfff8000102b951b0 in ?? () from /lib/sparc64-linux-gnu/libc.so.6
    #1  0x01aef804 in handle_fatal_signal (sig=10) at ./sql/signal_handler.cc:298
    #2
    #3  0x7265636f76657261 in ?? ()

I think you're missing libc6-dbg (and possibly some other debugging symbols). I'm also surprised by the SIGUSR1. Is this really a crash?
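A small aside on frame #3 above: the address 0x7265636f76657261 consists entirely of printable ASCII bytes. The sketch below uses a hypothetical helper, `address_as_ascii` (not part of gdb or any Debian tool), to decode it. It spells "recovera", which hints, though it does not prove, that the saved return address was overwritten with string data. Treat this as a hedged reading of the backtrace, not a diagnosis.

```python
def address_as_ascii(addr: int) -> str:
    """Render a 64-bit value as 8 bytes, most significant first, decoded as ASCII.

    Most-significant-first is the order in which gdb prints the hex digits, and
    also the in-memory byte order on big-endian sparc64.
    Bytes above 0x7f are replaced with U+FFFD.
    """
    raw = addr.to_bytes(8, "big")
    return raw.decode("ascii", errors="replace")


# Frame #3 from the backtrace above.
print(address_as_ascii(0x7265636f76657261))  # recovera
```

By contrast, the libc address in frame #0 (0xfff8000102b951b0) contains bytes above 0x7f, so it does not decode to text.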
Bug#1104002: mariadb: FTBFS on sparc64: new LLM embedding / vector functions crash on sparc64
Hi!

Yes, I tried the build on the porterbox and it is crashing. Repeated runs seem to crash on the same tests, but not exactly every time:

    Completed: Failed 14/1131 tests, 98.76% were successful.
    Failing test(s): main.bind_address_resolution main.bind_multiple_addresses_resolution main.log_slow_always_query_time main.connect main.set_statement main.unsafe_binlog_innodb

    Completed: Failed 10/1130 tests, 99.12% were successful.
    Failing test(s): main.bind_multiple_addresses_resolution main.bind_address_resolution main.partition_innodb_semi_consistent main.subselect_no_mat main.sp-i_s_columns

Notes to self on how to build on stadler.debian.net:

    ssh stadler.debian.net
    source porterbox.sh
    pend          # remove any old schroots
    psetup        # create a new schroot
    pinstall git-buildpackage gdb debian-goodies libipc-system-simple-perl  # install prerequisites
    pchroot       # enter schroot
    gbp clone --verbose vcs-git:mariadb  # gbp not on host, must run from inside schroot
    exit
    pdeps mariadb # install build dependencies
    pchroot       # enter again to start actual build
    cd mariadb
    gbp buildpackage --git-verbose --git-no-pristine-tar -us -uc

    # rerun test after build when artifacts are still around using
    # commands copied from buildlog
    cd builddir/mysql-test
    ./mtr --force --testcase-timeout=120 --suite-timeout=540 --retry=3 \
      --verbose-restart --max-save-core=1 --max-save-datadir=1 \
      --parallel=48 --skip-rpl --suite=main \
      --skip-test-list=/tmp/tmp.MTnxYcCghC
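The two porterbox summaries above can be checked mechanically. Here is a minimal sketch that assumes the summary-line format exactly as quoted in this report. `parse_summary` and `MtrSummary` are hypothetical helpers, not part of mariadb-test-run. The sketch extracts the failure count, the total, the reported success rate and the listed test names, then recomputes the rate from the counts. In the first run, 14 failures are reported but only 6 names are listed, so the count and the list should not be assumed to match.

```python
import re
from typing import NamedTuple

# Assumed format, copied from the summaries quoted in this report:
# "Completed: Failed N/M tests, P% were successful. Failing test(s): a b c"
SUMMARY_RE = re.compile(
    r"Completed: Failed (\d+)/(\d+) tests, ([\d.]+)% were successful\."
    r"(?:\s*Failing test\(s\):\s*(.*))?"
)


class MtrSummary(NamedTuple):
    failed: int
    total: int
    reported_pct: float    # success rate as printed in the summary
    recomputed_pct: float  # success rate recomputed from failed/total
    tests: list[str]       # names listed after "Failing test(s):"


def parse_summary(line: str) -> MtrSummary:
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("no MTR summary found in line")
    failed, total = int(m.group(1)), int(m.group(2))
    recomputed = round(100 * (total - failed) / total, 2)
    tests = m.group(4).split() if m.group(4) else []
    return MtrSummary(failed, total, float(m.group(3)), recomputed, tests)


first_run = (
    "Completed: Failed 14/1131 tests, 98.76% were successful. "
    "Failing test(s): main.bind_address_resolution "
    "main.bind_multiple_addresses_resolution main.log_slow_always_query_time "
    "main.connect main.set_statement main.unsafe_binlog_innodb"
)
s = parse_summary(first_run)
print(s.failed, s.total, s.reported_pct, s.recomputed_pct, len(s.tests))
# 14 1131 98.76 98.76 6
```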
Bug#1104002: mariadb: FTBFS on sparc64: new LLM embedding / vector functions crash on sparc64
Hi Otto,

On Thu, 2025-04-24 at 07:29 -0700, Otto Kekäläinen wrote:
> Thanks for the tips, I relayed suggestion about GCC Compile Farm to upstream.
>
> Also I noticed that the sparc64 crashes are pretty random, happening
> in different tests. Maybe the runner is unstable?
>
> For example in
> https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=sparc64&ver=1%3A11.8.1-4&stamp=1745481491&raw=0
> the failing tests were:
>
> main.status main.vector2 main.query_cache
> main.set_statement_notembedded main.ssl_timeout main.log_state
> main.userstat main.backup_lock_binlog main.lock_multi main.xa
>
> In
> https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=sparc64&ver=1%3A11.8.1-4&stamp=1745449363&raw=0
> the failing tests were:
>
> main.vector2 main.concurrent_innodb_safelog main.sp-innodb main.check

Did you try reproducing the issue on stadler.debian.net? If it's an issue with the host machine, you should be able to run the tests successfully on the porterbox.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Bug#1104002: mariadb: FTBFS on sparc64: new LLM embedding / vector functions crash on sparc64
Thanks for the tips, I relayed the suggestion about the GCC Compile Farm to upstream.

Also, I noticed that the sparc64 crashes are pretty random, happening in different tests. Maybe the runner is unstable?

For example, in https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=sparc64&ver=1%3A11.8.1-4&stamp=1745481491&raw=0 the failing tests were:

    main.status main.vector2 main.query_cache main.set_statement_notembedded main.ssl_timeout main.log_state main.userstat main.backup_lock_binlog main.lock_multi main.xa

In https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=sparc64&ver=1%3A11.8.1-4&stamp=1745449363&raw=0 the failing tests were:

    main.vector2 main.concurrent_innodb_safelog main.sp-innodb main.check
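Intersecting the failing-test lists of the two buildd attempts quoted above gives a more concrete sense of how "random" the failures are. The helper names in this minimal sketch are hypothetical. It shows that main.vector2 is the only test that failed in both attempts.

```python
# Failing tests from the two sparc64 buildd attempts of 1:11.8.1-4 quoted above.
RUN_A = {
    "main.status", "main.vector2", "main.query_cache",
    "main.set_statement_notembedded", "main.ssl_timeout", "main.log_state",
    "main.userstat", "main.backup_lock_binlog", "main.lock_multi", "main.xa",
}
RUN_B = {"main.vector2", "main.concurrent_innodb_safelog", "main.sp-innodb", "main.check"}


def failed_in_every_run(*runs: set[str]) -> set[str]:
    """Tests present in every run's failure list (set intersection)."""
    if not runs:
        return set()
    return set.intersection(*runs)


def failed_in_some_runs_only(*runs: set[str]) -> set[str]:
    """Tests that failed in at least one run but not in all of them."""
    if not runs:
        return set()
    return set.union(*runs) - set.intersection(*runs)


print(sorted(failed_in_every_run(RUN_A, RUN_B)))  # ['main.vector2']
print(len(failed_in_some_runs_only(RUN_A, RUN_B)))  # 12
```

With only two runs, this is weak evidence either way. Feeding in more runs, such as the porterbox runs, would make the split between consistent and intermittent failures more meaningful.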
Bug#1104002: mariadb: FTBFS on sparc64: new LLM embedding / vector functions crash on sparc64
Hi Otto,

On Wed, 2025-04-23 at 13:32 -0700, Otto Kekäläinen wrote:
> I reported this to upstream, who is willing to look into it, but have
> no sparc64 access.
>
> Thus I was hoping some of the Debian sparc64 porters could step in and
> test a bit, and help upstream, as they seem responsive to fixing
> sparc64.

I don't have the time to go through this whole bug report now, but FYI there is a very fast sparc64 machine (a SPARC M8 clocked at 5 GHz) running Solaris 11.4 in the GCC Compile Farm, for which any open source developer can request an account; see:

- https://gcc.gnu.org/wiki/CompileFarm
- https://portal.cfarm.net/machines/list/

We also have a sparc64 machine there running Debian unstable, but it's currently offline due to CPU problems. However, we have now finally purchased replacement CPUs and we are confident that we will get the machine back online soon.

If upstream is really willing to work on sparc64 and they want to get the bug fixed as soon as possible, there would also be the possibility to create a DSA guest account to gain access to stadler.debian.net, which is a SPARC T4 running Debian unstable. See:

https://dsa.debian.org/doc/guest-account/

Since you're a DD, you can request a DSA guest account for them. They will need to specify which architectures they need access to. Access to other architectures is available as well, also through the GCC Compile Farm.

I highly recommend that the MariaDB people request a GCC Compile Farm account if they haven't done so yet. I assume they would be interested in testing their software on the various architectures offered there as well. And maybe they can also just use the Solaris 11.4 SPARC M8 to fix the sparc64 crashes.

Adrian
Bug#1104002: mariadb: FTBFS on sparc64: new LLM embedding / vector functions crash on sparc64
Package: mariadb
Version: 1:11.8.1-1
Tags: ftbfs
X-Debbugs-CC: [email protected]
User: [email protected]
Usertags: sparc64
Forwarded: https://jira.mariadb.org/browse/MDEV-36670

**REQUESTING HELP FROM SPARC64 PORTERS IN DEBIAN**

In a build of MariaDB 11.8.1 on the official Debian builders for sparc64, multiple vector-related tests crashed the server:

* main.mariadb-import
* main.vector
* main.vector_aria
* main.vector_funcs
* main.vector_innodb
* main.vector2_notembedded

These are all related to the new AI embeddings functionality in MariaDB (https://mariadb.com/kb/en/vectors/), which is new code that has never been tested on sparc64 before.

I reported this to upstream, who is willing to look into it but has no sparc64 access. Thus I was hoping some of the Debian sparc64 porters could step in, test a bit, and help upstream, as they seem responsive to fixing sparc64.

Full log at https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=sparc64&ver=1%3A11.8.1-3&stamp=1745371074&raw=0

Example error:

    main.mariadb-import w13 [ fail ]
            Test ended at 2025-04-23 01:10:56

    CURRENT_TEST: main.mariadb-import
    mysqltest: At line 112: query 'insert vec(v) values (x'e360d63ebe554f3fcdbc523f4522193f5236083d'), (x'f511303f72224a3fdd05fe3eb22a133ffae86a3f'), (x'f09baa3ea172763f123def3e0c7fe53e288bf33e'), (x'b97a523f2a193e3eb4f62e3f2d23583e9dd60d3f'), (x'f7c5df3e984b2b3e65e59d3d7376db3eac63773e'), (x'de01453ffa486d3f10aa4d3fdd66813c71cb163f'), (x'76edfc3e4b57243f10f8423fb158713f020bda3e'), (x'56926c3fdf098d3e2c8c5e3d1ad4953daa9d0b3e'), (x'7b713f3e5258323f80d1113d673b2b3f66e3583f'), (x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e')' failed: (2013): Lost connection to server during query

    main.vector2_notembedded w2 [ fail ]
            Test ended at 2025-04-23 01:13:42

    CURRENT_TEST: main.vector2_notembedded
    mysqltest: At line 36: query 'insert into t1 select 0x as v from seq_1_to_1000' failed: (2013): Lost connection to server during query

    main.vector_innodb w2 [ fail ]
            Test ended at 2025-04-23 01:14:47
    CURRENT_TEST: main.vector_innodb
    mysqltest: At line 7: query 'insert t1 (v) values (x'106d263fdf68ba3eb08d533f97d46e3fd1e1ec3edc4c123f984c563f621a233f'), (x'd55bee3c56eb9e3e84e3093f838dce3eb7cd653fe32d7d3f12de133c5715d23e'), (x'fcd5553f3822443f5dae413f2593493f363f5f7f113ebf12373d4d145a3f'), (x'7493093fd9a27d3e9b13783f8c66653f0bd7d23e50db983d251b013f1dba133f'), (x'2e30373fae331a3eba94153ee32bce3e3311b33d5bc75d3f6c25653eb769113f'), (x'381d5f3f2781de3e4f011f3f9353483f9bb37e3edd622d3eabecb63ec246953e'), (x'4ee5dc3e214b103f0e7e583f5f36473e79d7823ea872ec3e3ab2913d1b84433f'), (x'8826243f7d20f03e5135593f83ba653e44572d3fa87e8e3e943e0e3f649a293f'), (x'3859ac3e7d21823ed3f5753fc79c143e61d39c3cee39ba3eb0b0133e815c173f'), (x'cff0d93c32941e3f64b22a3f1e4f083f4ea2563fbff4a63e12a4703f6c824b3f')' failed: (2013): Lost connection to server during query

    main.vector_aria w3 [ fail ]
            Test ended at 2025-04-23 01:16:00

    CURRENT_TEST: main.vector_aria
    mysqltest: At line 6: query 'alter table t add vector(f)' failed: (2013): Lost connection to server during query

    main.vector_funcs w3 [ fail ]
            Test ended at 2025-04-23 01:17:15

    CURRENT_TEST: main.vector_funcs
    mysqltest: At line 3: query 'insert t1 (v) values (x'e360d63ebe554f3fcdbc523f4522193f5236083d'), (x'f511303f72224a3fdd05fe3eb22a133ffae86a3f'), (x'f09baa3ea172763f123def3e0c7fe53e288bf33e'), (x'b97a523f2a193e3eb4f62e3f2d23583e9dd60d3f'), (x'f7c5df3e984b2b3e65e59d3d7376db3eac63773e'), (x'de01453ffa486d3f10aa4d3fdd66813c71cb163f'), (x'76edfc3e4b57243f10f8423fb158713f020bda3e'), (x'56926c3fdf098d3e2c8c5e3d1ad4953daa9d0b3e'), (x'7b713f3e5258323f80d1113d673b2b3f66e3583f'), (x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e')' failed: (2013): Lost connection to server during query
    The result from queries just before the failure was:
    create table t1 (id int auto_increment primary key, v vector(5) not null, vector index (v));
    insert t1 (v) values (x'e360d63ebe554f3fcdbc523f4522193f5236083d'), (x'f511303f72224a3fdd05fe3eb22a133ffae86a3f'),
    (x'f09baa3ea172763f123def3e0c7fe53e288bf33e'), (x'b97a523f2a193e3eb4f62e3f2d23583e9dd60d3f'), (x'f7c5df3e984b2b3e65e59d3d7376db3eac63773e'), (x'de01453ffa486d3f10aa4d3fdd66813c71cb163f'), (x'76edfc3e4b57243f10f8423fb158713f020bda3e'), (x'56926c3fdf098d3e2c8c5e3d1ad4953daa9d0b3e'), (x'7b713f3e5258323f80d1113d673b2b3f66e3583f'), (x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');
     - found 'core.2910524' (1/1)
    Core generated by '/build/reproducible-path/mariadb-11.8.1/builddir/sql/mariadbd'
    Output from gdb follows. The first stack trace is from the failing thread.
    The following stack traces are from all threads (so the failing one is duplicated).
    --
    [New LWP 2910561]
    [New LWP 2910550]
    [New LWP 2910552]
    [New LWP 2910555]
    [New LWP 2910524]
    [New LWP 2910553]
    This GDB supports auto-do
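The hex literals in these failing inserts appear to be packed float32 vectors. The `vector(5)` column in main.vector_funcs takes 40-hex-digit (20-byte) values, which works out to 4 bytes per dimension. Decoded as little-endian IEEE 754 floats, they give values between 0 and 1; decoded big-endian, they give implausible numbers. This byte-order reading is an inference from the data, not something stated in the report, and it is not a diagnosis of the crash. Since sparc64 is big-endian, byte order is simply one thing worth keeping in mind while debugging. `decode_vector` is a hypothetical helper built on Python's `struct` module.

```python
import struct


def decode_vector(hex_literal: str, byteorder: str = "<") -> list[float]:
    """Decode a hex literal as packed IEEE 754 float32 values.

    byteorder uses struct's prefixes: "<" little-endian, ">" big-endian.
    Assumes 4 bytes per dimension, consistent with vector(5) taking 20-byte values.
    """
    raw = bytes.fromhex(hex_literal)
    if len(raw) % 4 != 0:
        raise ValueError("length is not a multiple of 4 bytes")
    return list(struct.unpack(f"{byteorder}{len(raw) // 4}f", raw))


# First row of the failing insert in main.vector_funcs (a vector(5) column).
literal = "e360d63ebe554f3fcdbc523f4522193f5236083d"
as_le = decode_vector(literal, "<")
as_be = decode_vector(literal, ">")
print(len(as_le))  # 5
print([round(x, 4) for x in as_le])  # all between 0 and 1
print(as_be[0])  # read big-endian, the same bytes give a huge negative number
```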

