Bug#1104002: mariadb: FTBFS on sparc64: new LLM embedding / vector functions crash on sparc64

2025-12-10 Thread Gregor Riepl

On 10/12/2025 08:05, Otto Kekäläinen wrote:

> I ran a detailed stack trace on stadler.debian.net and posted it in
> upstream Jira at https://jira.mariadb.org/browse/MDEV-36670 in hopes
> upstream could gain valuable insights from reading it. I also posted
> there my notes on the exact steps I ran on stadler to build MariaDB
> and run mariadb-test-run in case someone wants to repeat it. I can
> also post the same thing here if you think it is valuable for Debian
> bug tracking.


I'm clueless about the initial stack trace with the SIGUSR1s, but the SIGILL
one you posted further below seems very odd and somewhat unrelated, though
still serious.

_sparcv9_random is from OpenSSL, and while I can't find the exact version, I 
think it matches this piece of code:
https://github.com/openssl/openssl/blob/0e9725bcb90770d967351b977407b174bbd91869/crypto/sparccpuid.S#L347-L353

I'm not sure whether there's an issue with the hardcoded opcode or perhaps an
incorrectly identified CPU type, but the .size line seems off. Shouldn't it
read ".size _sparcv9_random,.-_sparcv9_random" instead?




2025-12-09 Thread Otto Kekäläinen
I ran a detailed stack trace on stadler.debian.net and posted it in
upstream Jira at https://jira.mariadb.org/browse/MDEV-36670 in hopes
upstream could gain valuable insights from reading it. I also posted
there my notes on the exact steps I ran on stadler to build MariaDB
and run mariadb-test-run in case someone wants to repeat it. I can
also post the same thing here if you think it is valuable for Debian
bug tracking.




2025-11-26 Thread Otto Kekäläinen
X-Debbugs-CC: [email protected]
User: [email protected]
Usertags: sparc64

Looking at the most recent MariaDB builds at
https://buildd.debian.org/status/package.php?p=mariadb, it seems x32 is
failing on these same vector functions.

In the sparc64 build they are skipped, but the post-build MTR run fails
because other tests crash:
...
main.delete_use_source_engines   w13 [ pass ]  32203
main.partition_innodb            w3  [ pass ]  77352
main.update_use_source           w21 [ pass ]  52894
main.update_innodb               w30 [ pass ]  35943
main.parser_bug21114_innodb      w16 [ pass ]  133529
main.long_unique_innodb          w17 [ retry-fail ]
Test ended at 2025-11-08 21:44:11

CURRENT_TEST: main.long_unique_innodb
mysqltest: At line 165: query 'update ignore t1 set f = 'x'' failed
with wrong errno  (2013): 'Lost connection to server during
query', instead of ER_NOT_SUPPORTED_YET (1235)...
...
Completed: Failed 6/1135 tests, 99.47% were successful.
Failing test(s): main.long_unique_innodb main.alter_table_lock
main.unsafe_binlog_innodb
...

This bug report https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104002
has steps to reproduce how I tested sparc64 builds and MTR on
stadler.debian.net myself in case somebody has bandwidth to try to
make MariaDB run well on sparc64.




2025-04-26 Thread Otto Kekäläinen
Maybe the USR1 signal is from the test runner that kills the process after
it has been stuck for a long time.

I observed that the first tests ran quickly, but then the run ground to a
halt. That could simply be because it hit the bug that stops or crashes the
server without returning control.

I didn't install all the debug symbols, as I don't have the skills to read the
full code path / stack trace and write a patch for sparc64 compatibility myself.

I am not debugging this further right now, as I need to debug upgrade issues
that affect all platforms.



2025-04-26 Thread Gregor Riepl

Core was generated by
`/build/reproducible-path/mariadb-11.8.1/builddir/sql/mariadbd
--defaults-group-suffix=.1
--defaults-file=/build/reproducible-path/mariadb-11.8.1/builddir/mysql-test/var/3/my.cnf
--log-output=file --core-file --loose-debug-sync-timeout=300'.
Program terminated with signal SIGUSR1, User defined signal 1.
#0  0xfff8000102b951b0 in ?? () from /lib/sparc64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0xfff800010d9428e0 (LWP 2910561))]
#0  0xfff8000102b951b0 in ?? () from /lib/sparc64-linux-gnu/libc.so.6
#1  0x01aef804 in handle_fatal_signal (sig=10) at
./sql/signal_handler.cc:298
#2  <signal handler called>
#3  0x7265636f76657261 in ?? ()


I think you're missing libc6-dbg (and possibly some other debugging symbols).

I'm also surprised by the SIGUSR1. Is this really a crash?
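Two details in the trace above can be checked mechanically. The sketch below is an illustration, not a diagnosis. First, it reinterprets the frame #3 "address" 0x7265636f76657261 as big-endian bytes (sparc64 is big-endian); that yields ASCII text, which may hint that string data overwrote a saved return address. Second, it compares signal 10 under the x86 Linux numbering, where it is SIGUSR1, with the Linux/sparc numbering (SunOS layout), where 10 is SIGBUS and SIGUSR1 is 30. If the sig=10 passed to handle_fatal_signal follows the sparc numbering, the "SIGUSR1" label may be misleading; treat that as an assumption worth checking. The helper address_as_text and both lookup tables are hypothetical and cover only a few signals.

```python
# Hypothetical helper: reinterpret a 64-bit value from a gdb backtrace as raw
# bytes to see whether it is really text rather than a code address.
def address_as_text(addr: int) -> str:
    # sparc64 is big-endian: the most significant byte is stored first.
    raw = addr.to_bytes(8, byteorder="big")
    return raw.decode("ascii", errors="replace")


# A few Linux signal numbers, hardcoded because Python's signal module only
# knows the numbering of the host it runs on.
SIGNALS_X86 = {7: "SIGBUS", 10: "SIGUSR1", 11: "SIGSEGV", 12: "SIGUSR2"}
SIGNALS_SPARC = {10: "SIGBUS", 11: "SIGSEGV", 30: "SIGUSR1", 31: "SIGUSR2"}

if __name__ == "__main__":
    print(address_as_text(0x7265636F76657261))  # frame #3 in the trace
    # sig=10 as passed to handle_fatal_signal, under each numbering.
    print(SIGNALS_X86[10], "vs", SIGNALS_SPARC[10])
```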




2025-04-25 Thread Otto Kekäläinen
Hi!

Yes, I tried the build on the porterbox and it crashes there too. Repeated
runs seem to crash on the same tests, but not every time.

Completed: Failed 14/1131 tests, 98.76% were successful.
Failing test(s): main.bind_address_resolution
main.bind_multiple_addresses_resolution
main.log_slow_always_query_time main.connect main.set_statement
main.unsafe_binlog_innodb

Completed: Failed 10/1130 tests, 99.12% were successful.
Failing test(s): main.bind_multiple_addresses_resolution
main.bind_address_resolution main.partition_innodb_semi_consistent
main.subselect_no_mat main.sp-i_s_columns


-
Notes to self on how to build on stadler.debian.net:

ssh stadler.debian.net
source porterbox.sh
pend # remove any old schroots
psetup # create a new schroot
# install prerequisites
pinstall git-buildpackage gdb debian-goodies libipc-system-simple-perl
pchroot # enter schroot
# gbp is not on the host, so this must run from inside the schroot
gbp clone --verbose vcs-git:mariadb
exit
pdeps mariadb # install build dependencies
pchroot # enter again to start actual build
cd mariadb
gbp buildpackage --git-verbose --git-no-pristine-tar -us -uc

# rerun the tests after the build, while the artifacts are still around,
# using commands copied from the build log
cd builddir/mysql-test
./mtr --force --testcase-timeout=120 --suite-timeout=540 --retry=3 \
  --verbose-restart --max-save-core=1 --max-save-datadir=1 \
  --parallel=48 --skip-rpl --suite=main \
  --skip-test-list=/tmp/tmp.MTnxYcCghC




2025-04-24 Thread John Paul Adrian Glaubitz
Hi Otto,

On Thu, 2025-04-24 at 07:29 -0700, Otto Kekäläinen wrote:
> Thanks for the tips; I relayed the suggestion about the GCC Compile Farm to upstream.
> 
> Also I noticed that the sparc64 crashes are pretty random, happening
> in different tests. Maybe the runner is unstable?
> 
> For example in 
> https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=sparc64&ver=1%3A11.8.1-4&stamp=1745481491&raw=0
> the failing tests were:
> 
> main.status main.vector2 main.query_cache
> main.set_statement_notembedded main.ssl_timeout main.log_state
> main.userstat main.backup_lock_binlog main.lock_multi main.xa
> 
> In 
> https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=sparc64&ver=1%3A11.8.1-4&stamp=1745449363&raw=0
> the failing tests were:
> 
> main.vector2 main.concurrent_innodb_safelog main.sp-innodb main.check

Did you try reproducing the issue on stadler.debian.net? If it's an issue
with the host machine, you should be able to run the tests successfully
on the porterbox.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913




2025-04-24 Thread Otto Kekäläinen
Thanks for the tips; I relayed the suggestion about the GCC Compile Farm to upstream.

Also I noticed that the sparc64 crashes are pretty random, happening
in different tests. Maybe the runner is unstable?

For example in 
https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=sparc64&ver=1%3A11.8.1-4&stamp=1745481491&raw=0
the failing tests were:

main.status main.vector2 main.query_cache
main.set_statement_notembedded main.ssl_timeout main.log_state
main.userstat main.backup_lock_binlog main.lock_multi main.xa

In 
https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=sparc64&ver=1%3A11.8.1-4&stamp=1745449363&raw=0
the failing tests were:

main.vector2 main.concurrent_innodb_safelog main.sp-innodb main.check
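One way to see which failures recur across the two runs is to intersect the failing-test lists. A minimal sketch, using only the test names quoted above; the variable names are hypothetical and simply reuse the "stamp" parameter of each buildd log URL.

```python
# Failing tests quoted above from two sparc64 buildd logs of 1:11.8.1-4.
failed_stamp_1745481491 = set(
    """main.status main.vector2 main.query_cache
    main.set_statement_notembedded main.ssl_timeout main.log_state
    main.userstat main.backup_lock_binlog main.lock_multi main.xa""".split()
)
failed_stamp_1745449363 = set(
    "main.vector2 main.concurrent_innodb_safelog main.sp-innodb main.check".split()
)

# Tests failing in both runs are the best candidates for a reproducible bug;
# failures seen in only one run may be intermittent.
recurring = failed_stamp_1745481491 & failed_stamp_1745449363
only_once = failed_stamp_1745481491 ^ failed_stamp_1745449363

print(sorted(recurring))  # ['main.vector2']
print(len(only_once))     # 12
```

With only two runs this is weak evidence either way, but it singles out main.vector2 as the one failure common to both logs.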




2025-04-23 Thread John Paul Adrian Glaubitz
Hi Otto,

On Wed, 2025-04-23 at 13:32 -0700, Otto Kekäläinen wrote:
> I reported this to upstream, who is willing to look into it but has
> no sparc64 access.
> 
> Thus I was hoping some of the Debian sparc64 porters could step in and
> test a bit, and help upstream, as they seem responsive to fixing
> sparc64.

I don't have the time to go through this whole bug report now, but FYI
there is a very fast sparc64 machine (SPARC M8 clocked at 5 GHz) running
Solaris 11.4 in the GCC Compile Farm, which any open source developer can
request an account for; see:

- https://gcc.gnu.org/wiki/CompileFarm
- https://portal.cfarm.net/machines/list/

We also have a sparc64 machine there running Debian unstable, but it's
currently offline due to CPU problems. However, we have finally purchased
replacement CPUs and are confident we will get the machine back online soon.

If upstream is really willing to work on sparc64 and wants to get the
bug fixed as soon as possible, it would also be possible to create a DSA
guest account to gain access to stadler.debian.net, which is a SPARC T4
running Debian unstable.

See: https://dsa.debian.org/doc/guest-account/

Since you're a DD, you can request a DSA guest account for them. They will
need to specify what architectures they need access to. Access to other
architectures is available as well, also through the GCC Compile Farm.

I highly recommend that the MariaDB people request a GCC Compile Farm account
if they haven't done so yet. I assume they would be interested in testing their
software on the various architectures offered there as well. And maybe they
can also just use the Solaris 11.4 SPARC M8 to fix the sparc64 crashes.

Adrian





2025-04-23 Thread Otto Kekäläinen
Package: mariadb
Version: 1:11.8.1-1
Tags: ftbfs
X-Debbugs-CC: [email protected]
User: [email protected]
Usertags: sparc64
Forwarded: https://jira.mariadb.org/browse/MDEV-36670

**REQUESTING HELP FROM SPARC64 PORTERS IN DEBIAN**

In a build of MariaDB 11.8.1 on the official Debian builders for sparc64,
multiple vector-related tests crashed the server:

* main.mariadb-import
* main.vector
* main.vector_aria
* main.vector_funcs
* main.vector_innodb
* main.vector2_notembedded

These are all related to the new AI embeddings functionality in
MariaDB (https://mariadb.com/kb/en/vectors/), and this is new code that
has never been tested on sparc64 before.

I reported this to upstream, who is willing to look into it but has
no sparc64 access.

Thus I was hoping some of the Debian sparc64 porters could step in and
test a bit, and help upstream, as they seem responsive to fixing
sparc64.


Full log at 
https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=sparc64&ver=1%3A11.8.1-3&stamp=1745371074&raw=0

Example error:

main.mariadb-import  w13 [ fail ]
Test ended at 2025-04-23 01:10:56

CURRENT_TEST: main.mariadb-import
mysqltest: At line 112: query 'insert vec(v) values
(x'e360d63ebe554f3fcdbc523f4522193f5236083d'),
(x'f511303f72224a3fdd05fe3eb22a133ffae86a3f'),
(x'f09baa3ea172763f123def3e0c7fe53e288bf33e'),
(x'b97a523f2a193e3eb4f62e3f2d23583e9dd60d3f'),
(x'f7c5df3e984b2b3e65e59d3d7376db3eac63773e'),
(x'de01453ffa486d3f10aa4d3fdd66813c71cb163f'),
(x'76edfc3e4b57243f10f8423fb158713f020bda3e'),
(x'56926c3fdf098d3e2c8c5e3d1ad4953daa9d0b3e'),
(x'7b713f3e5258323f80d1113d673b2b3f66e3583f'),
(x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e')' failed: 
(2013): Lost connection to server during query

main.vector2_notembedded w2 [ fail ]
Test ended at 2025-04-23 01:13:42

CURRENT_TEST: main.vector2_notembedded
mysqltest: At line 36: query 'insert into t1 select 0x as v
from seq_1_to_1000' failed:  (2013): Lost connection to
server during query

main.vector_innodb   w2 [ fail ]
Test ended at 2025-04-23 01:14:47

CURRENT_TEST: main.vector_innodb
mysqltest: At line 7: query 'insert t1 (v) values
(x'106d263fdf68ba3eb08d533f97d46e3fd1e1ec3edc4c123f984c563f621a233f'),
(x'd55bee3c56eb9e3e84e3093f838dce3eb7cd653fe32d7d3f12de133c5715d23e'),
(x'fcd5553f3822443f5dae413f2593493f363f5f7f113ebf12373d4d145a3f'),
(x'7493093fd9a27d3e9b13783f8c66653f0bd7d23e50db983d251b013f1dba133f'),
(x'2e30373fae331a3eba94153ee32bce3e3311b33d5bc75d3f6c25653eb769113f'),
(x'381d5f3f2781de3e4f011f3f9353483f9bb37e3edd622d3eabecb63ec246953e'),
(x'4ee5dc3e214b103f0e7e583f5f36473e79d7823ea872ec3e3ab2913d1b84433f'),
(x'8826243f7d20f03e5135593f83ba653e44572d3fa87e8e3e943e0e3f649a293f'),
(x'3859ac3e7d21823ed3f5753fc79c143e61d39c3cee39ba3eb0b0133e815c173f'),
(x'cff0d93c32941e3f64b22a3f1e4f083f4ea2563fbff4a63e12a4703f6c824b3f')'
failed:  (2013): Lost connection to server during query

main.vector_aria w3 [ fail ]
Test ended at 2025-04-23 01:16:00

CURRENT_TEST: main.vector_aria
mysqltest: At line 6: query 'alter table t add vector(f)' failed:
 (2013): Lost connection to server during query

main.vector_funcs    w3 [ fail ]
Test ended at 2025-04-23 01:17:15

CURRENT_TEST: main.vector_funcs
mysqltest: At line 3: query 'insert t1 (v) values
(x'e360d63ebe554f3fcdbc523f4522193f5236083d'),
(x'f511303f72224a3fdd05fe3eb22a133ffae86a3f'),
(x'f09baa3ea172763f123def3e0c7fe53e288bf33e'),
(x'b97a523f2a193e3eb4f62e3f2d23583e9dd60d3f'),
(x'f7c5df3e984b2b3e65e59d3d7376db3eac63773e'),
(x'de01453ffa486d3f10aa4d3fdd66813c71cb163f'),
(x'76edfc3e4b57243f10f8423fb158713f020bda3e'),
(x'56926c3fdf098d3e2c8c5e3d1ad4953daa9d0b3e'),
(x'7b713f3e5258323f80d1113d673b2b3f66e3583f'),
(x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e')' failed: 
(2013): Lost connection to server during query

The result from queries just before the failure was:
create table t1 (id int auto_increment primary key, v vector(5) not
null, vector index (v));
insert t1 (v) values (x'e360d63ebe554f3fcdbc523f4522193f5236083d'),
(x'f511303f72224a3fdd05fe3eb22a133ffae86a3f'),
(x'f09baa3ea172763f123def3e0c7fe53e288bf33e'),
(x'b97a523f2a193e3eb4f62e3f2d23583e9dd60d3f'),
(x'f7c5df3e984b2b3e65e59d3d7376db3eac63773e'),
(x'de01453ffa486d3f10aa4d3fdd66813c71cb163f'),
(x'76edfc3e4b57243f10f8423fb158713f020bda3e'),
(x'56926c3fdf098d3e2c8c5e3d1ad4953daa9d0b3e'),
(x'7b713f3e5258323f80d1113d673b2b3f66e3583f'),
(x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');
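The hex literals above can be decoded to check that each row really holds five values for the vector(5) column. A minimal sketch, assuming (this bug report does not confirm it) that MariaDB stores a VECTOR(n) value as n packed IEEE 754 float32 values in little-endian byte order; decode_vector is a hypothetical helper. Under that assumption each 20-byte literal decodes to five values in [0, 1), which looks like plausible test data. Reading the first 4 bytes as big-endian instead gives a huge negative number, so the explicit "<" byte order matters on a big-endian host such as sparc64.

```python
import struct


def decode_vector(hex_literal: str) -> list[float]:
    """Decode a hex vector literal into floats.

    Assumption (not confirmed by this bug report): a VECTOR(n) value is n
    packed IEEE 754 float32 values in little-endian byte order.
    """
    raw = bytes.fromhex(hex_literal)
    if len(raw) % 4:
        raise ValueError(f"{len(raw)} bytes is not a whole number of float32 values")
    count = len(raw) // 4
    # "<" forces little-endian decoding regardless of the host byte order.
    return list(struct.unpack(f"<{count}f", raw))


if __name__ == "__main__":
    # First row inserted into the vector(5) column in main.vector_funcs.
    row = "e360d63ebe554f3fcdbc523f4522193f5236083d"
    values = decode_vector(row)
    print(len(values), [round(v, 4) for v in values])
    # The same first 4 bytes read as big-endian.
    print(struct.unpack(">f", bytes.fromhex(row[:8]))[0])
```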

 - found 'core.2910524' (1/1)
Core generated by
'/build/reproducible-path/mariadb-11.8.1/builddir/sql/mariadbd'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is
duplicated).
--
[New LWP 2910561]
[New LWP 2910550]
[New LWP 2910552]
[New LWP 2910555]
[New LWP 2910524]
[New LWP 2910553]

This GDB supports auto-do