Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-01 Thread Sergei Golubchik
Hi, Seppo, Jan!

Note, this is 10.2 patch below.

> commit 4b164f176e6
> Author: Seppo Jaakola
> Date: Wed Sep 15 09:16:44 2021 +0300
>
> MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

I think this should say

MDEV-23328 Server hang due to Galera lock conflict resolution

it'

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-03 Thread Jan Lindström
Hi Sergei, On Fri, Oct 1, 2021 at 9:05 PM Sergei Golubchik wrote: > Hi, Seppo, Jan! > > Note, this is 10.2 patch below. > > > > > MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL) > > I think this should say > > MDEV-23328 Server hang due to Galera lock conflict resolution > > Sur

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-06 Thread Sergei Golubchik
Hi, Jan! On Oct 04, Jan Lindström wrote: > Hi Sergei, > > > +/* This is wrapper for wsrep_break_lock in thr_lock.c */ > > > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void *victim_thd_ptr, > > > my_bool signal) > > > +{ > > > + THD* victim_thd= (THD *) victim_thd_ptr; > > > + /* We need

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-06 Thread Jan Lindström
Hi Sergei, Answers below: > > > > > +/* This is wrapper for wsrep_break_lock in thr_lock.c */ > > > > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void > *victim_thd_ptr, my_bool signal) > > > > +{ > > > > + THD* victim_thd= (THD *) victim_thd_ptr; > > > > + /* We need to lock THD::LOCK_th

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-06 Thread Sergei Golubchik
Hi, Jan! On Oct 06, Jan Lindström wrote: > > > > > > > +/* This is wrapper for wsrep_break_lock in thr_lock.c */ > > > > > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void > > > > > *victim_thd_ptr, my_bool signal) > > > > > +{ > > > > > + THD* victim_thd= (THD *) victim_thd_ptr; > > > > >

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-06 Thread Jan Lindström
Hi Sergei, Answers to your questions below: On Wed, Oct 6, 2021 at 5:03 PM Sergei Golubchik wrote: > Hi, Jan! > > On Oct 06, Jan Lindström wrote: > > > > > > > > > +/* This is wrapper for wsrep_break_lock in thr_lock.c */ > > > > > > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void > *vic

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-08 Thread Sergei Golubchik
Hi, Jan! On Oct 06, Jan Lindström wrote: > > > > > > > > > > I must say the thr_lock code is not familiar to me but there > > > > > are mysql_mutex_lock() calls to lock->mutex. After code review > > > > > it is not clear to me what that mutex is. > > > > > > this is for table locks. `lock` is `dat

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-10 Thread Jan Lindström
Hi Sergei, > > > if (victim_trx) { > > const trx_id_t victim_trx_id= victim_trx->id; > > const longlong victim_thread= thd_get_thread_id(victim_thd); > > /* This is necessary as correct mutexing order is > > lock_sys -> trx -> THD::LOCK_thd_data and below > > function assume

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-10 Thread Jan Lindström
Hi Sergei, Update on what happens after TOI failure. > What I mean is, what if KILL would ignore WSREP_TO_ISOLATION_BEGIN > failure and just proceed killing? Perhaps if > WSREP_TO_ISOLATION_BEGIN fails it means that there can be no bf aborts > anyway? Could you try to find it out? > After

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-11 Thread Jan Lindström
Update on disconnect > > > // As trx is now referenced it can't go away > > Hmm. What happens if the thd that owns this transaction is killed or the > user disconnects? THD gets freed. What happens to the referenced trx? > I created new mtr-tests (galera_disconnect_debug) to try disconnecti

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-11 Thread Jan Lindström
Hi Sergei, After the QA runs done by Ramesh, we now know that the latest fix candidate, i.e. what is in bb-10.2-MDEV-25114-galera-v2, is incorrect. The problem is in wsrep_close_connections(): it holds LOCK_thread_count while it calls abort_replicated, which calls wsrep_abort_transaction, and there we use find

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-11 Thread Sergei Golubchik
Hi, Jan! On Oct 10, Jan Lindström wrote: > Hi Sergei, > > > > > if (victim_trx) { > > > const trx_id_t victim_trx_id= victim_trx->id; > > > const longlong victim_thread= thd_get_thread_id(victim_thd); > > > /* This is necessary as correct mutexing order is > > > lock_sys -> trx -

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-11 Thread Sergei Golubchik
Hi, Jan! Great, thanks! On Oct 11, Jan Lindström wrote: > Update on disconnect > > > > > > // As trx is now referenced it can't go away > > > > Hmm. What happens if the thd that owns this transaction is killed or the > > user disconnects? THD gets freed. What happens to the referenced trx?

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-11 Thread Jan Lindström
Hi Sergei, > > > > trx_rw_is_active needs to be modified to do that, right? > > > > No this is current behaviour, I did not change anything on > > trx_rw_is_active > > In xtradb trx_rw_is_active returns bool. > I think xtradb is still the default innodb in 10.2. > > In innobase it returns, indeed

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-11 Thread Jan Lindström
Hi Sergei, Update on the wsrep_close_connections problem. My suggestion to fix this issue is at https://github.com/MariaDB/server/commit/99cbe03a44cc95e6f548550df51e7201ebea3b9d If you have a better solution, please advise. R: Jan On Mon, Oct 11, 2021 at 12:52 PM Jan Lindström wrote: > Hi Sergei

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-14 Thread Sergei Golubchik
Hi, Jan!

Here's an idea of the fix: Let's always use the KILL mutex locking order, that is

  victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex

For this we need to fix wsrep_abort_transaction(), which is called from the server, and wsrep_innobase_kill_one_trx(), which is calle
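
The locking order proposed in this message (victim THD first, then lock_sys, then the victim transaction) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `Thd`, `LockSys`, `Trx`, `abort_victim` and `run_demo` are hypothetical stand-ins, not MariaDB's real types or functions, and the sketch says nothing about how the actual patch is written. It only shows that two code paths which always take the same three mutexes in the same order cannot deadlock against each other.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the server objects named in the thread.
struct Thd     { std::mutex lock_thd_data; bool killed = false; };
struct LockSys { std::mutex mutex; };
struct Trx     { std::mutex mutex; bool aborted = false; };

// Both the KILL path and the brute-force abort path go through this
// function, so the acquisition order is always
// victim THD -> lock_sys -> victim trx.
void abort_victim(Thd& victim_thd, LockSys& lock_sys, Trx& victim_trx) {
  std::lock_guard<std::mutex> thd_guard(victim_thd.lock_thd_data);
  std::lock_guard<std::mutex> sys_guard(lock_sys.mutex);
  std::lock_guard<std::mutex> trx_guard(victim_trx.mutex);
  victim_trx.aborted = true;
  victim_thd.killed = true;
}

// Runs two concurrent "abort" paths against the same victim.
// Returns 0 if both finished and the victim is marked aborted.
int run_demo() {
  Thd thd;
  LockSys sys;
  Trx trx;
  std::thread kill_path([&] {
    for (int i = 0; i < 1000; ++i) abort_victim(thd, sys, trx);
  });
  std::thread bf_path([&] {
    for (int i = 0; i < 1000; ++i) abort_victim(thd, sys, trx);
  });
  kill_path.join();
  bf_path.join();
  return (thd.killed && trx.aborted) ? 0 : 1;
}
```

If one of the two paths took the mutexes in the opposite order (trx before THD, say), the two threads could each hold one mutex while waiting for the other, which is the kind of hang a single agreed order is meant to rule out.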

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-14 Thread Jan Lindström
Hi, A few questions: (1) Is this review for the full patch or just for the problems in wsrep_abort_transaction? (2) If we do not have a transaction in wsrep_abort_transaction, the idea is that we no longer want to enter InnoDB, i.e. innobase_kill_query; that is the reason we set MUST_ABORT to wsrep_conf

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-15 Thread Sergei Golubchik
Hi, Jan! On Oct 15, Jan Lindström wrote: > Few questions: > > (1) Is this review for a full patch or just problems on > wsrep_abort_transaction ? a full patch > (2) In case at wsrep_abort_transaction we do not have a transaction idea is > that we do not anymore want to enter InnoDB i.e. innobas

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-19 Thread Jan Lindström
Hi Sergei, I have implemented PlanE as agreed on branch bb-10.2-MDEV-25114-planE-galera, and regression testing mostly looks promising. However, I have problems with MDL locks. For example, test case galera.galera_toi_lock_exclusive hangs and I have not yet found out why. I will ask Seppo for help.

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-20 Thread Jan Lindström
Hi Sergei,

This does not seem to work. Consider the following:

CREATE TABLE t1 (id INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);

connection node_2;
SET AUTOCOMMIT=OFF;
START TRANSACTION;
INSERT INTO t1 VALUES (2);

connection node_2a;
ALTER TABLE t1 ADD COLUMN f2 INTEGER, LOCK=EXCLUSIVE;

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-21 Thread Jan Lindström
Hi Sergei,

Your suggestion does not work. There is more than one problem:

(1) wsrep_abort_transaction does not release the MDL lock
(2) innobase_kill_one_trx crashes at wsrep->abort_pre_commit() because the transaction registered inside wsrep has disappeared (this does not happen if THD::LOCK_thd_data is