We have been very successfully running MySQL in a production environment one way or another for the past 6 years. We have recently run into what I believe is a thread race condition while writing then reading from a MyISAM table. The server we are experiencing this problem on is a very stable environment, very rarely has anything changed on this machine in the past 4 years (other than mysql upgrades, and security updates). The machine is running Debian Woody (stable), I have included the libraries from mysqlbug and kernel version at the end of this email. The machine is a Dell Poweredge 6450 4 processor XEON 700/2MB, running local hardware raid with an LSI controller. I include the hardware configuration because it's possible the thread interaction problem may lay closer to the hardware level (or compiler) since the box is a 4 CPU machine with fairly massive L2 caches (even by today's standards) on each chip that need to be kept synchronized. There have been firmware upgrades for this machine in the past to fix 'cpu synchronization' issues, however I do not know if we are running these fixes or not. We are running MySQL 4.0.23 Mysql-binary (so it should be statically linked anyway). I've looked in the changelog for 4.0.24 and 4.0.25 and I don't see any updates that might resolve our issue.
Unfortunately I have no solid test case for this issue and it only occurs under times of heavy stress. The problem has manifested itself twice out of the past two times a customer has been doing 'massive' batch configuration changes to our system. They do these changes once a month, and the past two months this issue has occurred. The issue manifests itself in a very particular way, and has been practically exactly the same both times. I am hoping somebody can give me a direction to take this, either to open a MySQL support case, mysqldumping the table and reinserting the table, looking at the firmware updates... The problem is we get an error 127 from table handler error when doing a select: General error, message from server: "Got error 127 from table handler" -- SQLQuery was:select DISTINCT Activity.activityID,Activity.processID,Activity.activityName, [....] from Activity where ( Activity.processID='147008' AND Activity.activityName='VIMforQuiz' ) This, ofcourse, crashes out our XML processing and the worker thread will end up stopping. What is happening when this happens is there are atleast 3-4 active threads running on our application server processing tens of thousands of XML files, and making major changes to the database. Each thread is essentially performing the same duty, but just with different data. These threads are primarily database bound. Replication is active on this server, and there are 3 replication servers pulling the updates from this server. None of their tables are corrupted by this (by virtue of check table). Looking in the MySQL binlog for the timeperiod when this occurs (it, ofcourse, does not include selects), shows about 200-300 inserts/updates going on during the 1 second period when this issue happens. There happens to be 3 separate threads inserting records into the "Activity" table during that second (they must just happened to have converged). I am guessing that the record the above select query is trying to query, has just been inserted (I found the insert for the record during that same 1 second as the crash). Performing a check table reports everything is kosher: mysql> check table Activity; +---------------------+-------+----------+----------+ | Table | Op | Msg_type | Msg_text | +---------------------+-------+----------+----------+ | abc.Activity | check | status | OK | +---------------------+-------+----------+----------+ 1 row in set (5 min 36.80 sec) Last month I did a repair table aswell, and that reported everything was fine aswell, but obviously didn't fix anything. Once this crash occurs, that worker thread will quit... but all other threads will continue to pound away on this and other tables, and will work perfectly fine with no errors. The table is a MyISAM Dynamic table, as shown from this show table status: | Activity | MyISAM | Dynamic | 4681274 | 105 | 505218432 | 4294967295 | 166466560 | 10512548 | 4960114 | 2004-10-14 14:04:25 | 2005-05-24 00:57:34 | 2005-05-24 00:57:10 | | | The free space probably was more like zero when the incident occurred, all of the commands I have run here were several hours after the incident. perror 127 says that the record file is crashed. What will cause MySQL to return such an error during a lookup? Clearly the table *itself* is not marked as crashed, so I suspect what must be happening is the SELECT statement is being allowed to read from the table while an insert is still writing... It presumably should be locked out during that operation, and the select is seeing a partially written table. Any ideas? Mysqldump the table and reinsert it? Will that do anything that repair table doesn't do? What about adding an index on that lookup key? (processID, activityName). Both times it's crashed it's been on a lookup on those two values. I don't need one, but might that add an extra layer of locking to update the indexes and might avoid this problem? There are several other indexes on the table already (including a partial index on processID): mysql> show index from Activity; +----------+------------+----------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +----------+------------+----------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+ | Activity | 0 | PRIMARY | 1 | activityID | A | 4738990 | NULL | NULL | | BTREE | | | Activity | 1 | index1 | 1 | processID | A | 143605 | NULL | NULL | | BTREE | | | Activity | 1 | index1 | 2 | isActiveFlag | A | 430817 | NULL | NULL | YES | BTREE | | | Activity | 1 | index2 | 1 | processID | A | 143605 | NULL | NULL | | BTREE | | | Activity | 1 | index2 | 2 | storableClassID | A | 4738990 | NULL | NULL | | BTREE | | +----------+------------+----------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+ 5 rows in set (0.00 sec) Any help or direction to bring this would be appreciated (even if it's, hey! You sound like you've got a problem, open a support case) Thanks, -Joe Server version 4.0.23-standard-log Protocol version 10 Connection Localhost via UNIX socket UNIX socket /var/run/mysqld/mysqld.sock Uptime: 16 days 14 hours 28 min 11 sec Threads: 492 Questions: 525467812 Slow queries: 1108 Opens: 110664 Flush tables: 1 Open tables: 1024 Queries per second avg: 366.310 >C compiler: 2.95.4 >C++ compiler: 2.95.4 >Environment: <machine, os, target, libraries (multiple lines)> System: Linux db-01 2.4.26-athenium-pe6450-3 #1 SMP Thu Jul 29 19:55:41 EDT 2004 i686 unknown Architecture: i686 Some paths: /usr/bin/perl /usr/bin/make /usr/bin/gcc /usr/bin/cc GCC: Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.4/specs gcc version 2.95.4 20011002 (Debian prerelease) Compilation info: CC='gcc' CFLAGS='' CXX='g++' CXXFLAGS='' LDFLAGS='' ASFLAGS='' LIBC: lrwxrwxrwx 1 root root 13 Sep 9 2003 /lib/libc.so.6 -> libc-2.2.5.so -rwxr-xr-x 1 root root 1153784 Apr 8 2003 /lib/libc-2.2.5.so -rw-r--r-- 1 root root 2391002 Apr 8 2003 /usr/lib/libc.a -rw-r--r-- 1 root root 178 Apr 8 2003 /usr/lib/libc.so Configure command: ./configure '--prefix=/usr' '--exec-prefix=/usr' '--libexecdir=/usr/sbin' '--datadir=/usr/share' '--sysconfdir=/etc/mysql' '--localstatedir=/var/lib/mysql' '--includedir=/usr/include' '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--enable-shared' '--enable-static' '--enable-thread-safe-client' '--enable-assembler' '--enable-local-infile' '--with-raid' '--with-unix-socket-path=/var/run/mysqld/mysqld.sock' '--with-mysqld-user=mysql' '--with-libwrap' '--with-client-ldflags=-lstdc++' '--with-embedded-server' '--with-vio' '--with-openssl' '--without-docs' '--without-bench' '--without-readline' '--with-extra-charsets=all' '--with-berkeley-db' '--with-innodb' -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]