We have been very successfully running MySQL in a production
environment one way or another for the past 6 years.  We have recently
run into what I believe is a thread race condition while writing then
reading from a MyISAM table.  The server we are experiencing this
problem on is a very stable environment, very rarely has anything
changed on this machine in the past 4 years (other than mysql upgrades,
and security updates). The machine is running Debian Woody (stable), I
have included the libraries from mysqlbug and kernel version at the end
of this email.  The machine is a Dell Poweredge 6450 4 processor XEON
700/2MB, running local hardware raid with an LSI controller.  I include
the hardware configuration because it's possible the thread interaction
problem may lay closer to the hardware level (or compiler) since the box
is a 4 CPU machine with fairly massive L2 caches (even by today's
standards) on each chip that need to be kept synchronized.  There have
been firmware upgrades for this machine in the past to fix 'cpu
synchronization' issues, however I do not know if we are running these
fixes or not.  We are running MySQL 4.0.23 Mysql-binary  (so it should
be statically linked anyway).  I've looked in the changelog for 4.0.24
and 4.0.25 and I don't see any updates that might resolve our issue.

        Unfortunately I have no solid test case for this issue and it only
occurs under times of heavy stress.  The problem has manifested itself
twice out of the past two times a customer has been doing 'massive'
batch configuration changes to our system.  They do these changes once a
month, and the past two months this issue has occurred.  The issue
manifests itself in a very particular way, and has been practically
exactly the same both times.  I am hoping somebody can give me a
direction to take this, either to open a MySQL support case,
mysqldumping the table and reinserting the table, looking at the
firmware updates...  

        The problem is we get an error 127 from table handler error when doing
a select:

General error,  message from server: "Got error 127 from table handler"
-- SQLQuery was:select  DISTINCT
Activity.activityID,Activity.processID,Activity.activityName, [....]
from Activity where ( Activity.processID='147008' AND
Activity.activityName='VIMforQuiz' )

This, ofcourse, crashes out our XML processing and the worker thread
will end up stopping.  What is happening when this happens is there are
atleast 3-4 active threads running on our application server processing
tens of thousands of XML files, and making major changes to the
database.  Each thread is essentially performing the same duty, but just
with different data.  These threads are primarily database bound. 
Replication is active on this server, and there are 3 replication
servers pulling the updates from this server.  None of their tables are
corrupted by this (by virtue of check table).  Looking in the MySQL
binlog for the timeperiod when this occurs (it, ofcourse, does not
include selects), shows about 200-300 inserts/updates going on during
the 1 second period when this issue happens.  There happens to be 3
separate threads inserting records into the "Activity" table during that
second (they must just happened to have converged).  I am guessing that
the record the above select query is trying to query, has just been
inserted (I found the insert for the record during that same 1 second as
the crash).  

        Performing a check table reports everything is kosher:

mysql> check table Activity;
+---------------------+-------+----------+----------+
| Table               | Op    | Msg_type | Msg_text |
+---------------------+-------+----------+----------+
| abc.Activity        | check | status   | OK       |
+---------------------+-------+----------+----------+
1 row in set (5 min 36.80 sec)

Last month I did a repair table aswell, and that reported everything was
fine aswell, but obviously didn't fix anything.

Once this crash occurs, that worker thread will quit... but all other
threads will continue to pound away on this and other tables, and will
work perfectly fine with no errors.  The table is a MyISAM Dynamic
table, as shown from this show table status:

| Activity                                         | MyISAM | Dynamic   
|  4681274 |            105 |   505218432 |      4294967295 |   
166466560 |  10512548 |        4960114 | 2004-10-14 14:04:25 |
2005-05-24 00:57:34 | 2005-05-24 00:57:10 |                |         |

The free space probably was more like zero when the incident occurred,
all of the commands I have run here were several hours after the
incident.

perror 127 says that the record file is crashed.  What will cause MySQL
to return such an error during a lookup? Clearly the table *itself* is
not marked as crashed, so I suspect what must be happening is the SELECT
statement is being allowed to read from the table while an insert is
still writing... It presumably should be locked out during that
operation, and the select is seeing a partially written table.  

Any ideas?  Mysqldump the table and reinsert it?  Will that do anything
that repair table doesn't do?  What about adding an index on that lookup
key?  (processID, activityName).  Both times it's crashed it's been on a
lookup on those two values.  I don't need one, but might that add an
extra layer of locking to update the indexes and might avoid this
problem?  There are several other indexes on the table already
(including a partial index on processID):

mysql> show index from Activity;
+----------+------------+----------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table    | Non_unique | Key_name | Seq_in_index | Column_name     |
Collation | Cardinality | Sub_part | Packed | Null | Index_type |
Comment |
+----------+------------+----------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| Activity |          0 | PRIMARY  |            1 | activityID      |
A         |     4738990 |     NULL | NULL   |      | BTREE     
|         |
| Activity |          1 | index1   |            1 | processID       |
A         |      143605 |     NULL | NULL   |      | BTREE     
|         |
| Activity |          1 | index1   |            2 | isActiveFlag    |
A         |      430817 |     NULL | NULL   | YES  | BTREE     
|         |
| Activity |          1 | index2   |            1 | processID       |
A         |      143605 |     NULL | NULL   |      | BTREE     
|         |
| Activity |          1 | index2   |            2 | storableClassID |
A         |     4738990 |     NULL | NULL   |      | BTREE     
|         |
+----------+------------+----------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.00 sec)

Any help or direction to bring this would be appreciated (even if it's,
hey! You sound like you've got a problem, open a support case)

Thanks,
-Joe

Server version          4.0.23-standard-log
Protocol version        10
Connection              Localhost via UNIX socket
UNIX socket             /var/run/mysqld/mysqld.sock
Uptime:                 16 days 14 hours 28 min 11 sec

Threads: 492  Questions: 525467812  Slow queries: 1108  Opens: 110664 
Flush tables: 1  Open tables: 1024  Queries per second avg: 366.310
>C compiler:    2.95.4
>C++ compiler:  2.95.4
>Environment:
        <machine, os, target, libraries (multiple lines)>
System: Linux db-01 2.4.26-athenium-pe6450-3 #1 SMP Thu Jul 29 19:55:41
EDT 2004 i686 unknown
Architecture: i686

Some paths:  /usr/bin/perl /usr/bin/make /usr/bin/gcc /usr/bin/cc
GCC: Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.4/specs
gcc version 2.95.4 20011002 (Debian prerelease)
Compilation info: CC='gcc'  CFLAGS=''  CXX='g++'  CXXFLAGS='' 
LDFLAGS=''  ASFLAGS=''
LIBC: 
lrwxrwxrwx    1 root     root           13 Sep  9  2003 /lib/libc.so.6
-> libc-2.2.5.so
-rwxr-xr-x    1 root     root      1153784 Apr  8  2003
/lib/libc-2.2.5.so
-rw-r--r--    1 root     root      2391002 Apr  8  2003 /usr/lib/libc.a
-rw-r--r--    1 root     root          178 Apr  8  2003 /usr/lib/libc.so
Configure command: ./configure '--prefix=/usr' '--exec-prefix=/usr'
'--libexecdir=/usr/sbin' '--datadir=/usr/share'
'--sysconfdir=/etc/mysql' '--localstatedir=/var/lib/mysql'
'--includedir=/usr/include' '--infodir=/usr/share/info'
'--mandir=/usr/share/man' '--enable-shared' '--enable-static'
'--enable-thread-safe-client' '--enable-assembler'
'--enable-local-infile' '--with-raid'
'--with-unix-socket-path=/var/run/mysqld/mysqld.sock'
'--with-mysqld-user=mysql' '--with-libwrap'
'--with-client-ldflags=-lstdc++' '--with-embedded-server' '--with-vio'
'--with-openssl' '--without-docs' '--without-bench' '--without-readline'
'--with-extra-charsets=all' '--with-berkeley-db' '--with-innodb'






-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to