We replicate our database server so that we have a hot spare with a fully up-to-date dataset.
Every so often, I get a page from a watchdog script[1] telling me that the slave has fallen out of sync. This happens roughly every 10 days. I notice that the slave always disconnects from and reconnects to the master a few minutes before this occurs:

031025  6:24:06  Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
031025  6:24:06  Slave: Failed reading log event, reconnecting to retry, log 'dbms2-bin.318' position 26879196
031025  6:24:06  Slave: reconnected to master '[EMAIL PROTECTED]:3306',replication resumed in log 'dbms2-bin.318' at position 26879196
ERROR: 1062  Duplicate entry '3133173' for key 2
031025  9:32:35  Slave: error running query [**insert that failed goes here**]
031025  9:32:35  Error running query, slave aborted. Fix the problem, and re-start the slave thread with "mysqladmin start-slave". We stopped at log 'dbms2-bin.319' position 68555844
031025  9:32:35  Slave thread exiting, replication stopped in log 'dbms2-bin.319' at position 68555844

Is the disconnect/reconnect after an error not sync-safe?

[1] The watchdog script checks that a frequently updated table has a row with a timestamp younger than 5 minutes on the slave.

-- 
Michael Bacarella                24/7 phone: 1-646-641-8662
Netgraft Corporation            http://netgraft.com/
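
For reference, the check described in footnote [1] can be sketched roughly as below. This is only an illustration, not the actual script: the table name `heartbeat`, the column `updated_at`, and both function names are hypothetical, and `cursor` is assumed to be any Python DB-API cursor connected to the slave that returns `datetime` values.

```python
from datetime import datetime, timedelta

# Threshold from footnote [1]: the newest row must be younger than 5 minutes.
MAX_AGE = timedelta(minutes=5)


def slave_is_stale(newest_row_time, now, max_age=MAX_AGE):
    """Return True if the newest row seen on the slave is older than max_age.

    newest_row_time is None when the table has no rows; that is treated
    as stale as well, since nothing has replicated.
    """
    if newest_row_time is None:
        return True
    return now - newest_row_time > max_age


def check_slave(cursor, now=None):
    """Query the slave for its newest row and report whether it is stale.

    The table and column names are placeholders (assumptions); substitute
    the frequently updated table that the watchdog actually monitors.
    """
    cursor.execute("SELECT MAX(updated_at) FROM heartbeat")
    (newest,) = cursor.fetchone()
    return slave_is_stale(newest, now or datetime.now())
```

A check like this only detects that replication has stopped or fallen behind; it says nothing about why, so the slave's error log is still needed to find the failing query.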