------------------------------------------------------------
revno: 624
revision-id: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
committer: Andrew Tridgell <[EMAIL PROTECTED]>
branch nick: tridge.test2
timestamp: Fri 2007-09-14 09:49:12 +1000
message:
  - merge from ronnie
  - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring
modified:
  common/system_linux.c system.c-20070525071636-a5n1ihghjtppy08r-3
  server/ctdb_recoverd.c recoverd.c-20070503213540-bvxuyd9jm1f7ig90-1
  web/configuring.html configuring_ctdb.htm-20070608021649-cipqdfao7xedp6ji-1
  web/nfs.html nfs.html-20070608234340-a8i1dxro7a7i6jz6-1
------------------------------------------------------------
revno: 432.1.281
revision-id: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
branch nick: ctdb
timestamp: Fri 2007-09-14 08:56:27 +1000
message:
  update the section about event scripts
modified:
  web/configuring.html configuring_ctdb.htm-20070608021649-cipqdfao7xedp6ji-1
------------------------------------------------------------
revno: 432.1.280
revision-id: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
branch nick: ctdb
timestamp: Fri 2007-09-14 08:15:24 +1000
message:
  disable nfsv4 in etc/sysconfig/nfs
modified:
  web/nfs.html nfs.html-20070608234340-a8i1dxro7a7i6jz6-1
------------------------------------------------------------
revno: 432.1.279
revision-id: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
branch nick: ctdb
timestamp: Thu 2007-09-13 14:51:37 +1000
message:
  when a ctdb_takeover_run has failed we must make sure that
  need_takeover_run is set to true, or else we might forget to rerun it
  during the next recovery.

  otherwise, need_takeover_run is only set to true if the node flags for
  a remote node and the local node differ.

  It is possible that a takeover run fails, leaving the reassignment of
  ip addresses incomplete, but that by the time we get back to the test in
  monitor_cluster() the node flags of all nodes have converged and match
  each other again. monitor_cluster() would then fail to realize that a
  takeover run is needed.
modified:
  server/ctdb_recoverd.c recoverd.c-20070503213540-bvxuyd9jm1f7ig90-1
------------------------------------------------------------
revno: 432.1.278
revision-id: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
branch nick: ctdb
timestamp: Thu 2007-09-13 14:28:18 +1000
message:
  merge from tridge
modified:
  common/system_aix.c system_aix.c-20070714002637-rpu7y2dxeoh1ckej-1
  common/system_linux.c system.c-20070525071636-a5n1ihghjtppy08r-3
  config/ctdb.init ctdb.init-20070527204758-biuh7znabuwan3zn-6
  config/events.d/10.interface 10.interface-20070604050809-s21zslfirn07zjt8-1
  config/events.d/60.nfs nfs-20070601141008-hy3h4qgbk1jd2jci-1
  config/functions functions-20070601105405-gajwirydr5a9zd6x-1
  include/ctdb_private.h ctdb_private.h-20061117234101-o3qt14umlg9en8z0-13
  server/ctdb_daemon.c ctdb_daemon.c-20070409200331-3el1kqgdb9m4ib0g-1
  server/ctdb_recoverd.c recoverd.c-20070503213540-bvxuyd9jm1f7ig90-1
  server/ctdb_takeover.c ctdb_takeover.c-20070525071636-a5n1ihghjtppy08r-2
  server/ctdbd.c ctdbd.c-20070411085044-dqmhr6mfeexnyt4m-1
  tools/ctdb.c ctdb_control.c-20070426122705-9ehj1l5lu2gn9kuj-1
  tools/ctdb_diagnostics ctdb_diagnostics-20070905041904-9d9r1qnt1j9qiwiz-1
------------------------------------------------------------
revno: 432.1.277
revision-id: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
branch nick: ctdb
timestamp: Wed 2007-09-12 07:28:24 +1000
message:
  use the public addresses variable instead of hardcoding the path
modified:
  config/events.d/10.interface 10.interface-20070604050809-s21zslfirn07zjt8-1
------------------------------------------------------------
revno: 432.1.276
revision-id: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
branch nick: ctdb
timestamp: Wed 2007-09-12 07:26:30 +1000
message:
  move all ip addresses onto loopback when we startup ctdb
modified:
  config/events.d/10.interface 10.interface-20070604050809-s21zslfirn07zjt8-1
------------------------------------------------------------
revno: 432.1.275
revision-id: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
branch nick: ctdb
timestamp: Mon 2007-09-10 16:34:11 +1000
message:
  grab the interface name from tok and not from the uninitialized array
modified:
  server/ctdb_takeover.c ctdb_takeover.c-20070525071636-a5n1ihghjtppy08r-2
------------------------------------------------------------
revno: 432.1.274
revision-id: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
branch nick: ctdb
timestamp: Mon 2007-09-10 16:23:06 +1000
message:
  merged patch from tridge
modified:
  client/ctdb_client.c ctdb_client.c-20070411010216-3kd8v37k61steeya-1
  common/ctdb_util.c ctdb_util.c-20061128065342-to93h6eejj5kon81-3
  common/system_aix.c system_aix.c-20070714002637-rpu7y2dxeoh1ckej-1
  common/system_linux.c system.c-20070525071636-a5n1ihghjtppy08r-3
  config/ctdb.init ctdb.init-20070527204758-biuh7znabuwan3zn-6
  config/ctdb.sysconfig ctdb.sysconfig-20070527204758-biuh7znabuwan3zn-7
  include/ctdb_private.h ctdb_private.h-20061117234101-o3qt14umlg9en8z0-13
  server/ctdb_serverids.c ctdb_serverids.c-20070824054041-oco3oebinbft02fl-1
  server/ctdb_takeover.c ctdb_takeover.c-20070525071636-a5n1ihghjtppy08r-2
  tools/ctdb_diagnostics ctdb_diagnostics-20070905041904-9d9r1qnt1j9qiwiz-1

=== modified file 'common/system_linux.c'
--- a/common/system_linux.c	2007-09-13 00:45:06 +0000
+++ b/common/system_linux.c	2007-09-13 04:28:18 +0000
@@ -255,6 +255,7 @@
 		return false;
 	}
 	ret = bind(s, (struct sockaddr *)&ip, sizeof(ip));
+	close(s);
 	return ret == 0;
 }

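The one-line change above closes the probe socket once bind() has returned, so the check no longer leaks a file descriptor each time it runs. A minimal standalone sketch of that pattern follows. It assumes IPv4 and TCP, and the helper name can_bind_ipv4() is hypothetical: the hunk does not show the name or signature of the CTDB function being patched.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not the CTDB function: returns true if the IPv4
 * address given in dotted-quad form can be bound on this host. */
bool can_bind_ipv4(const char *addr)
{
	struct sockaddr_in ip;
	int s, ret;

	memset(&ip, 0, sizeof(ip));
	ip.sin_family = AF_INET;
	ip.sin_port = 0; /* port 0: let the kernel pick any free port */
	if (inet_pton(AF_INET, addr, &ip.sin_addr) != 1) {
		return false; /* not a valid IPv4 literal */
	}

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1) {
		return false;
	}

	ret = bind(s, (struct sockaddr *)&ip, sizeof(ip));
	close(s); /* release the descriptor whether or not bind() succeeded */
	return ret == 0;
}
```

Without the close() call, a caller that probes addresses periodically would eventually exhaust the per-process descriptor limit, after which socket() itself would start to fail and every probe would report false.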
=== modified file 'server/ctdb_recoverd.c'
--- a/server/ctdb_recoverd.c	2007-09-13 04:08:18 +0000
+++ b/server/ctdb_recoverd.c	2007-09-13 23:49:12 +0000
@@ -43,6 +43,7 @@
 	struct ban_state **banned_nodes;
 	struct timeval priority_time;
 	bool need_takeover_run;
+	bool need_recovery;
 };
 
 #define CONTROL_TIMEOUT() timeval_current_ofs(ctdb->tunable.recover_timeout, 0)
@@ -731,6 +732,9 @@
 	uint32_t generation;
 	struct ctdb_dbid_map *dbmap;
 
+	/* if recovery fails, force it again */
+	rec->need_recovery = true;
+
 	if (rec->last_culprit != culprit ||
 	    timeval_elapsed(&rec->first_recover_time) > ctdb->tunable.recovery_grace_period) {
 		/* either a new node is the culprit, or we've decide to forgive them */
@@ -928,6 +932,8 @@
 
 	DEBUG(0, (__location__ " Recovery complete\n"));
 
+	rec->need_recovery = false;
+
 	/* We just finished a recovery successfully.
 	   We now wait for rerecovery_timeout before we allow
 	   another recovery to take place.
@@ -1576,6 +1582,12 @@
 	}
 
 
+	if (rec->need_recovery) {
+		/* a previous recovery didn't finish */
+		do_recovery(rec, mem_ctx, pnn, num_active, nodemap, vnnmap, nodemap->nodes[j].pnn);
+		goto again;
+	}
+
 	/* verify that all active nodes are in normal mode
 	   and not in recovery mode
 	 */

=== modified file 'web/configuring.html'
--- a/web/configuring.html	2007-09-03 23:50:07 +0000
+++ b/web/configuring.html	2007-09-13 22:56:27 +0000
@@ -142,7 +142,10 @@
 
 Please see the service scripts that installed by ctdb in
 /etc/ctdb/events.d for examples of how to configure other services to
-be aware of the HA features of CTDB.
+be aware of the HA features of CTDB.<p>
+
+Also see /etc/ctdb/events.d/README for additional documentation on how to
+create and manage event scripts.
 
 <h2>TCP port to use for CTDB</h2>
 

=== modified file 'web/nfs.html'
--- a/web/nfs.html	2007-09-07 02:20:48 +0000
+++ b/web/nfs.html	2007-09-13 22:15:24 +0000
@@ -50,6 +50,8 @@
   LOCKD_TCPPORT=599
   LOCKD_UDPPORT=599
   STATD_HOSTNAME="$NFS_HOSTNAME -H /etc/ctdb/statd-callout -p 97"
+  RPCNFSDARGS="-N 4"
+
 </pre>
 
 The CTDB_MANAGES_NFS line tells the events scripts that CTDB is to manage startup and shutdown of the NFS and NFSLOCK services.<br>
@@ -79,6 +81,7 @@
 NFS_HOSTNAME is the dns name for the ctdb cluster and which is used when clients map nfs shares. This name must be in DNS and resolve back into the public ip addresses of the cluster.<br>
 Always use the same name here as you use for the samba hostname.
+RPCNFSDARGS is used to disable support for NFSv4 which is not yet supported by CTDB.
 
 <h2>chkconfig</h2>
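
The need_recovery change in server/ctdb_recoverd.c above sets the flag on entry to recovery, clears it only after "Recovery complete", and has the monitoring loop re-trigger recovery while the flag is still set. A minimal sketch of that set-first, clear-last pattern follows. Everything in it is a hypothetical stand-in: struct rec_state, try_recovery(), monitor_until_recovered(), and the fail_first parameter are illustrations, not CTDB code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the recovery daemon state; only the flag
 * added in the diff is modelled. */
struct rec_state {
	bool need_recovery;
	int attempts; /* illustration only: counts calls to try_recovery() */
};

/* Hypothetical recovery routine. It fails for the first `fail_first`
 * attempts and succeeds afterwards. */
bool try_recovery(struct rec_state *rec, int fail_first)
{
	/* set the flag before doing any work, so every early return
	 * leaves it set */
	rec->need_recovery = true;
	rec->attempts++;

	if (rec->attempts <= fail_first) {
		return false; /* bailed out part-way: flag stays true */
	}

	/* only a run that reaches the end clears the flag */
	rec->need_recovery = false;
	return true;
}

/* Hypothetical monitoring loop: re-runs recovery on each pass while the
 * flag is still set, up to max_passes. Returns the total attempt count. */
int monitor_until_recovered(struct rec_state *rec, int fail_first, int max_passes)
{
	for (int pass = 0; pass < max_passes; pass++) {
		if (!rec->need_recovery) {
			break; /* the last recovery completed */
		}
		try_recovery(rec, fail_first);
	}
	return rec->attempts;
}
```

Setting the flag at the top rather than at each failure site means a new early-return path added later cannot forget to request a retry.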