------------------------------------------------------------
revno: 263
revision-id: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
parent: [EMAIL PROTECTED]
committer: Andrew Tridgell <[EMAIL PROTECTED]>
branch nick: tridge
timestamp: Mon 2007-05-07 07:56:38 +1000
message:
  merged from ronnie
modified:
  common/ctdb.c                  ctdb.c-20061127094323-t50f58d65iaao5of-2
  common/ctdb_client.c           ctdb_client.c-20070411010216-3kd8v37k61steeya-1
  common/ctdb_control.c          ctdb_control.c-20070426122724-j6gkpiofhbwdin63-1
  direct/recoverd.c              recoverd.c-20070503213540-bvxuyd9jm1f7ig90-1
  include/ctdb.h                 ctdb.h-20061117234101-o3qt14umlg9en8z0-11
  include/ctdb_private.h         ctdb_private.h-20061117234101-o3qt14umlg9en8z0-13
  tests/recover.sh               recover.sh-20070502031230-tpuiet6m6tjdotta-1
  tools/ctdb_control.c           ctdb_control.c-20070426122705-9ehj1l5lu2gn9kuj-1
    ------------------------------------------------------------
    revno: 197.1.82
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Mon 2007-05-07 07:54:17 +1000
    message:
      hang the timeout event off state and thus we dont need to explicitely
      free it and also we wont accidentally return from the function without
      killing the event first
    ------------------------------------------------------------
    revno: 197.1.81
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Mon 2007-05-07 07:47:16 +1000
    message:
      it now works to talloc_free() the timed event if we no longer want it to
      trigger
      this must have been a sideeffect of a different bug in the recoverd.c
      code that has now been fixed
    ------------------------------------------------------------
    revno: 197.1.80
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Mon 2007-05-07 06:51:58 +1000
    message:
      recovery daemon with recovery master election
      election is primitive, it elects the lowest vnn as the recovery master
      two new controls, to get/set recovery master for a node
      to use recovery daemon, start one
      ./bin/recoverd --socket=ctdb.socket*
      for each ctdb daemon
      it has been briefly tested by deleting and adding nodes to a 4 node
      cluster but needs more testing
    ------------------------------------------------------------
    revno: 197.1.79
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Mon 2007-05-07 05:02:48 +1000
    message:
      add new controls to get and set the recovery master node of a daemon
      i.e. which node is "elected" to check for and drive recovery
    ------------------------------------------------------------
    revno: 197.1.78
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Mon 2007-05-07 04:41:12 +1000
    message:
      add a test in the function that checks whether the cluster needs
      recovery or not that all active nodes are in normal mode.
      If we discover that some node is still in recoverymode it may indicate
      that a previous recovery ended prematurely and thus we should start a
      new recovery
    ------------------------------------------------------------
    revno: 197.1.77
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 12:46:56 +1000
    message:
      update a comment to be more desciptive
    ------------------------------------------------------------
    revno: 197.1.76
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:51:25 +1000
    message:
      change a lot of printf into debug statements
    ------------------------------------------------------------
    revno: 197.1.75
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:42:18 +1000
    message:
      break out the code to update all nodes to the new vnnmap into a helper
      function
    ------------------------------------------------------------
    revno: 197.1.74
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:38:44 +1000
    message:
      create a helper function for recovery to push all local databases out
      onto the remote nodes
    ------------------------------------------------------------
    revno: 197.1.73
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:30:18 +1000
    message:
      add an extra blank line
    ------------------------------------------------------------
    revno: 197.1.72
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:22:13 +1000
    message:
      break the code that repoints dmaster for all local and remote records
      into a separate helper function
    ------------------------------------------------------------
    revno: 197.1.71
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:16:48 +1000
    message:
      create a helper function for recovery that pulls and merges all remote
      databases onto the local node
    ------------------------------------------------------------
    revno: 197.1.70
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:12:42 +1000
    message:
      create a helper function to make sure the local node that does recovery
      has all the databases that exist on any other remote node
    ------------------------------------------------------------
    revno: 197.1.69
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:04:37 +1000
    message:
      add a helper function to create all missing remote databases detected
      during recovery
    ------------------------------------------------------------
    revno: 197.1.68
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 09:53:12 +1000
    message:
      break out the setting/clearing of recovery mode into a dedicated helper
      function
    ------------------------------------------------------------
    revno: 197.1.67
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 08:05:22 +1000
    message:
      dont allocate arrays where we can just return a single integer
    ------------------------------------------------------------
    revno: 197.1.66
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 07:52:20 +1000
    message:
      dont use arrays where a uint32_t works just as well
    ------------------------------------------------------------
    revno: 197.1.65
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 07:32:16 +1000
    message:
      add a ifdeffed out block to the call.
      we really should kill the event in case the call completed before the
      timeout so that we can also make timed_out non-static
    ------------------------------------------------------------
    revno: 197.1.64
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 07:07:47 +1000
    message:
      hte timed_out variable needs to be static and can not be on the stack
      since if the command times out and we return from ctdb_control we may
      have events that can trigger later which will overwrite data that is no
      longer in our stackframe
    ------------------------------------------------------------
    revno: 197.1.63
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 06:58:01 +1000
    message:
      update to rhe recovery daemon
      ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the
      cluster
      it crashes the recovery daemon afterwards with a SEGV but no useful
      stack backtrace
    ------------------------------------------------------------
    revno: 197.1.62
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 06:06:39 +1000
    message:
      in the recover test start the daemons with explicit socketnames and
      explicit ip address/port
      remove all --socket= from all ctdb_control calls since they are not
      needed anymore
    ------------------------------------------------------------
    revno: 197.1.61
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 05:53:15 +1000
    message:
      add support in catdb to dump the content of a specific nodes tdb instead
      of traversing the full cluster.
      this makes it easier to debug recovery
      update the test script for recovery to reflect the newish signatures to
      ctdb_control
      the catdb control does still segfault however when there are missing
      nodes in the cluster as there are toward the end of the recovery test
    ------------------------------------------------------------
    revno: 197.1.60
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 04:38:41 +1000
    message:
      merge from tridge
    ------------------------------------------------------------
    revno: 197.1.59
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 04:31:22 +1000
    message:
      add a control to get the pid of a daemon.
      this makes it possible to kill a specific daemon in the recover test
      script
    ------------------------------------------------------------
    revno: 197.1.58
    merged: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    parent: [EMAIL PROTECTED]
    committer: Ronnie Sahlberg <[EMAIL PROTECTED]>
    branch nick: ctdb
    timestamp: Sat 2007-05-05 16:51:34 +1000
    message:
      merge from tridge

Diff too large for email (1498, the limit is 200).