[Openais] [PATCH] Display ring-ID consistently in debug

2011-08-16 Thread Tim Beale
sequence id %llx.%s for this ring.\n", instance->my_ring_id.seq, totemip_print (instance->my_ring_id.rep)); for (i = 0; i < instance->totem_config->interfaces[iface_no].member_count; i++) { From: Tim Beale tim.be...@alliedtelesis.co.nz Display

[Openais] CPG client can lockup if the local node is in the downlist

2011-08-16 Thread Tim Beale
; memset (cpd->group_name, 0, sizeof(cpd->group_name)); From: Tim Beale tim.be...@alliedtelesis.co.nz A CPG client can sometimes lock up if the local node is in the downlist. In a 10-node cluster where all nodes are booting up and starting corosync at the same time, sometimes during

Re: [Openais] Problems forming cluster on corosync startup

2011-08-14 Thread Tim Beale
. It definitely rules out #2. I can repeat the test with healthchecking disabled to narrow down if #1 or #3 will occur. Regards, Tim On Thu, Aug 11, 2011 at 4:21 AM, Steven Dake sd...@redhat.com wrote: On 08/09/2011 09:56 PM, Tim Beale wrote: Hi Steve, Thanks for your patch. 1. I don't see

Re: [Openais] Problems forming cluster on corosync startup

2011-08-08 Thread Tim Beale
On Mon, Aug 8, 2011 at 6:08 AM, Steven Dake sd...@redhat.com wrote: On 08/03/2011 10:32 PM, Tim Beale wrote: Hi, It looks to me that the way the transition from Recovery to Operational works, we can't guarantee that all nodes in the ring have entered Operational before a node processes

Re: [Openais] Problems forming cluster on corosync startup

2011-08-03 Thread Tim Beale
processing the Memb-Join message if the node's only just entered operational. iii) Should we not be using CLM like this? I.e. should we just learn to live with CLM/CPG sometimes reporting nodes as leaving when they're perfectly healthy. Thanks for your help. Tim On Wed, Aug 3, 2011 at 3:28 PM, Tim

[Openais] Some messages still leaked in recovery code

2011-07-18 Thread Tim Beale
is a patch that fixes the problem for me. I tested it on v1.3.1, but the patch should apply to trunk. Let me know if I've misunderstood anything, or if any of the patch needs fixing up. Cheers, Tim From: Tim Beale tim.be...@alliedtelesis.co.nz Fix memory leak when entering recovery repeatedly

[Openais] Add a few more stats for debugging

2011-07-18 Thread Tim Beale
Hi, Attached is a patch that adds a few more stats (the code was actually written by Angus). We find these stats useful - hopefully others will too. Cheers, Tim From: Tim Beale tim.be...@alliedtelesis.co.nz Add some more stats for debugging + overload - number of times client is told

Re: [Openais] Question about recovery code

2011-07-15 Thread Tim Beale
Hi Steve, Thanks for your help. I've tried out your patch and confirmed it fixes the problem. Cheers, Tim On Fri, Jul 8, 2011 at 10:36 AM, Steven Dake sd...@redhat.com wrote: On 07/07/2011 03:07 PM, Tim Beale wrote: Hi Steve, Thanks for your help. When we upgraded to v1.3.1 we picked up

Re: [Openais] Question about recovery code

2011-07-07 Thread Tim Beale
But I suspect it's reliant on timing/messaging specific to my system. Let me know if there's any debug or anything you want me to try out. Thanks, Tim On Thu, Jul 7, 2011 at 3:47 PM, Steven Dake sd...@redhat.com wrote: On 07/06/2011 05:24 PM, Tim Beale wrote: Hi, We've hit a problem

Re: [Openais] Fix compile/runtime issues for _POSIX_THREAD_PROCESS_SHARED 1

2011-07-06 Thread Tim Beale
} va_end (ap); #ifdef __NR_semctl return __semctl(semid, semnum, cmd | __IPC_64, arg.__pad); #else return __syscall_ipc(IPCOP_semctl, semid, semnum, cmd|__IPC_64, arg, NULL); #endif } On Thu, Jul 7, 2011 at 1:46 AM, Steven Dake sd...@redhat.com wrote: On 07/05/2011 07:22 PM, Tim

[Openais] Question about recovery code

2011-07-06 Thread Tim Beale
Hi, We've hit a problem in the recovery code and I'm struggling to understand why we do the following: /* * The recovery sort queue now becomes the regular * sort queue. It is necessary to copy the state * into the regular sort queue. */

[Openais] startup error - getpwnam_r() returns ERANGE for some systems

2011-07-05 Thread Tim Beale
about this code, but judging by the man page for getpwnam_r, the correct way to determine the buffersize on any given system is to use sysconf(). Attached is a patch that does this. Cheers, Tim From: Tim Beale tim.be...@alliedtelesis.co.nz getpwnam_r()/getgrnam_r() returns ERANGE for some systems
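The sysconf() approach described in this message can be sketched as follows. This is a minimal illustration, not the attached patch: the helper name lookup_uid, the 1024-byte fallback, and the 1 MiB retry cap are all assumptions made for the example. It asks sysconf(_SC_GETPW_R_SIZE_MAX) for a suggested buffer size and, since that value is only a hint, grows the buffer and retries whenever getpwnam_r() still reports ERANGE.

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not corosync code): resolve a user name to a UID.
 * Returns 0 on success, ENOENT if the user was not found, or another
 * errno value on failure. */
static int lookup_uid(const char *name, uid_t *uid_out)
{
    struct passwd pwd;
    struct passwd *result = NULL;
    char *buf = NULL;
    int rc;

    assert(name != NULL && uid_out != NULL);

    /* Ask the system for a suggested buffer size; -1 means "no hint".
     * The 1024-byte fallback is an arbitrary choice for this sketch. */
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t len = (hint > 0) ? (size_t)hint : 1024;

    for (;;) {
        char *tmp = realloc(buf, len);
        if (tmp == NULL) {
            free(buf);
            return ENOMEM;
        }
        buf = tmp;

        rc = getpwnam_r(name, &pwd, buf, len, &result);
        if (rc != ERANGE) {
            break;
        }
        /* Buffer still too small: double it, up to an arbitrary 1 MiB cap. */
        if (len >= ((size_t)1 << 20)) {
            break;
        }
        len *= 2;
    }

    /* A return of 0 with a NULL result means "no matching entry". */
    if (rc == 0 && result == NULL) {
        rc = ENOENT;
    }
    if (rc == 0) {
        *uid_out = pwd.pw_uid;
    }
    free(buf);
    return rc;
}
```

The same pattern would apply to getgrnam_r() with _SC_GETGR_R_SIZE_MAX.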

[Openais] Fix compile/runtime issues for _POSIX_THREAD_PROCESS_SHARED 1

2011-07-05 Thread Tim Beale
portability it is best to always call semctl() with four arguments'. The attached patch does this. Cheers, Tim From: Tim Beale tim.be...@alliedtelesis.co.nz Fix compile/runtime issues for _POSIX_THREAD_PROCESS_SHARED 1 For the case where _POSIX_THREAD_PROCESS_SHARED 1, the code doesn't compile
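For illustration, here is a small sketch of calling semctl() with four arguments in every case, including commands that ignore the fourth. It is not the attached patch: the helper sem_roundtrip is hypothetical, and the local definition of union semun assumes a Linux/glibc system, where the caller must define that union (some other systems already define it in <sys/sem.h>).

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Assumption: Linux/glibc, where the caller must define union semun. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical helper: create a private one-semaphore set, set it to
 * `initial`, read the value back, and remove the set.
 * Returns the value read back, or -1 on any failure. */
static int sem_roundtrip(int initial)
{
    union semun arg;
    int value = -1;

    assert(initial >= 0);

    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1) {
        return -1;
    }

    arg.val = initial;
    if (semctl(semid, 0, SETVAL, arg) != -1) {
        /* GETVAL ignores the fourth argument, but it is passed anyway. */
        arg.val = 0;
        value = semctl(semid, 0, GETVAL, arg);
    }

    /* IPC_RMID also ignores the fourth argument; still pass it. */
    arg.val = 0;
    (void)semctl(semid, 0, IPC_RMID, arg);
    return value;
}
```

Passing the union unconditionally keeps every call site the same shape, so no call depends on how a particular C library declares or wraps semctl().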

[Openais] CLM saClmClusterTrackCallback sometimes called with node-ID of zero

2010-11-02 Thread Tim Beale
Hi, I noticed a quirk with CLM where it sometimes passes a client application a CLM node-ID of zero in saClmClusterTrackCallback. The problem seems to be a timing issue at startup. The situation is a client application is registering with the CLM service at the same time corosync is starting up

[Openais] corosync enters recovery repeatedly on lossy network

2010-06-17 Thread Tim Beale
Hi, I'm running corosync on a setup where corosync packets are getting delayed and lost. I'm seeing corosync enter recovery mode repeatedly, which is then causing other problems for us. (We're running trunk as at revision 2569 (8 Dec 09), so some of these flow-on problems may already be fixed.)

Re: [Openais] proposal for better end to end flow control

2010-04-08 Thread Tim Beale
Hi Steve, Could you send me your backlog backoff calculation code (or preferably tarball of source tree)?  I'd like to see what you have. Attached is a patch that should apply to trunk, which contains the changes we've made. It's based off Angus's patch you referred to, which holds onto the

[Openais] Intermittent corosync message corruption

2009-12-15 Thread Tim Beale
Lab, NZ. * * All rights reserved. * * Author: Angus Salkeld (angus.salk...@alliedtelesis.co.nz) * Author: Tim Beale (tim.be...@alliedtelesis.co.nz) * * This software licensed under BSD license, the text of which follows: * * Redistribution and use in source and binary forms, with or without

[Openais] CPG flow control question

2007-11-22 Thread Tim Beale
Hi, I'm currently using EVS messaging to propagate events across a cluster, and I was hoping that switching to CPG would help minimise some IPC disconnect issues I've been having. However, I'm not sure I fully understand how CPG flow control works. My test is: I've got 2 nodes in a cluster,