[devel] current opensaf-4.7.x (4.7.0-1) and default (5.0.M0-1 ) in-service upgrade is not working

2016-03-02 Thread A V Mahesh
Hi All, With current opensaf-4.7.x (4.7.0-1) and default (5.0.M0-1), in-service upgrade is not working: the default (5.0.M0-1) node is not able to join the cluster as Standby, failing with the following error: =

Re: [devel] [PATCH 01 of 15] amfd: Add support for cloud resilience at common libs [#1620]

2016-03-02 Thread minh chau
Hi Nagu, Praveen, I have been trying your patch, with the test case below:
Setup 2N model, PL4 host SU4 (act), PL5 host SU5 (stb)
1. Issue admin command shutdown SG
2. Hanging quiescing csi_set callback
3. Stop both SCs
4. Stop PL4
5. Restart both SCs
I have seen this error after SCs come back als

Re: [devel] [PATCH 1 of 1] amfd: delete su and its child objects at stdby amfd [#1683]

2016-03-02 Thread Hans Nordebäck
Hi Nagu, good, my questions below were related to the three iterators and deleting while iterating. /Thanks HansN -Original Message- From: Nagendra Kumar [mailto:nagendr...@oracle.com] Sent: 3 March 2016 07:01 To: Hans Nordebäck; Praveen Malviya; Minh Chau H; Gary Lee Cc: opensaf-de

Re: [devel] [PATCH 1 of 1] amfd: delete su and its child objects at stdby amfd [#1683]

2016-03-02 Thread Nagendra Kumar
Hi Hans, I will send the modified patch. Thanks -Nagu > -Original Message- > From: Nagendra Kumar > Sent: 03 March 2016 10:50 > To: Hans Nordebäck; Praveen Malviya; Minh Chau H; Gary Lee > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [devel] [PATCH 1 of 1] amfd: delete su and it

Re: [devel] [PATCH 1 of 1] amfd: delete su and its child objects at stdby amfd [#1683]

2016-03-02 Thread Nagendra Kumar
Hi Hans N, Thanks for your review. If I incorporate the comment, is that an Ack? Thanks -Nagu > -Original Message- > From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com] > Sent: 29 February 2016 19:30 > To: Nagendra Kumar; Praveen Malviya; Minh Chau H; Gary Lee > Cc: op

Re: [devel] [PATCH 01 of 15] amfd: Add support for cloud resilience at common libs [#1620]

2016-03-02 Thread minh chau
Hi Nagu, Praveen, Patches 09 to 14 are fixes for bugs that you also need on top of patches #4. The problems you reported should not happen if you have them. They are needed regardless of whether we *reboot node if transient states* or *adjust transient states* (delayed failover). Patch 0

Re: [devel] [PATCH 0 of 1] Review Request for log: Support AMF configurations containing more than two OpenSAF 2N SUs [#79]

2016-03-02 Thread Lennart Lund
Hi Anders, Ack with comments. Have tested with legacy test: PASS. Comments: Instead of logging SaAisErrorT as a number (%u), it could be logged using saf_error(). Note: Will not apply on top of the resilience patch. After discussion with Anders, the resilience patch will be pushed before this patc

Re: [devel] [PATCH 01 of 15] amfd: Add support for cloud resilience at common libs [#1620]

2016-03-02 Thread Nagendra Kumar
#1 I have applied patches #1 to #4 only. With these patches (without patch #6), I expected most of the following tests to pass, but they failed (listed below). I could not test other scenarios (including alarms and notifications), because I haven't applied patch #6. I think there s

[devel] [PATCH 1 of 1] fm: Increase the default activation supervision time-out to five minutes [#79]

2016-03-02 Thread Anders Widell
osaf/services/infrastructure/fm/config/fmd.conf | 2 +-
osaf/services/infrastructure/fm/fms/fm_main.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
The default activation supervision time-out was set too low, which could cause it to expire e.g. on systems with a large number of ob

[devel] [PATCH 0 of 1] Review Request for fm: Increase the default activation supervision time-out to five minutes [#79]

2016-03-02 Thread Anders Widell
Summary: fm: Increase the default activation supervision time-out to five minutes [#79]
Review request for Trac Ticket(s): 79
Peer Reviewer(s): Mathi
Pull request to:
Affected branch(es): default(5.0)
Development branch: default
Impacted area Impact y/n ---

Re: [devel] [PATCH 1 of 4] log: add support for cloud resilience feature (service part) [#1179]

2016-03-02 Thread Lennart Lund
Hi Vu, Ack with comments: I have not done a complete deep review of all the resilience code; I have mostly looked at stream close during headless, which is an addition to the original resilience patch. Also, I have only been able to test the "legacy" functionality of the log service, so eventual pr

Re: [devel] [PATCH 1 of 1] smf: Support AMF configurations containing more than two OpenSAF 2N SUs [#79]

2016-03-02 Thread Ingvar Bergström
ACK /Ingvar -Original Message- From: Anders Widell Sent: 29 February 2016 15:10 To: Ingvar Bergström; Rafael Odzakow Cc: opensaf-devel@lists.sourceforge.net Subject: [PATCH 1 of 1] smf: Support AMF configurations containing more than two OpenSAF 2N SUs [#79] osaf/services/saf/smfs

Re: [devel] [PATCH 04 of 15] amfnd: Add support for cloud resilience at node director [#1620]

2016-03-02 Thread minh chau
For instance, an application can be configured so that a single component restart leads to node failover, and this escalation path should work during headless the same way as in non-headless. But if the escalation path needs comp/su failover, amfnd will *disable* the faulty comp/su and recovery/repair shall

Re: [devel] [PATCH 04 of 15] amfnd: Add support for cloud resilience at node director [#1620]

2016-03-02 Thread Anders Widell
Isolation should happen immediately, but it is the recovery and repair actions that can sometimes be postponed until the system controllers are back. regards, Anders Widell On 03/02/2016 12:18 PM, Mathivanan Naickan Palanivelu wrote: > Thanks for the explanation. My query was independent of the

Re: [devel] [PATCH 04 of 15] amfnd: Add support for cloud resilience at node director [#1620]

2016-03-02 Thread Mathivanan Naickan Palanivelu
Thanks for the explanation. My query was independent of the mail thread and was a generic one, to understand what the 'delayed failover' terminology meant during the fault scenarios! I probably wanted to state that a solution that does not isolate the faulty resource once a fault is detected would be aga

Re: [devel] [PATCH 04 of 15] amfnd: Add support for cloud resilience at node director [#1620]

2016-03-02 Thread Gary Lee
Hi Mathi I think Minh has previously said "delayed failover" isn't the best description of what patch 6 is doing. Minh has previously described it better as "adjust HA assignment"; moving transient states to states that realign() can work with. The transient states aren't necessarily caused

Re: [devel] [PATCH 04 of 15] amfnd: Add support for cloud resilience at node director [#1620]

2016-03-02 Thread Mathivanan Naickan Palanivelu
Hi All, What is 'delayed failover'? That sounds against the principles of 'software fault isolation'!? Thanks, Mathi. - minh.c...@dektech.com.au wrote: > Hi Praveen, > > Please see comments in line [Minh] > > Thanks, > Minh > > On 02/03/16 18:12, praveen malviya wrote: > > > > > > On 02