[devel] [PATCH 0 of 1] Review Request for clm: avoid stale node down processing and unexpected track callback [#1120]

mathi . naickan Mon, 22 Sep 2014 07:44:10 -0700

Summary: clm: avoid stale node down processing and unexpected track callback [#1120]
Review request for Trac Ticket(s): #1120
Peer Reviewer(s): HansN, RameshB
Pull request to: <<LIST THE PERSON WITH PUSH ACCESS HERE>>
Affected branch(es): opensaf-4.3.x and above
Development branch: <<IF ANY GIVE THE REPO URL>>


--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

changeset fdc4fdc114d38917ffa25f47c102011e10a8cdd4
Author: Mathivanan N.P.<mathi.naic...@oracle.com>
Date:   Mon, 22 Sep 2014 20:07:23 -0400

        clm: avoid stale node down processing and unexpected track callback [#1120]
        There is a possibility that the checkpoint message for a NODE_DOWN
        reaches the STANDBY first, i.e. before MDS delivers the NODE_DOWN event
        to the standby. This can result in a stale node_down record being stored
        in the node_down list, which is the list designated for processing node
        downs that occur during a role change from standby to active. The patch
        introduces a variable that records whether the checkpoint event for
        node_down has arrived first, followed by a check during role change to
        ignore such stale events.


Thanks to HansN for suggesting the possibility of this theory. This is an
extremely rare scenario.
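
For reviewers who want the idea in code form, the ordering check described in
the changeset comment can be sketched roughly as below. This is only an
illustration under stated assumptions, not the code in the patch: the types
and functions (standby_cb_t, down_rec_t, on_ckpt_node_down, on_mds_node_down,
on_role_change_to_active) and the fixed-size table are hypothetical names made
up for this sketch and do not correspond to identifiers in clms_cb.h or
clms_evt.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16 /* hypothetical capacity, for this sketch only */

/* Which notification for a node down reached the standby first. */
typedef enum { CKPT_FIRST, MDS_FIRST } down_src_t;

/* Hypothetical pending node-down record kept on the standby. */
typedef struct {
	bool in_use;
	uint32_t node_id;
	down_src_t first_seen;
} down_rec_t;

typedef struct {
	down_rec_t recs[MAX_PENDING];
} standby_cb_t;

static down_rec_t *find_rec(standby_cb_t *cb, uint32_t node_id)
{
	for (size_t i = 0; i < MAX_PENDING; i++)
		if (cb->recs[i].in_use && cb->recs[i].node_id == node_id)
			return &cb->recs[i];
	return NULL;
}

static void add_rec(standby_cb_t *cb, uint32_t node_id, down_src_t src)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!cb->recs[i].in_use) {
			cb->recs[i].in_use = true;
			cb->recs[i].node_id = node_id;
			cb->recs[i].first_seen = src;
			return;
		}
	}
	assert(0 && "pending table full");
}

/* The second notification for a node completes the pair and clears the
 * record; the first one is remembered together with its source. */
static void note_down(standby_cb_t *cb, uint32_t node_id, down_src_t src)
{
	down_rec_t *rec = find_rec(cb, node_id);

	if (rec == NULL)
		add_rec(cb, node_id, src);
	else if (rec->first_seen != src)
		rec->in_use = false;
}

/* Checkpoint update from the active: it has already handled this down. */
void on_ckpt_node_down(standby_cb_t *cb, uint32_t node_id)
{
	note_down(cb, node_id, CKPT_FIRST);
}

/* Transport-level node down event seen directly by the standby. */
void on_mds_node_down(standby_cb_t *cb, uint32_t node_id)
{
	note_down(cb, node_id, MDS_FIRST);
}

/* On standby -> active: only downs that the old active never checkpointed
 * still need processing; checkpoint-first records are stale and ignored.
 * Returns the number of node downs that would be processed. */
int on_role_change_to_active(standby_cb_t *cb)
{
	int to_process = 0;

	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!cb->recs[i].in_use)
			continue;
		if (cb->recs[i].first_seen == MDS_FIRST)
			to_process++;
		cb->recs[i].in_use = false;
	}
	return to_process;
}
```

In this sketch a record created by a checkpoint-first arrival is never counted
at role change, so it cannot trigger node down processing or a track callback
for a node that the previous active had already handled.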

Complete diffstat:
------------------
 osaf/services/saf/clmsv/clms/clms_cb.h  |   6 ++++++
 osaf/services/saf/clmsv/clms/clms_evt.c |  48 +++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 47 insertions(+), 7 deletions(-)


Testing Commands:
-----------------
Trigger random reboots of payloads. Follow these events with switchovers &
failovers. There should not be any unrelated track callbacks being generated.

Testing, Expected Results:
--------------------------
Same as above.
At this point in time, I have not been able to simulate the scenario.
My bigger interest is also in guarding against regressions. The fix is rather
simple as well.


Conditions of Submission:
-------------------------
Ack from HansN or Ramesh.

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer has a badly configured date and time, confusing the
    threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
    does not contain the patch that updates the Doxygen manual.


