Summary: clm: avoid stale node down processing and unexpected track callback [#1120]
Review request for Trac Ticket(s): #1120
Peer Reviewer(s): HansN, RameshB
Pull request to: <<LIST THE PERSON WITH PUSH ACCESS HERE>>
Affected branch(es): opensaf-4.3.x and above
Development branch: <<IF ANY GIVE THE REPO URL>>

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

changeset fdc4fdc114d38917ffa25f47c102011e10a8cdd4
Author: Mathivanan N.P.<mathi.naic...@oracle.com>
Date:   Mon, 22 Sep 2014 20:07:23 -0400

    clm: avoid stale node down processing and unexpected track callback [#1120]

    There is a possibility that the checkpointing message for a NODE_DOWN
    reaches the STANDBY first, i.e. before MDS delivers the NODE_DOWN event
    to the standby. This can result in a stale node_down record being stored
    in the node_down list, which is a designated list for processing node
    downs that occur during a role change from standby to active.
    The patch introduces a variable that records whether the checkpoint event
    for node_down has arrived first, followed by a check during role change
    to ignore such stale events.
    Thanks to HansN for suggesting the possibility of this theory. This is an
    extremely rare scenario.


Complete diffstat:
------------------
 osaf/services/saf/clmsv/clms/clms_cb.h  |   6 ++++++
 osaf/services/saf/clmsv/clms/clms_evt.c |  48 +++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 47 insertions(+), 7 deletions(-)


Testing Commands:
-----------------
Trigger random reboots of payloads.
Follow these events with switchovers and failovers.
No unrelated track callbacks should be generated.


Testing, Expected Results:
--------------------------
Same as above. At this point in time, I have not been able to simulate the
scenario. My main interest is also in protecting against regression.
The fix is rather simple as well.


Conditions of Submission:
-------------------------
Ack from HansN or Ramesh.

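
For reviewers, the idea in the changeset comment (a marker that records that the checkpoint update for a node down arrived before the MDS event, plus a check at role change that skips such records) can be sketched roughly as below. This is only an illustration and not the code in the patch: the type `down_rec_t`, the field `ckpt_first`, and the functions `standby_ckpt_node_down()`, `standby_mds_node_down()` and `role_change_to_active()` are all hypothetical names, and the cleanup of records when both events have arrived is an assumption about one reasonable way to do it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node-down record kept by the standby; not the patch's type. */
typedef struct down_rec {
    uint32_t node_id;
    bool ckpt_first;            /* true if the checkpoint update beat the MDS event */
    struct down_rec *next;
} down_rec_t;

static down_rec_t *down_list;   /* node downs seen while standby */

static down_rec_t *find_rec(uint32_t node_id)
{
    for (down_rec_t *r = down_list; r != NULL; r = r->next)
        if (r->node_id == node_id)
            return r;
    return NULL;
}

static void add_rec(uint32_t node_id, bool ckpt_first)
{
    down_rec_t *r = calloc(1, sizeof(*r));
    assert(r != NULL);          /* sketch only: no allocation-failure handling */
    r->node_id = node_id;
    r->ckpt_first = ckpt_first;
    r->next = down_list;
    down_list = r;
}

static void remove_rec(uint32_t node_id)
{
    for (down_rec_t **pp = &down_list; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->node_id == node_id) {
            down_rec_t *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
    }
}

/* Standby: checkpoint update saying the active has handled this node down. */
void standby_ckpt_node_down(uint32_t node_id)
{
    if (find_rec(node_id) != NULL)
        remove_rec(node_id);        /* usual order: the MDS event came first */
    else
        add_rec(node_id, true);     /* checkpoint came first: mark the record */
}

/* Standby: MDS reports the node down. */
void standby_mds_node_down(uint32_t node_id)
{
    down_rec_t *r = find_rec(node_id);
    if (r != NULL && r->ckpt_first)
        remove_rec(node_id);        /* active already handled it: drop the record */
    else if (r == NULL)
        add_rec(node_id, false);    /* wait for the checkpoint update */
}

/* Role change to active: handle only node downs that are still pending.
 * Returns the number of node downs that would be processed. */
int role_change_to_active(void)
{
    int processed = 0;
    while (down_list != NULL) {
        down_rec_t *r = down_list;
        down_list = r->next;
        if (!r->ckpt_first)
            processed++;            /* real code would run node-down handling here */
        free(r);                    /* marked (stale) records are simply discarded */
    }
    return processed;
}
```

In this sketch, a record created by the checkpoint update alone is never processed at role change, so it cannot trigger an unrelated track callback; a record created by the MDS event alone is still processed as before.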

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer has a badly configured date and time, confusing the
    threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
    do not contain the patch that updates the Doxygen manual.

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel