On 03/01/2017 05:28 PM, Andrew Beekhof wrote: > On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg > <lars.ellenb...@linbit.com> wrote: >> When I recently tried to make use of the DEGRADED monitoring results, >> I found out that it does still not work. >> >> Because LRMD choses to filter them in ocf2uniform_rc(), >> and maps them to PCMK_OCF_UNKNOWN_ERROR. >> >> See patch suggestion below. >> >> It also filters away the other "special" rc values. >> Do we really not want to see them in crmd/pengine? > > I would think we do. > >> Why does LRMD think it needs to outsmart the pengine? > > Because the person that implemented the feature incorrectly assumed > the rc would be passed back unmolested. > >> >> Note: I did build it, but did not use this yet, >> so I have no idea if the rest of the implementation of the DEGRADED >> stuff works as intended or if there are other things missing as well. > > failcount might be the other place that needs some massaging. > specifically, not incrementing it when a degraded rc comes through
I think that's already taken care of. >> Thougts?\ > > looks good to me > >> >> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c >> index 724edb7..39a7dd1 100644 >> --- a/lrmd/lrmd.c >> +++ b/lrmd/lrmd.c >> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char >> *stdout_data) >> static int >> ocf2uniform_rc(int rc) >> { >> - if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) { >> - return PCMK_OCF_UNKNOWN_ERROR; Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to be the high end. Lars, do you want to test that? >> + switch (rc) { >> + default: >> + return PCMK_OCF_UNKNOWN_ERROR; >> + >> + case PCMK_OCF_OK: >> + case PCMK_OCF_UNKNOWN_ERROR: >> + case PCMK_OCF_INVALID_PARAM: >> + case PCMK_OCF_UNIMPLEMENT_FEATURE: >> + case PCMK_OCF_INSUFFICIENT_PRIV: >> + case PCMK_OCF_NOT_INSTALLED: >> + case PCMK_OCF_NOT_CONFIGURED: >> + case PCMK_OCF_NOT_RUNNING: >> + case PCMK_OCF_RUNNING_MASTER: >> + case PCMK_OCF_FAILED_MASTER: >> + >> + case PCMK_OCF_DEGRADED: >> + case PCMK_OCF_DEGRADED_MASTER: >> + return rc; >> + >> +#if 0 >> + /* What about these?? */ > > yes, these should get passed back as-is too > >> + /* 150-199 reserved for application use */ >> + PCMK_OCF_CONNECTION_DIED = 189, /* Operation failure implied by >> disconnection of the LRM API to a local or remote node */ >> + >> + PCMK_OCF_EXEC_ERROR = 192, /* Generic problem invoking the agent */ >> + PCMK_OCF_UNKNOWN = 193, /* State of the service is unknown - used >> for recording in-flight operations */ >> + PCMK_OCF_SIGNAL = 194, >> + PCMK_OCF_NOT_SUPPORTED = 195, >> + PCMK_OCF_PENDING = 196, >> + PCMK_OCF_CANCELLED = 197, >> + PCMK_OCF_TIMEOUT = 198, >> + PCMK_OCF_OTHER_ERROR = 199, /* Keep the same codes as PCMK_LSB */ >> +#endif >> } >> - >> - return rc; >> } >> >> static int _______________________________________________ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org