Message queues stop working correctly after queue file is removed from /tmp.
Message queue API uses "ftok" which relies on the file being permanent. The
behaviour is undefined if the file is removed. Many systems clean out /tmp
periodically, so this can break if the message queue is long lived.
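The dependency described above can be sketched in C. This is only an illustration, not the patch itself: the helper names and the /tmp path template are hypothetical, and the only real calls used are ftok(), mkstemp(), close() and unlink(). It shows that a key can be derived while the backing file exists and can no longer be derived once the file has been removed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: derive a System V IPC key from a backing file.
 * ftok() needs the file to exist and be accessible at the time of the call. */
static key_t queue_key_for(const char *path) {
  assert(path != NULL);
  return ftok(path, 'Q');
}

/* Returns 1 if a key could be derived while the file existed but not after
 * it was removed, 0 otherwise, and -1 if the temporary file could not be
 * created. The path template is an assumption made for this sketch. */
static int key_lost_after_unlink(void) {
  char path[] = "/tmp/ftok_demo_XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) return -1;
  close(fd);

  key_t before = queue_key_for(path); /* file exists: key is derived */
  unlink(path);                       /* simulate a /tmp cleaner */
  key_t after = queue_key_for(path);  /* file is gone: ftok() fails */

  return before != (key_t)-1 && after == (key_t)-1;
}
```

A process that calls key_lost_after_unlink() would typically see 1, which is the situation a long-lived queue runs into once /tmp has been cleaned.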
C
5f9
Author: Alex Jones
Date: Thu, 13 Feb 2020 08:39:46 -0500
base: fix creation of msg queues [#3107]
Message queues stop working correctly after queue file is removed from /tmp.
Message queue API uses "ftok" which relies on the file being permanent. The
behaviour is undefined if
revision d21dd0c020e33fd8932481976571d3ed22580ef5
Author: Alex Jones
Date: Fri, 7 Feb 2020 13:52:12 -0500
amfd: fix calculating standby rank for SIrankedSU with non-unique rank [#3149]
Standby rank which is passed to CSI set and protection group callbacks may not
be accurate.
If SIrankedSUs exis
Standby rank which is passed to CSI set and protection group callbacks may not
be accurate.
If SIrankedSUs exist with non-unique ranks, AVD_SI::get_sisu_rank() is not
traversing all the SUs at that rank to determine the standby rank.
AVD_SI::get_sisu_rank() needs to traverse all the SUs at the pa
i_value))[len] = '\0';
=> This calls strncpy with "len + 1" and then overwrites the last byte with '\0'.
I suggest keeping strncpy with "len", as in the original code, to avoid
redundant changes.
Best Regards,
ThuanTr
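The review comment above can be illustrated with a small C sketch. The helper name copy_terminated is hypothetical and is not taken from the patch; it only shows the pattern of copying exactly "len" bytes and then writing the terminator once, assuming the destination has room for len + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most len bytes of src and always terminate.
 * The caller must provide a destination of at least len + 1 bytes. */
static void copy_terminated(char *dst, const char *src, size_t len) {
  assert(dst != NULL && src != NULL);
  strncpy(dst, src, len); /* copies len bytes, pads with '\0' if src is shorter */
  dst[len] = '\0';        /* single explicit terminator, no len + 1 copy needed */
}
```

With an 8-byte buffer, copy_terminated(buf, "opensaf", 4) leaves the string "open" in buf.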
From: Alex Jones [1]
Sent: Monday, February 3, 2020 10:
more fixes
---
src/ntf/apitest/test_ntf_imcn.cc | 53 +++-
src/plm/plmcd/plmc_read_config.c | 2 +-
2 files changed, 40 insertions(+), 15 deletions(-)
diff --git a/src/ntf/apitest/test_ntf_imcn.cc b/src/ntf/apitest/test_ntf_imcn.cc
index b1a1e87b4..51b9076c6 100644
--
more issues
---
src/imm/immloadd/imm_pbe_load.cc | 7 ++-
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/src/imm/immloadd/imm_pbe_load.cc b/src/imm/immloadd/imm_pbe_load.cc
index 72b926383..5f5aefcec 100644
--- a/src/imm/immloadd/imm_pbe_load.cc
+++ b/src/imm/immloadd/imm_pbe_lo
Mostly strncpy and strncat problems.
---
src/base/daemon.c          | 1 +
src/ckpt/ckptd/cpd_imm.c   | 4 ++--
src/ckpt/ckptnd/cpnd_res.c | 2 +-
src/clm/clmd/clms_imm.cc   | 2 +-
src/dtm/dtmnd/dtm_intra_svc.cc
Rework fixes in NTF and SMF.
---
src/ntf/apitest/test_ntf_imcn.cc | 2 +-
src/smf/smfd/SmfUtils.cc | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/ntf/apitest/test_ntf_imcn.cc b/src/ntf/apitest/test_ntf_imcn.cc
index 51b9076c6..04f155074 100644
--- a/src/ntf/apit
ERIES HERE ***
revision 1c9c9c9aa23f95939597b0e29055c94c24e2815a
Author: Alex Jones
Date: Mon, 3 Feb 2020 10:32:17 -0500
build: fix compile errors with gcc 9.x [#3134]
Rework fixes in NTF and SMF.
revision 560b3243c3bcd821ca67839de8a4ee2825422966
Author: Alex Jones
Date: Mon, 3 Feb 2020 10:32:17 -0
More compiler fixes
---
src/imm/common/immpbe_dump.cc    | 2 +-
src/plm/plmcd/plmc_read_config.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/imm/common/immpbe_dump.cc b/src/imm/common/immpbe_dump.cc
index 3bde78a3f..175bd0484 100644
--- a/src/imm/common/immpbe_dump.
EXPLAIN/COMMENT THE PATCH SERIES HERE ***
revision fd81f84a655def349896e175c4615023f1f99151
Author: Alex Jones
Date: Thu, 30 Jan 2020 10:58:28 -0500
amfnd: don't quiesce comp which is in TERMINATION_FAILED state [#3147]
When SU goes into TERMINATION_FAILED because one of it
When SU goes into TERMINATION_FAILED because one of its components went to
TERMINATION_FAILED, amfnd will still send QUIESCED to those components,
even though they are already terminating. This can cause the SG to go into
unstable state, and get stuck.
IsCompQualifiedAssignment does not check for
revision 84ddb28a1b5fd0b9b24795196c523b5b050effbe
Author: Alex Jones
Date: Mon, 17 Sep 2018 15:42:04 -0400
uml: add support for plm to run under uml [#2922]
Add support for plm to run under uml.
Added Files:
src/plm/config/openhpi.conf
Complete diffstat:
--
src/plm/config/op
Add support for plm to run under uml.
---
src/plm/config/openhpi.conf                        | 18
tools/cluster_sim_uml/archive/scripts/40opensaf.rc | 30 +++
tools/cluster_sim_uml/build_uml                    | 95 --
3 files changed, 138 insertions(+), 5 deletions(-
Ack. I will push it.
Alex
On 09/11/2018 08:55 AM, Meenakshi TK wrote:
__
NOTICE: This email was received from an EXTERNAL sender
__
Summary: p
not be anything other than"
+ "START/VALIDATE. change_step: %d",
trk_info->change_step);
One typo above is datebase which should be database.
Thanks,
Meenakshi
High Availability Solutions Pvt. Ltd.
[2]www.hasolutions.in
----- Original Me
services n
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision c0e8a1d9b6e1a8e53f8f0ffbff9b86c40ee0d6b6
Author: Alex J
Jan 22 11:09:03 localhost osafplmd[3988]: Invocation id mentioned in the resp,
is not found in the grp->inocation_list. inv_id: 9
If multiple entities are part of the same entity group, and START or VALIDATE
tracking is requested, if an admin operation is done on these entities, once
one response
saPlmReadinessTrackResponse sometimes returns SA_AIS_OK, when invalid
parameters are passed.
SaPlmReadinessTrackResponseT parameter is not checked for range. Also,
the msg is sent asynchronously from the agent to plmd, so that errors
from plmd cannot be passed back to the agent.
Check the SaPlmRe
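A range check of the kind described above might look like the following C sketch. The enum names and numeric values here are illustrative stand-ins defined only for this example; they are not copied from the SAF headers or from the patch, and the real check operates on SaPlmReadinessTrackResponseT.

```c
#include <assert.h>

/* Stand-in for the response type; names and values are assumptions. */
typedef enum {
  DEMO_RESPONSE_OK = 1,
  DEMO_RESPONSE_REJECTED = 2,
  DEMO_RESPONSE_ERROR = 3
} demo_track_response_t;

/* Stand-in for the API return codes; names and values are assumptions. */
typedef enum { DEMO_OK = 1, DEMO_ERR_INVALID_PARAM = 7 } demo_status_t;

/* Reject out-of-range responses in the agent, before anything is sent
 * asynchronously, so the caller gets the error synchronously. */
static demo_status_t demo_check_response(int response) {
  if (response < DEMO_RESPONSE_OK || response > DEMO_RESPONSE_ERROR)
    return DEMO_ERR_INVALID_PARAM;
  return DEMO_OK;
}

/* Usage example: exercise the boundary values. */
static void demo_usage(void) {
  assert(demo_check_response(DEMO_RESPONSE_OK) == DEMO_OK);
  assert(demo_check_response(0) == DEMO_ERR_INVALID_PARAM);
}
```

Validating on the agent side matters here because, as the message notes, errors detected later in plmd cannot be passed back over an asynchronous send.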
services n
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision c87593a8180c59b4c3e7f0bd0b8789dac72b0415
Author: Alex J
ility Solutions Pvt. Ltd. ([2]www.hasolutions.in)
- OpenSAF Support and Services
- Original Message -
Subject: Re: [PATCH 1/1] amf: add support for container/contained [#70]
From: "Alex Jones" [3]
Date: 8/29/18 9:29 pm
To: [4]nagen...@hasolu
Ack. I will push it.
Alex
On 09/06/2018 04:40 AM, Meenakshi TK wrote:
---
src
for plm: correct first
argument of API saPlmEntityGroupAdd() in apitest [#1983]
From: "Alex Jones" [3]
Date: 8/27/18 10:42 pm
To: "Meenakshi TK" [4],
[5]nagen...@hasolutions.in
Cc: [6]opensaf-devel@lists.sourceforge.net
Hi,
This test is curr
Hi Mohan,
I am not able to reproduce the problem as described in the ticket.
Can you post your test code?
Alex
On 09/03/2018 03:32 AM, [1]mo...@hasolutions.in wrote:
__
NOTICE: This email was received fro
__
Hi Alex
No, I just ran kill 10 times to escalate restart to failover.
Do you have a really small probation time in your demo config?
Gary
On 28/8/18 4:09 am, Alex Jones wrote:
G'day Gary,
I can't reproduce this. Do you have a script or
,safApp=Container
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
Also, have you tried killing the amf_container_demo binary?
Thanks
Gary
On 14/08/18 05:00, Alex Jones wrote:
Hi Gary,
I just resubmitted a new p
OpenSAF services n
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision e18dabd0a8385ff61ba1ab0540eba4ee58b5cc4e
Author:
plmd crashes when saPlmReadinessTrack is called with entities pointer set,
but smaller than what plmd would return.
In this case plmd is returning ERR_NO_SPACE, which is correct, but it is
setting numberOfEntities without setting the entities pointer. This causes
the edu routines to crash.
It is
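The failure mode above (a count set without a matching pointer) can be sketched in C. This is one possible way to keep the two fields consistent, written under the assumption that the needed count can be reported separately; it is not necessarily what the patch does, and every type and function name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reply structure: a count plus the array it describes. */
typedef struct {
  size_t count;
  const int *items;
} demo_entity_list_t;

enum { FILL_OK = 0, FILL_ERR_NO_SPACE = 1 };

/* Fill the reply for a caller that offered `capacity` slots. When there is
 * not enough room, report the required size through *needed but leave the
 * reply itself empty (count 0, items NULL), so count and pointer agree. */
static int demo_fill_reply(demo_entity_list_t *reply, size_t *needed,
                           const int *src, size_t src_count,
                           size_t capacity) {
  assert(reply != NULL && needed != NULL);
  *needed = src_count;
  if (src_count > capacity) {
    reply->count = 0;
    reply->items = NULL;
    return FILL_ERR_NO_SPACE;
  }
  reply->count = src_count;
  reply->items = src;
  return FILL_OK;
}

/* Stand-in for an encode routine: it walks `count` items, so it is only
 * safe when the count and the pointer are consistent. */
static int demo_encode_sum(const demo_entity_list_t *reply) {
  int sum = 0;
  for (size_t i = 0; i < reply->count; i++) sum += reply->items[i];
  return sum;
}
```

With this shape an encoder can run on the error path as well, because it never sees a non-zero count paired with a NULL array.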
saAmfSUReadinessState=IN-SERVICE(2)
amf-state si:
safSi=SC-2N,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=Contained_2N_1,safApp=Contained_2N
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignme
Hi,
This test is currently not enabled in
test_saPlmEntityGroupCreate.c. Can you please enable it as part of this
ticket?
Alex
On 08/20/2018 07:37 AM, Meenakshi TK wrote:
__
NOTICE: This email was rece
Hi Mohan,
Ack from me.
Alex
On 08/21/2018 04:16 AM, mohan kanakam wrote:
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
Also, have you tried killing the amf_container_demo binary?
Thanks
Gary
On 14/08/18 05:00, Alex Jones wrote:
Hi Gary,
I just resubmitted a new patch which breaks out the different
components, and add
Hi Gary,
I just resubmitted a new patch which breaks out the different
components, and addresses the other comments here. But, #2 (rejecting
all but NWay-active for container) should already be in there. Is there
a specific test you ran that didn't work?
Alex
On 08/13/20
Add support for container/contained amf common.
---
src/amf/common/amf_amfparam.h | 22 ++
src/amf/common/amf_d2nmsg.h | 11 +++
src/amf/common/amf_defs.h | 2 ++
src/amf/common/amf_util.h | 3 ++-
src/amf/common/d2nedu.c | 22 +-
s
Add support for container/contained for amf agent.
---
src/amf/agent/amf_agent.cc | 73 +++---
src/amf/agent/ava_cb.h | 1 +
src/amf/agent/ava_hdl.cc | 31
src/amf/agent/ava_mds.cc | 34 -
src/amf/agent/ava_m
This ticket adds support for container/contained in amfd.
---
src/amf/amfd/comp.cc | 65 ++--
src/amf/amfd/comp.h | 4 +-
src/amf/amfd/comptype.cc | 6 +-
src/amf/amfd/csi.cc | 6 ++
src/amf/amfd/csi.h | 3 +
src/amf/amfd/ndproc.cc | 14 +
src/am
Add support for container/contained samples.
---
samples/amf/Makefile.am                          | 2 +-
samples/amf/container/AppConfig-contained-2N.xml | 327 +
samples/amf/container/AppConfig-container.xml    | 331 ++
samples/amf/container/Makefile.am                | 45 ++
revision 9c9f7e04c39fca9030025b0a8394eabf328a4c70
Author: Alex Jones
Date: Mon, 13 Aug 2018 14:48:14 -0400
amf: add support for container/contained [#70]
Add support for container/contained samples.
revision cf9d7565376059239c0902555c1c4811db6deff2
Author: Alex Jones
Date: Mon, 13 Aug 2018 14:48:14 -0400
This ticket adds support for container/contained.
---
src/amf/amfnd/amfnd.cc | 5 ++-
src/amf/amfnd/avnd_cb.h | 2 +
src/amf/amfnd/avnd_comp.h | 64 +
src/amf/amfnd/avnd_evt.h | 1 +
src/amf/amfnd/avnd_mds.h | 4 +-
src/amf/amfnd/avnd_proc.h | 2 +
src
ender
__
Hi Alex
I can reproduce the coredump by doing "immcfg -f AppConfig-2N.xml" (the
amf_demo sample). It looks better with the patch.
Thanks
Gary
From: Alex Jones [1]
Organization: Ribbon
Date: Saturday, 4 August 2018 at 12:59 am
To: Gary Lee
;d
On 3/8/18, 11:25 am, "Gary Lee" [1] wrote:
Hi Alex
I haven't had a chance to look at it, but I did run our regression
tests with the patch.
amfd is segfaulting regularly, with backtraces like the attachment.
Thanks
Gary
From: Alex Jones [2]
Organizatio
8 04:22 PM, Alex Jones wrote:
Summary: amf: add support for container/contained [#70]
Review request for Ticket(s): 70
Peer Reviewer(s): Nagu, Hans, Ravi, Gary
Pull request to:
Affected branch(es): develop
Development branch: ticket-70
Base revision: 7f6f6c0531a0f5e4f2b0dc1abf4bab6962a3d1a9
Personal
revision d33e50eeb51ccf8808c24a445637d6f1472c396e
Author: Alex Jones
Date: Tue, 31 Jul 2018 16:06:47 -0400
amf: add support for container/contained [#70]
This ticket adds support for container/contained for AMF.
Added Files:
samples/amf/container/amf_container_demo.c
samples/amf
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision 9298b3b02ea0d1df99c9549402e427b7cefa7e78
Author: Alex Jones
Date: Fri, 11 M
Update msgd and msgnd to use CLM B.04.01.
---
src/msg/Makefile.am       | 2 --
src/msg/common/mqsv_def.h | 5 +
src/msg/msgd/mqd_api.c    | 15 ---
src/msg/msgd/mqd_clm.c    | 17 +++--
src/msg/msgd/mqd_clm.h    | 10 --
src/msg/msgnd/mqnd_init.c | 18 +++
If multiple nodes which are hosting msg queues go down simultaneously (e.g.
multiple VMs on a host, and the host goes down), msgd can take a long time to
process the node downs. This blocks the main thread, so the healthcheck
doesn't get processed and msgd dies, which restarts the contro
n
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision 9bb598f8390aaf41c1e0dcd458ee0d82fae58999
Author: Alex Jones
Date: Fri, 1
When getting IMM info for a lock resource, SaLckResource, the information is
often not correct.
Both lckd and lcknd are not updating IMM correctly when SaLckResource
information changes at runtime.
Write test cases which make sure these attributes are being updated correctly.
And fix the issues.
revision 8fe4377c25259e1430717d3b67e2c4cc2fd3c66f
Author: Alex Jones
Date: Mon, 7 May 2018 10:04:42 -0400
lck: fix errors when displaying SaLckResource class [#2070]
When getting IMM info for a lock resource, SaLckResource, the information is
often not correct.
Both lckd and lcknd are not updating IMM corr
vicesy
OpenSAF services n
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision 4efaccbcde991cd3ff848e43af6c6d007912af
Child EEs (VMs) can fail to boot up when unlocking the parent EE.
The current code resets the VM when unlocking the parent EE. This is done in
plms_move_chld_ent_to_insvc(). Later in the unlock function, the child EEs
are reset again. libvirt does not like these resets being done in less than 1
se
Sometimes CLM will reboot a node which was locked with PLM admin command.
admin_op and stat_change are not being cleared in COMPLETED step in PLM
readiness callback.
Clear admin_op and stat_change.
---
src/clm/clmd/clms.h | 2 +-
src/clm/clmd/clms_plm.cc | 7 +++
src/clm/clmd/clms_u
y
OpenSAF services n
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision f566e34de691ace5bc7d2832bc1f06b481075db3
Au
the core dump at e.g. which address you receive the signal. Perhaps you
have found a "window"
where immnd is not monitored?
/Regards HansN
On 04/25/2018 03:23 PM, Alex Jones wrote:
Hi Hans,
I understand. But, what if it doesn't fail in the nid phase?
fmd does not pass the EE to opensaf_reboot when attempting to reset the peer.
The legacy code passed 0 to fm_mds_async_send. The new code passes
NCSMDS_SCOPE_NONE, but doesn't update how bcast_scope is used.
Change fm_mds_async_send to check bcast_scope. If it is not NCSMDS_SCOPE_NONE,
then use i
services y
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision 9ab40a006c71a27c140cea5a32ab71b33facdb25
Author: Alex Jones
D
revision 494407c7d28526ac0d616f9be8c2484981bbbeda
Author: Alex Jones
Date: Fri, 27 Apr 2018 14:37:12 -0400
lck: resurrect apitest [#2437]
Resurrect apitest
revision 106200a751299a2adf20574809845098e055b874
Author: Alex Jones
Date: Fri, 27 Apr 2018 14:29:53 -0400
lck: resurrect apit
Resurrect apitests
---
src/lck/apitest/test_ErrUnavailable.cc | 2 +-
src/lck/apitest/test_saLckLimitGet.cc | 2 +-
src/lck/apitest/test_saLckResourceClass.cc | 10 +++---
3 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/src/lck/apitest/test_ErrUnavailable.cc
b/src/lc
Resurrect apitest
---
src/lck/apitest/test_ErrUnavailable.cc | 2 +-
src/lck/apitest/test_saLckLimitGet.cc | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/lck/apitest/test_ErrUnavailable.cc
b/src/lck/apitest/test_ErrUnavailable.cc
index db1e0b72f..715efe47c 100644
---
Resurrect apitest
---
src/lck/Makefile.am | 5 +
src/lck/apitest/test_saLckLimitGet.cc | 7 +--
2 files changed, 2 insertions(+), 10 deletions(-)
diff --git a/src/lck/Makefile.am b/src/lck/Makefile.am
index 5b3102722..db3e043e1 100644
--- a/src/lck/Makefile.am
+++ b/src/
revision e04d343ab46a7409772001c61624eb39c2eb50aa
Author: Alex Jones
Date: Wed, 25 Apr 2018 10:27:13 -0400
msgd: handle abrupt restart of remote node [#2840]
Sometimes when a remote node restarts abruptly, queues which were created on
that node cannot be opened again when that node comes up.
There is a race
Sometimes when a remote node restarts abruptly, queues which were created on
that node cannot be opened again when that node comes up.
There is a race condition when the remote node goes down between msgd getting
the CLM and MDS events indicating node down, and immd removing the implemente
only happen if REBOOT_ON_FAIL_TIMEOUT is set, (i.e.
not 0).
I checked the latest version, the reboot works fine if e.g. immnd fails
in the nid phase and REBOOT_ON_FAIL_TIMEOUT is set.
/Thanks HansN
From: Alex Jones [[1]mailto:ajo...@rbbn.com]
Sent: 25 April 2018 15:05
To: Ha
___
Hi Alex,
please see comment below.
/Thanks HansN
On 04/23/2018 03:56 PM, Alex Jones wrote:
Hi Hans,
I just did some tests. Maybe there is a bug in nid, but when I
do not have "Restart=on-failure", the node does not reboot when I
run the command
,
please see below for some comments/questions.
/Regards HansN
On 04/18/2018 03:41 PM, Alex Jones wrote:
When using PLM, an AMF node mapped to a CLM node mapped to a PLM EE can get
stuck in locked state when rebooting, or going through a PLM EE lock/unlock.
When amfd receives a START
andle the reboot
request if Restart=on-failure is set?
/BR HansN
______
From: Alex Jones [1]
Sent: 19 April 2018 17:27:27
To: Hans Nordebäck; Anders Widell
Cc: [2]opensaf-devel@lists.sourceforge.net; Alex
OpenSAF services n
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision c67596599b7728ea45e2d449d5ba3c3103bf8452
Author: Alex J
Under certain circumstances opensafd fails to start (immnd or dtmd crashes,
etc).
Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]:
src/dtm/dtmnd/dtm_intra_svc.cc:1778: dtm_process_internode_service_up_msg:
Assertion '0' failed.
We can tell systemd to restart opensafd if it fails to start.
---
the patch from ticket 2834.
revision 9e09af922cf88a56ee4984abe46b01f363117e30
Author: Alex Jones
Date: Wed, 18 Apr 2018 09:08:41 -0400
amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]
When using PLM, an AMF node mapped to a CLM node mapped to a PLM EE can get
stuck
When using PLM, an AMF node mapped to a CLM node mapped to a PLM EE can get
stuck in locked state when rebooting, or going through a PLM EE lock/unlock.
When amfd receives a START step from CLM tracking it attempts to gracefully
shut down the AMF node using AMF admin operations lock/lock-in. When P
Abrupt restart or unlock-in of child EE does not always work.
virDomainReset() does not always work.
Use virDomainDestroy() and virDomainCreate() instead.
---
src/plm/plmd/plms_virt.cc | 16 ++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/src/plm/plmd/plms_virt.cc
OpenSAF services n
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision 56a0e35daf04083c5fb76270dbf0163b03500d58
Author:
a37
Author: Alex Jones
Date: Thu, 12 Apr 2018 10:53:19 -0400
clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients [#2834]
CLM tracking clients have no context for the tracking callback.
PLM rootCauseEntity is not passed by CLM to its own tracking clients.
When CLM tracking
CLM tracking clients have no context for the tracking callback.
PLM rootCauseEntity is not passed by CLM to its own tracking clients.
When CLM tracking is invoked because of PLM tracking, pass on the
rootCauseEntity.
---
src/clm/clmd/clms_evt.cc | 4 +--
src/clm/clmd/clms_imm.cc | 80
Ack.
Alex
On 04/03/2018 06:46 AM, srinivas wrote:
---
src/msg/apitest/test_Me
services n
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision ae59ca0e4d33b97d3fbc28d531452e391afe488a
Author: Alex J
If EE unlock fails, it is never retried when management is regained. The EE
just sits in LOCKED admin state.
If EE unlock fails, the code continues as if it did succeed, setting readiness
state to in-service, etc.
If EE unlock fails, just return ERR_DEPLOYMENT immediately, and don't set
anything
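The "return immediately" behaviour described above can be sketched in C as follows. The structure, the status codes and the callback are hypothetical stand-ins chosen for this illustration; the real code lives in plmd and returns ERR_DEPLOYMENT through its own types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical EE state: only the two fields needed for the sketch. */
typedef struct {
  int admin_unlocked;
  int in_service;
} demo_ee_t;

enum { EE_OK = 0, EE_ERR_DEPLOYMENT = 1 };

/* Hypothetical callback performing the actual unlock; 0 means success. */
typedef int (*demo_unlock_fn)(void);

/* On failure, return at once and leave every state field untouched, so the
 * EE is not reported as in-service when the unlock did not happen. */
static int demo_unlock_ee(demo_ee_t *ee, demo_unlock_fn do_unlock) {
  assert(ee != NULL && do_unlock != NULL);
  if (do_unlock() != 0) return EE_ERR_DEPLOYMENT;
  ee->admin_unlocked = 1;
  ee->in_service = 1;
  return EE_OK;
}

/* Two trivial callbacks used to exercise both paths. */
static int demo_unlock_fails(void) { return -1; }
static int demo_unlock_succeeds(void) { return 0; }
```

Because nothing is modified on the failure path, a later retry (for example when management is regained) starts from the same state as the first attempt.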
Hi Srinivas,
Two comments:
1. Put the new include file before the above "msg/..." files, so it is
in alphabetical order
2. change the test, so there is only one aisrc_validate call in it.
Otherwise, 2 PASSED show up for the test.
Alex
On 03/26/2018 07:23 AM,
not handle
admin-operation-pending for child EEs while the parent EE was not available.
revision 28094fa2491d458478491d6343f0be4fb5ecdbd7
Author: Alex Jones
Date: Thu, 22 Mar 2018 20:46:14 -0400
plmd: connect to hypervisor after middleware switchover [#2817]
Any PLM admin operation whic
Any PLM admin operation which requires hypervisor assistance (e.g. unlock-in,
abrupt restart) will fail after middleware switchover.
When plmcds are reconnecting to the new active plmd, the plmd does not attempt
to connect to the hypervisor if the EE is a virtual machine monitor.
Connect to the h
After a middleware switchover, EE admin commands that need hypervisor support
do not work (e.g. unlock-in, abrupt restart).
After the switchover, the plmcds on the different nodes reconnect to the new
plmd. But, the new plmd does not make any contact with the hypervisors. So, the
commands fail.
W
revision 6042af1f311dc6b6ec270bd0aaa8e570e6477842
Author: Alex Jones
Date: Wed, 21 Mar 2018 11:49:55 -0400
plmd: connect to hypervisor after middleware switchover [#2817]
After a middleware switchover, EE admin commands that need hypervisor support
do not work (e.g. unlock-in, abrupt restart).
After the switc
services n
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision 36e5a1d4fb123862cc442301140f70e8ce10a7c4
Author: Alex Jones
Date:
During q transfer when new node is opening the q, msgnd fails to create the
runtime IMM object for the queue, and the open fails.
When the transfer is done, the old side and owner of the runtime object doesn't
delete the IMM object until after the q transfer response is sent. This is a
race condit
Dynamic tracing does not work with plmd.
plmd overrides the USR2 signal with its own dump routine.
Remove the signal handler code for USR2 in plmd.
---
src/plm/plmd/plms_main.c | 20
1 file changed, 20 deletions(-)
diff --git a/src/plm/plmd/plms_main.c b/src/plm/plmd/plms_ma
libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision c75a7990a32d4d0d05bad0ba69e920dd42d780e8
Author: Alex Jones
Date: Wed, 7 Mar 2018 15:3
vicesy
OpenSAF services n
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision 916b838764c03891c5e35b18626d89aadbb5ca
Opening of an existing msg q using saMsgQueueOpen (for q failover) may take a
long time.
When cold sync is done, sometimes two MDS cold sync requests are sent by the
standby, so the standby can receive 2 cold syncs. The standby code to process
the cold sync response blindly adds the tracking entri
signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot__
revision 14dfc8f3e86559585b072a9c18025cb562caaeff
Author: Alex Jones
Date: Tue, 2 Jan 2018 10:45:31 -0500
plm: handle race condition for EE instantiation [#2514]
Child EE which is a controller can get shut down because its parent EE (host)
has not yet connected to PLM.
If the controller is a VM, and the
Child EE which is a controller can get shut down because its parent EE (host)
has not yet connected to PLM.
If the controller is a VM, and the host is a payload, there is a race
condition when instantiating the EEs. If the host doesn't connect to PLM
first, then when the controller EE (child of ho
vicesy
OpenSAF services n
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision 1a3ad81467d91b4f98b76657821e256645a3e5
If an EE goes down during a controller switchover, the TERMINATED message
sent by plmc to plmd may not be received because of the switchover.
In this case the EE will be stuck in terminating presence state.
If any parent of the EE is in OOS, then we can definitely set the presence
state to UNINST
In virtual environments nodes can reboot very quickly (less than 1 minute). If
the reboot is abrupt, plmd may not be aware that the EE went down until after
it has already come back up because plmd relies on the TCP connection to plmcd
on the node. In this case, plmd will set the readiness state to
revision caa9f9f93e507748ec6fb43c97d83967f4c6045b
Author: Alex Jones
Date: Thu, 7 Dec 2017 11:31:46 -0500
plm: handle plmc clients which abruptly terminated [#2529]
In virtual environments nodes can reboot very quickly (less than 1 minute). If
the reboot is abrupt, plmd may not be aware that the EE went