What is TCP broadcast? I have never heard of that...
My guess is that DTM_MCAST_ADDR allows you to specify the UDP multicast address
to be used for discovery. In Adrian's case there is no broadcast address on the
eth0 interface in each container, so he has to specify it instead if using the
one f
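For illustration, a minimal discovery section of the DTM config might look like
this; a sketch only, assuming the stock /etc/opensaf/dtmd.conf location, with
the example addresses from this thread:

    # Address this node binds to for DTM over TCP
    DTM_NODE_IP=172.17.0.109
    # UDP multicast group used for node discovery;
    # leave empty to fall back to the interface broadcast address
    DTM_MCAST_ADDR=224.0.0.6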
This backtrace indicates it is the same as
https://sourceforge.net/p/opensaf/tickets/1157/, a duplicate of
https://sourceforge.net/p/opensaf/tickets/607/
---
** [tickets:#1072] Sync stop after few payload nodes joining the cluster (TCP)**
**Status:** unassigned
**Milestone:** 4.4.2
**Created:**
2) No; I have DTM_MCAST_ADDR=224.0.0.6. Leaving it empty does not work for me.
Is there any difference; why do I need to change to broadcast mode?
3) With 58860 I can only bring up 5-6 payloads.
Then TRY_AGAIN happens every time.
4) Yes, reducing to a lower value lets me get past 5-6 nodes, up to ~50
wit
Based on your data, I understand the following is your current status:
1) You are running opensaf in docker containers, and the containers have
addresses 172.17.0.1 - 172.17.0.150
2) You configured TCP broadcast (that means DTM_MCAST_ADDR= is empty);
you only updated the `DTM_NODE_IP`
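So the broadcast-mode config would look roughly like this (a sketch using only
the keys quoted in this thread; the node address is a placeholder from the
stated range):

    DTM_NODE_IP=172.17.0.101   # per-container address in 172.17.0.1 - 172.17.0.150
    DTM_MCAST_ADDR=            # empty: discovery falls back to UDP broadcast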
Hi
I am starting to think it is a bug in the batch sync logic, or in combination with
MDS fragmentation.
Changing the default value 55388 to something lower, like 4096, doesn't trigger
the bug.
immcfg -a opensafImmSyncBatchSize=4096 opensafImm=opensafImm,safApp=safImmService
With this configura
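For anyone following along, the current value can be read back with the
standard IMM command-line tools; something like the following should work (the
-a attribute-selection form of immlist is an assumption on my side):

    immlist -a opensafImmSyncBatchSize opensafImm=opensafImm,safApp=safImmService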
I am running opensaf in docker containers:
* one cluster.
* I do not have any iptables rules.
* I can reach the internet from my containers.
* I can multicast to other nodes on my network.
All containers are connected to the docker0 bridge: inet addr:172.17.42.1
bridge name brid
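For what it's worth, whether a container's eth0 actually has a broadcast
address (the point raised earlier in this thread) can be checked from inside
the container with plain iproute2/net-tools, nothing OpenSAF-specific:

    ip addr show eth0            # look for a "brd" value on the inet line
    ifconfig eth0 | grep Bcast   # legacy net-tools equivalent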
It seems the very fundamental TCP cluster bring-up with broadcast is not working
for you, so let us start from a basic configuration.
1) Please make sure all of your nodes are in the same subnet,
say like:
SC-1 : 192.168.56.101 slot-1
SC-2 : 192.168.56.102 slot-2
PL-3 : 192.168.56.103 slot-3
P
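A quick sanity check that the nodes really share a subnet could be (plain
shell, addresses from the example above):

    ip -4 addr show eth0          # prefix length must match on every node
    ping -b -c 1 192.168.56.255   # probe the subnet broadcast address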
It does not work for me with an empty DTM_MCAST_ADDR.
The payload node just loops with:
Oct 3 19:08:42.162880 osafimmnd [3275:immnd_proc.c:0393] TR First
immnd_introduceMe, sending pbeEnabled:3 WITH params
Oct 3 19:08:42.163181 osafimmnd [3275:immnd_proc.c:0413] TR Possibly
extended intro
>Have you tried to start 7 nodes in a container setup, joining them one by one?
I am assuming that at a given point in time one node should be rebooted in the
cluster.
If yes, I did test rebooting some payloads and it works for me with TIPC
broadcast; if no, please provide the sequence of reboots that yo
On 10/3/2014 12:11 PM, Adrian Szwej wrote:
> Yes; I meant #1036. I got instructions to test this patch to see if it helps.
This bug fix is exclusively for TIPC, so it is not effective for TCP in any
manner.
> DTMD config;
> DTM_NODE_IP=172.17.0.109
> DTM_MCAST_ADDR=224.0.0.6
It is news to me that y
Hi Mahesh
Yes; I meant #1036. I got instructions to test this patch to see if it helps.
BR
**DTMD config**;
DTM_NODE_IP=172.17.0.109
DTM_MCAST_ADDR=224.0.0.6
**imm.xml**
Default generated, 7-70 nodes. Does not matter. It is reproducible with
around 6-8 nodes. immnd tracing seems to trigg
Mahesh; I have managed to bring up 30 containers on one VM for quite some time
with
export IMMSV_NUM_NODES=30
export IMMSV_MAX_WAIT=50
So initial loading seems to work differently than syncing a node on join.
The biggest concern I have is the fault analysis here.
I have troubleshot the logs, the MDS log,
>On 10/2/2014 12:09 AM, Adrian Szwej wrote:
> I have now applied the patch for #1032 on top of 4.6 changeset 5969:ead18326c13b.
You mean [#1036]?
> [devel] [PATCH 1 of 1] mds: use correct buff-length to distinguish
> mcast or multi-unicast [#1036]
> This patch does not resolve the problem.
This pa
Hi Adrian,
I have re-opened the ticket and changed the component to MDS.
The MDS responsible may be able to diagnose the cause just based on the
coredump.
I have not checked the MDS backlog for any older ticket
documenting similar symptoms.
https://sourceforge.net/p/opensaf/tickets/search/?q
#0 0x7fe7eba49bb9 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x7fe7eba4cfc8 in __GI_abort () at abort.c:89
#2 0x7fe7eba42a76 in __assert_fail_base (fmt=0x7fe7ebb94370 "%s%s%s:%u:
%s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7fe7
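For reference, a backtrace like the one above can be pulled out of the coredump
with plain gdb; the binary path below is an assumption, adjust to wherever
osafimmd is installed on your system:

    gdb /usr/lib64/opensaf/osafimmd core
    (gdb) bt full                 # full backtrace with local variables
    (gdb) thread apply all bt     # backtraces for every thread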
It is the IMMD that is crashing, causing the messages to become pending.
I am attaching the coredump and the immnd and immd trace files from SC-1, where
7 nodes join one by one. When PL-8 joins, the IMMD coredumps.
The code used was changeset 5828:df7bef2079b1 + a change of
IMMSV_DEFAULT_FEVS_MAX_PENDING to
Instead of blindly changing other configuration parameters, please first try to
find out what the PROBLEM is.
Go back to OpenSAF defaults on all settings, except IMMSV_FEVS_MAX_PENDING,
which you had increased to 255 (the maximum possible).
You said you had "managed to overcome the performance iss
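For reference, IMMSV_FEVS_MAX_PENDING is an environment variable picked up by
the IMM node director; a sketch of how it might be set, assuming the
conventional /etc/opensaf/immnd.conf environment file:

    # /etc/opensaf/immnd.conf
    export IMMSV_FEVS_MAX_PENDING=255   # 255 is the maximum, per this thread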
Some time back, I brought up 30 nodes with TCP transport without any issue. At
that time, in addition to increasing the larger MDS
buffers (MDS_SOCK_SND_RCV_BUF_SIZE & DTM_SOCK_SND_RCV_BUF_SIZE), I also
increased wmem_max & rmem_max; you could also give it a try.
sysctl -w net.core.wmem_max=33554432
sysctl -w net.core.rmem_max=33554432
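Note that sysctl -w only affects the running kernel; to make the buffer limits
survive a reboot they would go into /etc/sysctl.conf (standard Linux practice,
not OpenSAF-specific):

    # /etc/sysctl.conf
    net.core.wmem_max = 33554432   # max socket send buffer, bytes
    net.core.rmem_max = 33554432   # max socket receive buffer, bytes
    # then apply the file without rebooting: sysctl -p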
Well, a hint is that you managed to bypass the problem (temporarily) by
increasing a queue size.
The error:
Sep 6 6:58:02.096641 osafimmnd [502:ImmModel.cc:1366] T2 ERR_TRY_AGAIN: Too
many pending incoming fevs messages (> 16) rejecting sync iteration next request
is very rarely seen, but can hap
I don't think it is a performance problem.
There is nothing indicating CPU load, memory, or IO bandwidth pressure.
Just a simple node joining seems to trigger some "logical" bug.
There is no application; just pure opensaf.
I am now trying to experiment with different MDS configuration options and