Thanks for your response, Dejan. I do not yet know whether this has anything to do with endianness. FWIW, there could be something quirky with the system, so I'm keeping all options open. :)
I added some debug prints to understand what's happening under the hood.

*Success case (on x86 machine):*

[TOTEM ] entering OPERATIONAL state.
[TOTEM ] A new membership (10.206.1.7:137220) was formed. Members joined: 181272839
[TOTEM ] Nikhil: Inside messages_deliver_to_app. end_point=0, my_high_delivered=0
[TOTEM ] Nikhil: Inside messages_deliver_to_app. end_point=1, my_high_delivered=0
[TOTEM ] Delivering 0 to 1
[TOTEM ] Delivering MCAST message with seq 1 to pending delivery queue
[SYNC  ] Nikhil: Inside sync_deliver_fn. header->id=1
[TOTEM ] Nikhil: Inside messages_deliver_to_app. end_point=2, my_high_delivered=1
[TOTEM ] Delivering 1 to 2
[TOTEM ] Delivering MCAST message with seq 2 to pending delivery queue
[SYNC  ] Nikhil: Inside sync_deliver_fn. header->id=0
[SYNC  ] Nikhil: Entering sync_barrier_handler
[SYNC  ] Committing synchronization for corosync configuration map access.
[TOTEM ] Delivering 2 to 4
[TOTEM ] Delivering MCAST message with seq 3 to pending delivery queue
[TOTEM ] Delivering MCAST message with seq 4 to pending delivery queue
[CPG   ] comparing: sender r(0) ip(10.206.1.7) ; members(old:0 left:0)
[CPG   ] chosen downlist: sender r(0) ip(10.206.1.7) ; members(old:0 left:0)
[SYNC  ] Committing synchronization for corosync cluster closed process group service v1.01
*[MAIN  ] Completed service synchronization, ready to provide service.*

*Failure case (on ppc):*

[TOTEM ] entering OPERATIONAL state.
[TOTEM ] A new membership (10.207.24.101:16) was formed. Members joined: 181344357
[TOTEM ] Nikhil: Inside messages_deliver_to_app. end_point=0, my_high_delivered=0
[TOTEM ] Nikhil: Inside messages_deliver_to_app. end_point=1, my_high_delivered=0
[TOTEM ] Delivering 0 to 1
[TOTEM ] Delivering MCAST message with seq 1 to pending delivery queue
[SYNC  ] Nikhil: Inside sync_deliver_fn header->id=1
[TOTEM ] Nikhil: Inside messages_deliver_to_app. end_point=1, my_high_delivered=1
[TOTEM ] Nikhil: Inside messages_deliver_to_app.
end_point=1, my_high_delivered=1

The above message repeats continuously. So it appears that in the failure case I do not receive the messages with sequence numbers 2-4. If somebody can throw some ideas, that'll help a lot.

-Thanks
Nikhil

On Tue, May 3, 2016 at 5:26 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
> Hi,
>
> On Mon, May 02, 2016 at 08:54:09AM +0200, Jan Friesse wrote:
> > >As your hardware is probably capable of running ppcle, and if you have an
> > >environment at hand, without too much effort it might pay off to try that.
> > >There are of course distributions out there supporting corosync on
> > >big-endian architectures, but I don't know if there is an automated
> > >regression for corosync on big-endian that would catch big-endian issues
> > >right away with something as current as your 2.3.5.
> >
> > No, we are not testing big-endian.
> >
> > So totally agree with Klaus. Give ppcle a try. Also make sure all
> > nodes are little-endian. Corosync should work in a mixed BE/LE
> > environment, but because it's not tested, it may not work (and that's a
> > bug, so if ppcle works I will try to fix BE).
>
> I tested a cluster consisting of big endian/little endian nodes
> (s390 and x86-64), but that was a while ago. IIRC, all relevant
> bugs in corosync got fixed at that time. Don't know what the
> situation is with the latest version.
>
> Thanks,
>
> Dejan
>
> > Regards,
> >   Honza
> >
> > >Regards,
> > >Klaus
> > >
> > >On 05/02/2016 06:44 AM, Nikhil Utane wrote:
> > >>Re-sending as I don't see my post on the thread.
> > >>
> > >>On Sun, May 1, 2016 at 4:21 PM, Nikhil Utane
> > >><nikhil.subscri...@gmail.com> wrote:
> > >>
> > >>    Hi,
> > >>
> > >>    Looking for some guidance here as we are completely blocked
> > >>    otherwise :(.
> > >>
> > >>    -Regards
> > >>    Nikhil
> > >>
> > >>    On Fri, Apr 29, 2016 at 6:11 PM, Sriram <sriram...@gmail.com> wrote:
> > >>
> > >>        Corrected the subject.
> > >>
> > >>        We went ahead and captured corosync debug logs for our ppc board.
> > >>        After log analysis and comparison with the successful logs
> > >>        (from the x86 machine), we didn't find *"[MAIN  ] Completed
> > >>        service synchronization, ready to provide service."* in the ppc
> > >>        logs. So it looks like corosync is not in a position to accept
> > >>        a connection from Pacemaker. I also tried with the new
> > >>        corosync.conf, with no success.
> > >>
> > >>        Any hints on this issue would be really helpful.
> > >>
> > >>        Attaching ppc_notworking.log, x86_working.log, corosync.conf.
> > >>
> > >>        Regards,
> > >>        Sriram
> > >>
> > >>        On Fri, Apr 29, 2016 at 2:44 PM, Sriram <sriram...@gmail.com> wrote:
> > >>
> > >>            Hi,
> > >>
> > >>            I went ahead and made some changes in the file system (I
> > >>            brought in /etc/init.d/corosync, /etc/init.d/pacemaker and
> > >>            /etc/sysconfig). After that I was able to run "pcs cluster
> > >>            start", but it failed with the following error:
> > >>
> > >>            # pcs cluster start
> > >>            Starting Cluster...
> > >>            Starting Pacemaker Cluster Manager[FAILED]
> > >>            Error: unable to start pacemaker
> > >>
> > >>            And in /var/log/pacemaker.log, I saw these errors:
> > >>
> > >>            pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
> > >>            Apr 29 08:53:47 [15863] node_cu pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN.
Retrying in 5s
> > >>            Apr 29 08:53:52 [15863] node_cu pacemakerd: warning: mcp_read_config: Could not connect to Cluster Configuration Database API, error 6
> > >>            Apr 29 08:53:52 [15863] node_cu pacemakerd: notice: main: Could not obtain corosync config data, exiting
> > >>            Apr 29 08:53:52 [15863] node_cu pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
> > >>
> > >>            And in /var/log/Debuglog, I saw these errors coming from corosync:
> > >>
> > >>            20160429 085347.487050 airv_cu daemon.warn corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
> > >>            20160429 085347.487067 airv_cu daemon.info corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
> > >>
> > >>            I browsed the code of libqb and found that it is failing in
> > >>            https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
> > >>
> > >>            Line 600: the handle_new_connection function
> > >>
> > >>            Line 637:
> > >>            if (auth_result == 0 && c->service->serv_fns.connection_accept) {
> > >>                    res = c->service->serv_fns.connection_accept(c,
> > >>                            c->euid, c->egid);
> > >>            }
> > >>            if (res != 0) {
> > >>                    goto send_response;
> > >>            }
> > >>
> > >>            Any hints on this issue would be really helpful for me to
> > >>            go ahead. Please let me know if any logs are required.
> > >>
> > >>            Regards,
> > >>            Sriram
> > >>
> > >>            On Thu, Apr 28, 2016 at 2:42 PM, Sriram <sriram...@gmail.com> wrote:
> > >>
> > >>                Thanks Ken and Emmanuel.
> > >>                It's a big-endian machine. I will try running "pcs
> > >>                cluster setup" and "pcs cluster start".
> > >>                Inside cluster.py, "service pacemaker start" and
> > >>                "service corosync start" are executed to bring up
> > >>                pacemaker and corosync.
> > >>                Those service scripts and the infrastructure needed to
> > >>                bring up the processes in that manner don't exist on my
> > >>                board. As it is an embedded board with limited memory,
> > >>                a full-fledged Linux is not installed.
> > >>                Just curious to know what could be the reason pacemaker
> > >>                throws that error:
> > >>
> > >>                "cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s"
> > >>
> > >>                Thanks for the response.
> > >>
> > >>                Regards,
> > >>                Sriram.
> > >>
> > >>                On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot <kgail...@redhat.com> wrote:
> > >>
> > >>                    On 04/27/2016 11:25 AM, emmanuel segura wrote:
> > >>                    > you need to use pcs to do everything, pcs cluster
> > >>                    > setup and pcs cluster start, try to use the redhat
> > >>                    > docs for more information.
> > >>
> > >>                    Agreed -- pcs cluster setup will create a proper
> > >>                    corosync.conf for you. Your corosync.conf below uses
> > >>                    corosync 1 syntax, and there were significant changes
> > >>                    in corosync 2. In particular, you don't need the file
> > >>                    created in step 4, because pacemaker is no longer
> > >>                    launched via a corosync plugin.
> > >>
> > >>                    > 2016-04-27 17:28 GMT+02:00 Sriram <sriram...@gmail.com>:
> > >>                    >> Dear All,
> > >>                    >>
> > >>                    >> I'm trying to use pacemaker and corosync for the
> > >>                    >> clustering requirement that came up recently.
> > >>                    >> We have cross-compiled corosync, pacemaker and
> > >>                    >> pcs (python) for the ppc environment (the target
> > >>                    >> board where pacemaker and corosync are supposed
> > >>                    >> to run).
> > >>                    >> I'm having trouble bringing up pacemaker in that
> > >>                    >> environment, though I could successfully bring up
> > >>                    >> corosync. Any help is welcome.
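On Ken's point about corosync 1 vs. corosync 2 syntax: with corosync 2, pacemaker runs as its own daemon and no service.d plugin stanza is needed at all. A minimal corosync 2 style sketch for comparison (the node name, address and values here are placeholders, not a recommendation for this specific environment):

```text
totem {
    version: 2
    cluster_name: mycluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: node_cu
        nodeid: 1
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_syslog: yes
}
```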
> > >>                    >>
> > >>                    >> I'm using these versions of pacemaker and corosync:
> > >>                    >> [root@node_cu pacemaker]# corosync -v
> > >>                    >> Corosync Cluster Engine, version '2.3.5'
> > >>                    >> Copyright (c) 2006-2009 Red Hat, Inc.
> > >>                    >> [root@node_cu pacemaker]# pacemakerd -$
> > >>                    >> Pacemaker 1.1.14
> > >>                    >> Written by Andrew Beekhof
> > >>                    >>
> > >>                    >> For running corosync, I did the following.
> > >>                    >> 1. Created the following directories:
> > >>                    >> /var/lib/pacemaker
> > >>                    >> /var/lib/corosync
> > >>                    >> /var/lib/pacemaker/cores
> > >>                    >> /var/lib/pacemaker/pengine
> > >>                    >> /var/lib/pacemaker/blackbox
> > >>                    >> /var/lib/pacemaker/cib
> > >>                    >>
> > >>                    >> 2. Created a file called corosync.conf under the
> > >>                    >> /etc/corosync folder with the following contents:
> > >>                    >>
> > >>                    >> totem {
> > >>                    >>     version: 2
> > >>                    >>     token: 5000
> > >>                    >>     token_retransmits_before_loss_const: 20
> > >>                    >>     join: 1000
> > >>                    >>     consensus: 7500
> > >>                    >>     vsftype: none
> > >>                    >>     max_messages: 20
> > >>                    >>     secauth: off
> > >>                    >>     cluster_name: mycluster
> > >>                    >>     transport: udpu
> > >>                    >>     threads: 0
> > >>                    >>     clear_node_high_bit: yes
> > >>                    >>
> > >>                    >>     interface {
> > >>                    >>         ringnumber: 0
> > >>                    >>         # The following three values need to be set based on your environment
> > >>                    >>         bindnetaddr: 10.x.x.x
> > >>                    >>         mcastaddr: 226.94.1.1
> > >>                    >>         mcastport: 5405
> > >>                    >>     }
> > >>                    >> }
> > >>                    >>
> > >>                    >> logging {
> > >>                    >>     fileline: off
> > >>                    >>     to_stderr: no
> > >>                    >>     to_syslog: yes
> > >>                    >>     logfile: /var/log/corosync.log
> > >>                    >>     syslog_facility: daemon
> > >>                    >>     debug: on
> > >>                    >>     timestamp: on
> > >>                    >> }
> > >>                    >>
> > >>                    >> amf {
> > >>                    >>     mode: disabled
> > >>                    >> }
> > >>                    >>
> > >>                    >> quorum {
> > >>                    >>     provider: corosync_votequorum
> > >>                    >> }
> > >>                    >>
> > >>                    >> nodelist {
> > >>                    >>     node {
> > >>                    >>         ring0_addr: node_cu
> > >>                    >>         nodeid: 1
> > >>                    >>     }
> > >>                    >> }
> > >>                    >>
> > >>                    >> 3.
Created authkey under /etc/corosync.
> > >>                    >>
> > >>                    >> 4. Created a file called pcmk under
> > >>                    >> /etc/corosync/service.d with contents as below:
> > >>                    >> cat pcmk
> > >>                    >> service {
> > >>                    >>     # Load the Pacemaker Cluster Resource Manager
> > >>                    >>     name: pacemaker
> > >>                    >>     ver: 1
> > >>                    >> }
> > >>                    >>
> > >>                    >> 5. Added the node name "node_cu" in /etc/hosts with the 10.X.X.X ip.
> > >>                    >>
> > >>                    >> 6. ./corosync -f -p & --> this step started corosync.
> > >>                    >>
> > >>                    >> [root@node_cu pacemaker]# netstat -alpn | grep -i coros
> > >>                    >> udp   0  0 10.X.X.X:61841  0.0.0.0:*  9133/corosync
> > >>                    >> udp   0  0 10.X.X.X:5405   0.0.0.0:*  9133/corosync
> > >>                    >> unix  2 [ ACC ] STREAM LISTENING 148888 9133/corosync @quorum
> > >>                    >> unix  2 [ ACC ] STREAM LISTENING 148884 9133/corosync @cmap
> > >>                    >> unix  2 [ ACC ] STREAM LISTENING 148887 9133/corosync @votequorum
> > >>                    >> unix  2 [ ACC ] STREAM LISTENING 148885 9133/corosync @cfg
> > >>                    >> unix  2 [ ACC ] STREAM LISTENING 148886 9133/corosync @cpg
> > >>                    >> unix  2 [     ] DGRAM           148840 9133/corosync
> > >>                    >>
> > >>                    >> 7. ./pacemakerd -f & gives the following error and exits:
> > >>                    >> [root@node_cu pacemaker]# pacemakerd -f
> > >>                    >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s
> > >>                    >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 2s
> > >>                    >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 3s
> > >>                    >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
> > >>                    >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
> > >>                    >> Could not connect to Cluster Configuration Database API, error 6
> > >>                    >>
> > >>                    >> Can you please point out what is missing in these steps?
> > >>                    >>
> > >>                    >> Before trying these steps, I tried running "pcs
> > >>                    >> cluster start", but that command fails with a
> > >>                    >> "service" script not found error, as the root
> > >>                    >> filesystem contains neither /etc/init.d/ nor
> > >>                    >> /sbin/service.
> > >>                    >>
> > >>                    >> So, the plan is to bring up corosync and pacemaker
> > >>                    >> manually, and later do the cluster configuration
> > >>                    >> using "pcs" commands.
> > >>                    >>
> > >>                    >> Regards,
> > >>                    >> Sriram
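Since the plan is to start the daemons manually, one pragmatic guard (our own convention for this thread, not an official corosync interface) is to start pacemakerd only after corosync's log shows the synchronization-complete line that the working x86 run prints. A sketch, demonstrated here against a stand-in log file:

```shell
# On the real system, point this at the configured logfile,
# e.g. /var/log/corosync.log; here we fabricate one for illustration.
log=$(mktemp)
echo "[MAIN  ] Completed service synchronization, ready to provide service." > "$log"

if grep -q "Completed service synchronization" "$log"; then
    status="corosync ready"        # only now try starting pacemakerd
else
    status="corosync not ready"    # pacemakerd would just get CS_ERR_TRY_AGAIN
fi
echo "$status"
rm -f "$log"
```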
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org