Corrected the subject.

We went ahead and captured corosync debug logs for our ppc board. After analyzing them and comparing them with the successful logs (from an x86 machine), we did not find *"[ MAIN ] Completed service synchronization, ready to provide service."* in the ppc logs. So it looks like corosync is not in a position to accept connections from Pacemaker. I also tried a new corosync.conf, with no success.
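For anyone who wants to repeat the comparison, here is a minimal Python sketch of the check described above. It assumes the two logs are plain text files passed on the command line (for example ppc_notworking.log and x86_working.log); the function name find_ready_lines is made up for illustration, and only the message part of the marker is matched because the spacing inside the "[ MAIN ]" tag may differ between log formats.

```python
import sys

# Message text quoted above. The bracketed subsystem tag is left out on
# purpose, since its spacing is an assumption that may vary per log format.
READY_MARKER = "Completed service synchronization, ready to provide service"


def find_ready_lines(lines):
    """Return (line_number, text) for every line containing the marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if READY_MARKER in line:
            hits.append((number, line.rstrip("\n")))
    return hits


def main(paths):
    """Report, per log file, whether the marker was ever logged."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            hits = find_ready_lines(handle)
        if hits:
            print(f"{path}: marker first seen at line {hits[0][0]}")
        else:
            print(f"{path}: marker not found")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a name of your choice and run with both log files as arguments, it prints one line per file, which makes it easy to confirm that the marker is present in the x86 log and absent in the ppc one.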
Any hints on this issue would be really helpful.

Attaching ppc_notworking.log, x86_working.log, corosync.conf.

Regards,
Sriram

On Fri, Apr 29, 2016 at 2:44 PM, Sriram <sriram...@gmail.com> wrote:

> Hi,
>
> I went ahead and made some changes in file system(Like I brought in
> /etc/init.d/corosync and /etc/init.d/pacemaker, /etc/sysconfig ), After
> that I was able to run "pcs cluster start".
> But it failed with the following error
> # pcs cluster start
> Starting Cluster...
> Starting Pacemaker Cluster Manager[FAILED]
> Error: unable to start pacemaker
>
> And in the /var/log/pacemaker.log, I saw these errors
> pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
> Apr 29 08:53:47 [15863] node_cu pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
> Apr 29 08:53:52 [15863] node_cu pacemakerd: warning: mcp_read_config: Could not connect to Cluster Configuration Database API, error 6
> Apr 29 08:53:52 [15863] node_cu pacemakerd: notice: main: Could not obtain corosync config data, exiting
> Apr 29 08:53:52 [15863] node_cu pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
>
> And in the /var/log/Debuglog, I saw these errors coming from corosync
> 20160429 085347.487050 airv_cu daemon.warn corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
> 20160429 085347.487067 airv_cu daemon.info corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
>
> I browsed the code of libqb to find that it is failing in
>
> https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
>
> Line 600 :
> handle_new_connection function
>
> Line 637:
> if (auth_result == 0 && c->service->serv_fns.connection_accept) {
>         res = c->service->serv_fns.connection_accept(c,
>                                                      c->euid, c->egid);
> }
> if (res != 0) {
>         goto send_response;
> }
>
> Any hints on this issue would be really helpful for me to go ahead.
> Please let me know if any logs are required,
>
> Regards,
> Sriram
>
> On Thu, Apr 28, 2016 at 2:42 PM, Sriram <sriram...@gmail.com> wrote:
>
>> Thanks Ken and Emmanuel.
>> Its a big endian machine. I will try with running "pcs cluster setup" and
>> "pcs cluster start"
>> Inside cluster.py, "service pacemaker start" and "service corosync start"
>> are executed to bring up pacemaker and corosync.
>> Those service scripts and the infrastructure needed to bring up the
>> processes in the above said manner doesn't exist in my board.
>> As it is a embedded board with the limited memory, full fledged linux is
>> not installed.
>> Just curious to know, what could be reason the pacemaker throws that
>> error.
>>
>> *"cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s"*
>>
>> Thanks for response.
>>
>> Regards,
>> Sriram.
>>
>> On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>>
>>> On 04/27/2016 11:25 AM, emmanuel segura wrote:
>>> > you need to use pcs to do everything, pcs cluster setup and pcs
>>> > cluster start, try to use the redhat docs for more information.
>>>
>>> Agreed -- pcs cluster setup will create a proper corosync.conf for you.
>>> Your corosync.conf below uses corosync 1 syntax, and there were
>>> significant changes in corosync 2. In particular, you don't need the
>>> file created in step 4, because pacemaker is no longer launched via a
>>> corosync plugin.
>>>
>>> > 2016-04-27 17:28 GMT+02:00 Sriram <sriram...@gmail.com>:
>>> >> Dear All,
>>> >>
>>> >> I m trying to use pacemaker and corosync for the clustering requirement that
>>> >> came up recently.
>>> >> We have cross compiled corosync, pacemaker and pcs(python) for ppc
>>> >> environment (Target board where pacemaker and corosync are supposed to run)
>>> >> I m having trouble bringing up pacemaker in that environment, though I could
>>> >> successfully bring up corosync.
>>> >> Any help is welcome.
>>> >>
>>> >> I m using these versions of pacemaker and corosync
>>> >> [root@node_cu pacemaker]# corosync -v
>>> >> Corosync Cluster Engine, version '2.3.5'
>>> >> Copyright (c) 2006-2009 Red Hat, Inc.
>>> >> [root@node_cu pacemaker]# pacemakerd -$
>>> >> Pacemaker 1.1.14
>>> >> Written by Andrew Beekhof
>>> >>
>>> >> For running corosync, I did the following.
>>> >> 1. Created the following directories,
>>> >> /var/lib/pacemaker
>>> >> /var/lib/corosync
>>> >> /var/lib/pacemaker
>>> >> /var/lib/pacemaker/cores
>>> >> /var/lib/pacemaker/pengine
>>> >> /var/lib/pacemaker/blackbox
>>> >> /var/lib/pacemaker/cib
>>> >>
>>> >> 2. Created a file called corosync.conf under /etc/corosync folder with the
>>> >> following contents
>>> >>
>>> >> totem {
>>> >>
>>> >>         version: 2
>>> >>         token: 5000
>>> >>         token_retransmits_before_loss_const: 20
>>> >>         join: 1000
>>> >>         consensus: 7500
>>> >>         vsftype: none
>>> >>         max_messages: 20
>>> >>         secauth: off
>>> >>         cluster_name: mycluster
>>> >>         transport: udpu
>>> >>         threads: 0
>>> >>         clear_node_high_bit: yes
>>> >>
>>> >>         interface {
>>> >>                 ringnumber: 0
>>> >>                 # The following three values need to be set based on your environment
>>> >>                 bindnetaddr: 10.x.x.x
>>> >>                 mcastaddr: 226.94.1.1
>>> >>                 mcastport: 5405
>>> >>         }
>>> >> }
>>> >>
>>> >> logging {
>>> >>         fileline: off
>>> >>         to_syslog: yes
>>> >>         to_stderr: no
>>> >>         to_syslog: yes
>>> >>         logfile: /var/log/corosync.log
>>> >>         syslog_facility: daemon
>>> >>         debug: on
>>> >>         timestamp: on
>>> >> }
>>> >>
>>> >> amf {
>>> >>         mode: disabled
>>> >> }
>>> >>
>>> >> quorum {
>>> >>         provider: corosync_votequorum
>>> >> }
>>> >>
>>> >> nodelist {
>>> >>         node {
>>> >>                 ring0_addr: node_cu
>>> >>                 nodeid: 1
>>> >>         }
>>> >> }
>>> >>
>>> >> 3. Created authkey under /etc/corosync
>>> >>
>>> >> 4. Created a file called pcmk under /etc/corosync/service.d and contents as
>>> >> below,
>>> >> cat pcmk
>>> >> service {
>>> >>         # Load the Pacemaker Cluster Resource Manager
>>> >>         name: pacemaker
>>> >>         ver: 1
>>> >> }
>>> >>
>>> >> 5. Added the node name "node_cu" in /etc/hosts with 10.X.X.X ip
>>> >>
>>> >> 6. ./corosync -f -p & --> this step started corosync
>>> >>
>>> >> [root@node_cu pacemaker]# netstat -alpn | grep -i coros
>>> >> udp        0      0 10.X.X.X:61841     0.0.0.0:*                9133/corosync
>>> >> udp        0      0 10.X.X.X:5405      0.0.0.0:*                9133/corosync
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148888   9133/corosync   @quorum
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148884   9133/corosync   @cmap
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148887   9133/corosync   @votequorum
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148885   9133/corosync   @cfg
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148886   9133/corosync   @cpg
>>> >> unix  2      [ ]         DGRAM                    148840   9133/corosync
>>> >>
>>> >> 7. ./pacemakerd -f & gives the following error and exits.
>>> >> [root@node_cu pacemaker]# pacemakerd -f
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 2s
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 3s
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
>>> >> Could not connect to Cluster Configuration Database API, error 6
>>> >>
>>> >> Can you please point me, what is missing in these steps ?
>>> >>
>>> >> Before trying these steps, I tried running "pcs cluster start", but that
>>> >> command fails with "service" script not found. As the root filesystem
>>> >> doesn't contain either /etc/init.d/ or /sbin/service
>>> >>
>>> >> So, the plan is to bring up corosync and pacemaker manually, later do the
>>> >> cluster configuration using "pcs" commands.
>>> >>
>>> >> Regards,
>>> >> Sriram
>>> >>
>>> >> _______________________________________________
>>> >> Users mailing list: Users@clusterlabs.org
>>> >> http://clusterlabs.org/mailman/listinfo/users
>>> >>
>>> >> Project Home: http://www.clusterlabs.org
>>> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> >> Bugs: http://bugs.clusterlabs.org
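
As a reading aid for the pacemakerd output quoted in step 7, the Python sketch below models the retry pattern those messages suggest: one attempt per round, a wait that grows by one second each time, and a final "error 6" once five attempts have failed. This is only an illustration built from the log text, not pacemakerd's actual code. The names connect_with_retries and try_connect are hypothetical, and the numeric values assume corosync's cs_error_t enum, where CS_OK is 1 and CS_ERR_TRY_AGAIN is 6 (which would match the "error 6" in the log).

```python
import time

# Assumed values from corosync's cs_error_t enum.
CS_OK = 1
CS_ERR_TRY_AGAIN = 6

ERROR_NAMES = {CS_ERR_TRY_AGAIN: "CS_ERR_TRY_AGAIN"}


def connect_with_retries(try_connect, max_attempts=5, sleep=time.sleep):
    """Call try_connect() up to max_attempts times with a growing delay.

    try_connect is a hypothetical callable returning a cs_error_t-style
    integer. Returns (last_code, messages) where messages mimics the
    pacemakerd output quoted in the thread.
    """
    messages = []
    rc = None
    for attempt in range(1, max_attempts + 1):
        rc = try_connect()
        if rc == CS_OK:
            return rc, messages
        name = ERROR_NAMES.get(rc, str(rc))
        messages.append(
            f"cmap connection setup failed: {name}. Retrying in {attempt}s"
        )
        sleep(attempt)  # wait 1s, 2s, ... before the next round
    messages.append(
        f"Could not connect to Cluster Configuration Database API, error {rc}"
    )
    return rc, messages


if __name__ == "__main__":
    # Simulate a server that never becomes ready; skip the real waiting.
    code, log = connect_with_retries(
        lambda: CS_ERR_TRY_AGAIN, sleep=lambda seconds: None
    )
    print("\n".join(log))
```

Read this way, the client side is behaving as designed: it keeps being told to try again, which is consistent with the "Denied connection, is not ready" lines logged by corosync, so the question to chase is why corosync on the ppc board never reaches the ready state.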
Attachments:
  ppc_notworking.log (binary data)
  x86_working.log (binary data)
  corosync.conf (binary data)