Hi, can you tell me exactly which log you need? I'll provide it as soon as possible.
Regarding some of the settings: I am not the original author of this cluster. The people who created it left the company I work for, and I inherited the code; sometimes I do not know why certain settings are used. The old versions of pacemaker, corosync, crmsh and the resource agents were compiled and installed. I simply downloaded the new versions, compiled and installed them. I didn't get any complaint during ./configure, which usually checks for library compatibility. To be honest, I do not know if this is the right approach. Should I "make uninstall" the old versions before installing the new ones? What is the suggested approach?

Thanks in advance for your help.

> On 22 Jun 2018, at 11:30, Christine Caulfield <ccaul...@redhat.com> wrote:
>
> On 22/06/18 10:14, Salvatore D'angelo wrote:
>> Hi Christine,
>>
>> Thanks for the reply. Let me add a few details. When I run the corosync
>> service I see the corosync process running. If I stop it and run:
>>
>> corosync -f
>>
>> I see these warnings:
>>
>> warning [MAIN ] interface section bindnetaddr is used together with
>> nodelist. Nodelist one is going to be used.
>> warning [MAIN ] Please migrate config file to nodelist.
>> warning [MAIN ] Could not set SCHED_RR at priority 99: Operation not
>> permitted (1)
>> warning [MAIN ] Could not set priority -2147483648: Permission denied (13)
>>
>> but I see the node joined.
>>
>
> Those certainly need fixing but are probably not the cause. Also, why do
> you have these values below set?
>
>   max_network_delay: 100
>   retransmits_before_loss_const: 25
>   window_size: 150
>
> I'm not saying they are causing the trouble, but they aren't going to
> help keep a stable cluster.
>
> Without more logs (full logs are always better than just the bits you
> think are meaningful) I still can't be sure. It could easily be just
> that you've overwritten a packaged version of corosync with your own
> compiled one and they have different configure options, or that the
> libraries now don't match.
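On the "install over the old build" question: one quick way to spot a half-overwritten install is to check whether a binary now exists in more than one place on PATH (e.g. a packaged copy in /usr/sbin and a source-built copy in /usr/local/sbin). A minimal sketch, assuming a POSIX shell where `which -a` is available; `count_on_path` is just a hypothetical helper name:

```shell
# Count how many distinct copies of a binary are reachable on PATH.
# More than one usually means a packaged install and a source install
# are coexisting, which can mix mismatched libraries and plugins.
count_on_path() {
  which -a "$1" 2>/dev/null | grep '^/' | sort -u | wc -l | tr -d ' '
}
```

If `count_on_path corosync` prints more than 1, running `make uninstall` from the *old* source tree (or removing the distro package) before `make install` of the new version avoids mixing the two; `ldconfig -p | grep corosync` can similarly reveal stale shared libraries.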
>
> Chrissie
>

>> My corosync.conf file is below.
>>
>> With the corosync service up and running I have the following output:
>>
>> corosync-cfgtool -s
>> Printing ring status.
>> Local node ID 1
>> RING ID 0
>>         id     = 10.0.0.11
>>         status = ring 0 active with no faults
>> RING ID 1
>>         id     = 192.168.0.11
>>         status = ring 1 active with no faults
>>
>> corosync-cmapctl | grep members
>> runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
>> runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.0.0.11) r(1) ip(192.168.0.11)
>> runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
>> runtime.totem.pg.mrp.srp.members.1.status (str) = joined
>> runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
>> runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.0.0.12) r(1) ip(192.168.0.12)
>> runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
>> runtime.totem.pg.mrp.srp.members.2.status (str) = joined
>>
>> For the moment I have two nodes in my cluster (the third node had some
>> issues, and for now I have put it in standby with crm node standby).
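Regarding the "bindnetaddr is used together with nodelist" warning quoted above: since the config already has a nodelist with ring0_addr/ring1_addr per node, the bindnetaddr lines in the interface sections appear redundant. A sketch of what the totem section might look like after the migration the warning asks for (same values as the existing config; untested, and it assumes udpu + nodelist is the intended setup):

```
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    rrp_mode: passive
    transport: udpu
    interface {
        ringnumber: 0
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        mcastport: 5405
    }
}
```

With udpu, the ring addresses come from the nodelist entries, so bindnetaddr (and ttl, which is multicast-only) should no longer be needed.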
>>
>> Here are the dependencies I have installed for corosync (they work fine with
>> pacemaker 1.1.14 and corosync 2.3.5):
>>
>> libnspr4-dev_2%3a4.10.10-0ubuntu0.14.04.1_amd64.deb
>> libnspr4_2%3a4.10.10-0ubuntu0.14.04.1_amd64.deb
>> libnss3-dev_2%3a3.19.2.1-0ubuntu0.14.04.2_amd64.deb
>> libnss3-nssdb_2%3a3.19.2.1-0ubuntu0.14.04.2_all.deb
>> libnss3_2%3a3.19.2.1-0ubuntu0.14.04.2_amd64.deb
>> libqb-dev_0.16.0.real-1ubuntu4_amd64.deb
>> libqb0_0.16.0.real-1ubuntu4_amd64.deb
>>
>> corosync.conf
>> ---------------------
>> quorum {
>>     provider: corosync_votequorum
>>     expected_votes: 3
>> }
>> totem {
>>     version: 2
>>     crypto_cipher: none
>>     crypto_hash: none
>>     rrp_mode: passive
>>     interface {
>>         ringnumber: 0
>>         bindnetaddr: 10.0.0.0
>>         mcastport: 5405
>>         ttl: 1
>>     }
>>     interface {
>>         ringnumber: 1
>>         bindnetaddr: 192.168.0.0
>>         mcastport: 5405
>>         ttl: 1
>>     }
>>     transport: udpu
>>     max_network_delay: 100
>>     retransmits_before_loss_const: 25
>>     window_size: 150
>> }
>> nodelist {
>>     node {
>>         ring0_addr: pg1
>>         ring1_addr: pg1p
>>         nodeid: 1
>>     }
>>     node {
>>         ring0_addr: pg2
>>         ring1_addr: pg2p
>>         nodeid: 2
>>     }
>>     node {
>>         ring0_addr: pg3
>>         ring1_addr: pg3p
>>         nodeid: 3
>>     }
>> }
>> logging {
>>     to_syslog: yes
>> }
>>
>>
>>> On 22 Jun 2018, at 09:24, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>
>>> On 21/06/18 16:16, Salvatore D'angelo wrote:
>>>> Hi,
>>>>
>>>> I upgraded my PostgreSQL/Pacemaker cluster with these versions:
>>>> Pacemaker 1.1.14 -> 1.1.18
>>>> Corosync 2.3.5 -> 2.4.4
>>>> Crmsh 2.2.0 -> 3.0.1
>>>> Resource agents 3.9.7 -> 4.1.1
>>>>
>>>> I started on a first node (I am trying a one-node-at-a-time upgrade).
>>>> On a PostgreSQL slave node I did:
>>>>
>>>> crm node standby <node>
>>>> service pacemaker stop
>>>> service corosync stop
>>>>
>>>> Then I built the tools above as described on their GitHub pages.
>>>>
>>>> ./autogen.sh (where required)
>>>> ./configure
>>>> make (where required)
>>>> make install
>>>>
>>>> Everything went OK; I expected the new files to overwrite the old ones. I kept the
>>>> dependencies I had with the old software because I noticed ./configure
>>>> didn't complain.
>>>> I started corosync:
>>>>
>>>> service corosync start
>>>>
>>>> To verify corosync was working properly I used the following commands:
>>>> corosync-cfgtool -s
>>>> corosync-cmapctl | grep members
>>>>
>>>> Everything seemed OK and I verified my node joined the cluster (at least
>>>> this is my impression).
>>>>
>>>> Here I hit a problem. Running the command:
>>>> corosync-quorumtool -ps
>>>>
>>>> I got the following error:
>>>> Cannot initialise CFG service
>>>>
>>> That says that corosync is not running. Have a look in the log files to
>>> see why it stopped. The pacemaker logs below are showing the same thing,
>>> but we can't make any more guesses until we see what corosync itself is
>>> doing. Enabling debug in corosync.conf will also help if more detail is
>>> needed.
>>>
>>> Also, starting corosync with 'corosync -pf' on the command line is often
>>> a quick way of checking things are starting OK.
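On the "enable debug" suggestion: in case it helps others following along, a logging section with debug turned on might look like the sketch below (the logfile path is an example; the directives themselves are standard corosync.conf logging options):

```
logging {
    to_syslog: yes
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    timestamp: on
    debug: on
}
```

Remember to create the log directory, and to turn `debug: off` again afterwards, since debug logging is very verbose.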
>>>
>>> Chrissie
>>>
>>>> If I try to start pacemaker, I only see the pacemaker process running, and
>>>> pacemaker.log contains the following lines:
>>>>
>>>> Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
>>>> Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: get_cluster_type: Detected an active 'corosync' cluster
>>>> Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: mcp_read_config: Reading configure for stack: corosync
>>>> Jun 21 15:09:38 [17115] pg1 pacemakerd:   notice: main: Starting Pacemaker 1.1.18 | build=2b07d5c5a9 features: libqb-logging libqb-ipc lha-fencing nagios corosync-native atomic-attrd acls
>>>> Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: main: Maximum core file size is: 18446744073709551615
>>>> Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: qb_ipcs_us_publish: server name: pacemakerd
>>>> Jun 21 15:09:53 [17115] pg1 pacemakerd:  warning: corosync_node_name: Could not connect to Cluster Configuration Database API, error CS_ERR_TRY_AGAIN
>>>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: corosync_node_name: Unable to get node name for nodeid 1
>>>> Jun 21 15:09:53 [17115] pg1 pacemakerd:   notice: get_node_name: Could not obtain a node name for corosync nodeid 1
>>>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_get_peer: Created entry 1aeef8ac-643b-44f7-8ce3-d82bbf40bbc1/0x557dc7f05d30 for node (null)/1 (1 total)
>>>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_get_peer: Node 1 has uuid 1
>>>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1] - corosync-cpg is now online
>>>> Jun 21 15:09:53 [17115] pg1 pacemakerd:    error: cluster_connect_quorum: Could not connect to the Quorum API: 2
>>>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: qb_ipcs_us_withdraw: withdrawing server sockets
>>>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: main: Exiting pacemakerd
>>>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_xml_cleanup: Cleaning up memory from libxml2
>>>>
>>>> What is wrong in my procedure?

_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org