Hi, I'm using Fedora 31 (x86_64).
For Apache I can certainly use the OCF agent, but I have other resources for
which no OCF agent exists, so for those I need to use systemd.

All ocf and lsb type resources start OK on boot; only the systemd resources
have this problem.

I already enabled debug for the httpd and openvpn-server systemd units, but I
don't see any debug output in /var/log/messages or in the journal for either
of these units.

Here are some of the systemd units:

Apache:

[Unit]
Description=The Apache HTTP Server
Wants=httpd-init.service
After=network.target remote-fs.target nss-lookup.target httpd-init.service
Documentation=man:httpd.service(8)

[Service]
Type=notify
Environment=LANG=C
Environment=SYSTEMD_LOG_LEVEL=debug
ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful
# Send SIGWINCH for graceful stop
KillSignal=SIGWINCH
KillMode=mixed
PrivateTmp=true

[Install]
WantedBy=multi-user.target

-----------------

OpenVPN:

[Unit]
Description=OpenVPN service for %I
After=syslog.target network-online.target
Wants=network-online.target
Documentation=man:openvpn(8)
Documentation=https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
Documentation=https://community.openvpn.net/openvpn/wiki/HOWTO

[Service]
Type=notify
PrivateTmp=true
WorkingDirectory=/etc/openvpn/server
Environment=SYSTEMD_LOG_LEVEL=debug
ExecStart=/usr/sbin/openvpn --status %t/openvpn-server/status-%i.log --status-version 2 --suppress-timestamps --cipher AES-256-GCM --ncp-ciphers AES-256-GCM:AES-128-GCM:AES-256-CBC:AES-128-CBC:BF-CBC --config %i.conf
CapabilityBoundingSet=CAP_IPC_LOCK CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SYS_CHROOT CAP_DAC_OVERRIDE CAP_AUDIT_WRITE
LimitNPROC=10
DeviceAllow=/dev/null rw
DeviceAllow=/dev/net/tun rw
ProtectSystem=true
ProtectHome=true
KillMode=process
RestartSec=5s
Restart=on-failure

[Install]
WantedBy=multi-user.target

---------------------------------

Zabbix Server:

[Unit]
Description=Zabbix Server with Oracle DB
After=syslog.target network.target

[Service]
Type=simple
Environment="LD_LIBRARY_PATH=/opt/oracle/lib"
ExecStart=/usr/sbin/zabbix_server -f
User=zabbixsrv

[Install]
WantedBy=multi-user.target


On 20/02/2020 22:29, Strahil Nikolov wrote:
> On February 20, 2020 10:29:54 PM GMT+02:00, Maverick <m...@sapo.pt> wrote:
>>> Hi Maverick,
>>>
>>> According to this thread:
>>> https://lists.clusterlabs.org/pipermail/users/2016-December/021053.html
>>> you have 'startup-fencing' set to false.
>>>
>>> Check it out - maybe this is your reason.
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>
>> Yes, I have stonith disabled, because as soon as the resource startup
>> failed on boot, the node was rebooted.
>>
>> Anyway, I was checking the pacemaker logs and the journal, and I see that
>> the service actually starts OK, but for some reason pacemaker thinks the
>> start timed out. Because of that it tries to stop the service, thinks the
>> stop timed out as well, but the service actually stops:
>>
>> pacemaker.log:
>>
>> Feb 20 19:39:52 boss1 pacemaker-execd [1499] (log_execute) info: executing - rsc:apache action:start call_id:25
>> Feb 20 19:39:52 boss1 pacemaker-execd [1499] (systemd_unit_exec) debug: Performing asynchronous start op on systemd unit httpd named 'apache'
>> Feb 20 19:39:52 boss1 pacemaker-execd [1499] (systemd_unit_exec_with_unit) debug: Calling StartUnit for apache: /org/freedesktop/systemd1/unit/httpd_2eservice
>> Feb 20 19:39:52 boss1 pacemaker-execd [1499] (action_complete) notice: Giving up on apache start (rc=0): timeout (elapsed=248199ms, remaining=-148199ms)
>> Feb 20 19:39:52 boss1 pacemaker-execd [1499] (log_finished) debug: finished - rsc:apache action:monitor call_id:25 exit-code:198 exec-time:248205ms queue-time:216ms
>>
>> Feb 20 19:40:00 boss1 pacemaker-execd [1499] (log_execute) info: executing - rsc:apache action:stop call_id:81
>> Feb 20 19:40:00 boss1 pacemaker-execd [1499] (systemd_unit_exec) debug: Performing asynchronous stop op on systemd unit httpd named 'apache'
>> Feb 20 19:40:00 boss1 pacemaker-execd [1499] (systemd_unit_exec_with_unit) debug: Calling StopUnit for apache: /org/freedesktop/systemd1/unit/httpd_2eservice
>> Feb 20 19:40:01 boss1 pacemaker-execd [1499] (action_complete) notice: Giving up on apache stop (rc=0): timeout (elapsed=304539ms, remaining=-204539ms)
>> Feb 20 19:40:01 boss1 pacemaker-execd [1499] (log_finished) debug: finished - rsc:apache action:monitor call_id:81 exit-code:198 exec-time:304545ms queue-time:240ms
>>
>> system journal:
>>
>> Feb 20 19:39:52 boss1 systemd[1]: Starting Cluster Controlled httpd...
>> Feb 20 19:39:53 boss1 systemd[1]: Started Cluster Controlled httpd.
>> Feb 20 19:39:53 boss1 httpd[2145]: Server configured, listening on: port 443, port 80
>>
>> Feb 20 19:40:01 boss1 systemd[1]: Stopping The Apache HTTP Server...
>> Feb 20 19:40:02 boss1 systemd[1]: httpd.service: Succeeded.
>> Feb 20 19:40:02 boss1 systemd[1]: Stopped The Apache HTTP Server.
>>
>>
>> On 20/02/2020 21:02, Strahil Nikolov wrote:
>>> On February 20, 2020 9:35:07 PM GMT+02:00, Maverick <m...@sapo.pt> wrote:
>>>> Manually it starts OK, no problems:
>>>>
>>>> pcs resource debug-start apache --full
>>>> (unpack_config) warning: Blind faith: not fencing unseen nodes
>>>> Operation start for apache (systemd::httpd) returned: 'ok' (0)
>>>>
>>>>
>>>> On 20/02/2020 16:46, Strahil Nikolov wrote:
>>>>> On February 20, 2020 12:49:43 PM GMT+02:00, Maverick <m...@sapo.pt> wrote:
>>>>>>> You really need to debug the start & stop of the resource.
>>>>>>>
>>>>>>> Please try the debug procedure and provide the output:
>>>>>>> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Strahil Nikolov
>>>>>> Hi,
>>>>>>
>>>>>> Correct me if I'm wrong, but I think that procedure doesn't work for
>>>>>> systemd class resources; I don't know which OCF script is responsible
>>>>>> for handling systemd class resources.
>>>>>>
>>>>>> Also, the crm command doesn't exist in RHEL/Fedora; I've seen the crm
>>>>>> command only in SUSE.
>>>>>>
>>>>>>
>>>>>> On 19/02/2020 19:23, Strahil Nikolov wrote:
>>>>>>> On February 19, 2020 7:21:12 PM GMT+02:00, Maverick <m...@sapo.pt> wrote:
>>>>>>>> How is it possible that pacemaker is reporting that it takes 4.2
>>>>>>>> minutes (254930ms) to execute the start of the httpd systemd unit?
>>>>>>>>
>>>>>>>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (log_execute) info: executing - rsc:apache action:start call_id:25
>>>>>>>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (systemd_unit_exec) debug: Performing asynchronous start op on systemd unit httpd named 'apache'
>>>>>>>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (systemd_unit_exec_with_unit) debug: Calling StartUnit for apache: /org/freedesktop/systemd1/unit/httpd_2eservice
>>>>>>>> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (action_complete) notice: Giving up on apache start (rc=0): timeout (elapsed=254930ms, remaining=-154930ms)
>>>>>>>> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (log_finished) debug: finished - rsc:apache action:monitor call_id:25 exit-code:198 exec-time:254935ms queue-time:235ms
>>>>>>>>
>>>>>>>>
>>>>>>>> Starting manually works fine and fast:
>>>>>>>>
>>>>>>>> # time systemctl start httpd
>>>>>>>> real 0m0.144s
>>>>>>>> user 0m0.005s
>>>>>>>> sys 0m0.008s
>>>>>>>>
>>>>>>>>
>>>>>>>> On 17/02/2020 22:47, Mvrk wrote:
>>>>>>>>> The pacemaker.log is attached. In the log I can see that the
>>>>>>>>> cluster tries to start, the start fails, then it tries to stop,
>>>>>>>>> and the stop also fails.
>>>>>>>>>
>>>>>>>>> One more thing, my cluster was working fine on Fedora 28; I started
>>>>>>>>> having this problem after the upgrade to Fedora 31.
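
For anyone reading along, one way to sanity-check the numbers in the "Giving up" lines quoted above: elapsed plus remaining should add up to the operation timeout. This is only a minimal sketch that assumes the log format shown in this thread; the function name implied_timeouts is made up for the example, and each log entry is expected on a single line (not wrapped as in the mail).

```shell
#!/bin/sh
# Hypothetical helper: read pacemaker-execd log lines on stdin and, for each
# "Giving up" line, print the timeout implied by elapsed + remaining.
implied_timeouts() {
    sed -n 's/.*Giving up on \([^ ]*\) \([^ ]*\) .*elapsed=\([0-9]*\)ms, remaining=\(-\{0,1\}[0-9]*\)ms.*/\1 \2 \3 \4/p' |
    while read -r rsc action elapsed remaining; do
        echo "$rsc $action elapsed=${elapsed}ms implied_timeout=$(($elapsed + $remaining))ms"
    done
}

# Example with the Feb 19 line from this thread (joined onto one line):
implied_timeouts <<'EOF'
Feb 19 17:04:10 boss1 pacemaker-execd [1514] (action_complete) notice: Giving up on apache start (rc=0): timeout (elapsed=254930ms, remaining=-154930ms)
EOF
```

For that line the implied timeout is 100000ms, which matches the timeout=100 configured on the start operation. The log timestamps, on the other hand, show only about one second between the StartUnit call (17:04:09) and the give-up (17:04:10), so the reported elapsed time does not match the wall-clock time in the log.
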
>>>>>>>>>
>>>>>>>>> On 17/02/2020 21:30, Ricardo Esteves wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Yes, I also don't understand why it is trying to stop them first.
>>>>>>>>>>
>>>>>>>>>> SELinux is disabled:
>>>>>>>>>>
>>>>>>>>>> # getenforce
>>>>>>>>>> Disabled
>>>>>>>>>>
>>>>>>>>>> All systemd services controlled by the cluster are disabled from
>>>>>>>>>> starting at boot:
>>>>>>>>>>
>>>>>>>>>> # systemctl is-enabled httpd
>>>>>>>>>> disabled
>>>>>>>>>>
>>>>>>>>>> # systemctl is-enabled openvpn-server@01-server
>>>>>>>>>> disabled
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 17/02/2020 20:28, Ken Gaillot wrote:
>>>>>>>>>>> On Mon, 2020-02-17 at 17:35 +0000, Maverick wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> When I start my cluster, most of my systemd resources won't start:
>>>>>>>>>>>>
>>>>>>>>>>>> Failed Resource Actions:
>>>>>>>>>>>> * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82, status='Timed Out', exitreason='', last-rc-change='1970-01-01 01:00:54 +01:00', queued=29ms, exec=197799ms
>>>>>>>>>>>> * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61, status='Timed Out', exitreason='', last-rc-change='1970-01-01 01:00:54 +01:00', queued=1805ms, exec=198841ms
>>>>>>>>>>> These show that attempts to stop failed, rather than start.
>>>>>>>>>>>
>>>>>>>>>>>> So every time I reboot my node, I need to start the resources
>>>>>>>>>>>> manually using systemd, for example:
>>>>>>>>>>>>
>>>>>>>>>>>> systemctl start httpd
>>>>>>>>>>>>
>>>>>>>>>>>> and then pcs resource cleanup
>>>>>>>>>>>>
>>>>>>>>>>>> Resources configuration:
>>>>>>>>>>>>
>>>>>>>>>>>> Clone: apache-clone
>>>>>>>>>>>>  Meta Attrs: maintenance=false
>>>>>>>>>>>>  Resource: apache (class=systemd type=httpd)
>>>>>>>>>>>>   Meta Attrs: maintenance=false
>>>>>>>>>>>>   Operations: monitor interval=60 timeout=100 (apache-monitor-interval-60)
>>>>>>>>>>>>               start interval=0s timeout=100 (apache-start-interval-0s)
>>>>>>>>>>>>               stop interval=0s timeout=100 (apache-stop-interval-0s)
>>>>>>>>>>>> Resource: openvpn (class=systemd type=openvpn-server@01-server)
>>>>>>>>>>>>  Meta Attrs: maintenance=false
>>>>>>>>>>>>  Operations: monitor interval=60 timeout=100 (openvpn-monitor-interval-60)
>>>>>>>>>>>>              start interval=0s timeout=100 (openvpn-start-interval-0s)
>>>>>>>>>>>>              stop interval=0s timeout=100 (openvpn-stop-interval-0s)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Btw, if I try a debug-start / debug-stop, the mentioned resources
>>>>>>>>>>>> start and stop OK.
>>>>>>>>>>> Based on that, my first guess would be SELinux. Check the SELinux
>>>>>>>>>>> logs for denials.
>>>>>>>>>>>
>>>>>>>>>>> Also, make sure your systemd services are not enabled in systemd
>>>>>>>>>>> itself (e.g. via systemctl enable). Clustered systemd services
>>>>>>>>>>> should be managed by the cluster only.
>>>>>>>> _______________________________________________
>>>>>>>> Manage your subscription:
>>>>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>>>>
>>>>>>>> ClusterLabs home: https://www.clusterlabs.org/
>>>>>>> You really need to debug the start & stop of the resource.
>>>>>>>
>>>>>>> Please try the debug procedure and provide the output:
>>>>>>> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Strahil Nikolov
>>>>> Hi Maverick,
>>>>>
>>>>> You can replace 'crm resource stop' with 'pcs resource disable'.
>>>>> The rest works, but sadly not for systemd.
>>>>>
>>>>> You can try:
>>>>> 'pcs resource debug-start <resource> --full'
>>>>>
>>>>> Another approach is to:
>>>>> 1. Copy the service to /etc/systemd/system
>>>>> 2. In the '[Service]' section add this:
>>>>>    Environment=SYSTEMD_LOG_LEVEL=debug
>>>>> 3. Reload systemd:
>>>>>    systemctl daemon-reload
>>>>>    Note: I assume you have downtime for debugging the issue
>>>>> 4. Use 'debug-start --full'
>>>>>
>>>>> Note: Don't forget to remove the debug setting, or your journal will
>>>>> fill up.
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>> Hi Maverick,
>>>
>>> According to this thread:
>>> https://lists.clusterlabs.org/pipermail/users/2016-December/021053.html
>>> you have 'startup-fencing' set to false.
>>>
>>> Check it out - maybe this is your reason.
>>>
>>> Best Regards,
>>> Strahil Nikolov
> Hi Maverick,
>
> Can you share your systemd service?
> What distribution are you using, and what is the reason for using systemd
> instead of the ocf resource for apache?
>
> Could you enable DEBUG for the systemd service?
>
> Best Regards,
> Strahil Nikolov
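
For reference, the numbered debug steps quoted above can also be done with a drop-in file instead of copying the whole unit to /etc/systemd/system. This is a swapped-in variant, not what was suggested in the thread, and only a sketch: the helper name write_debug_dropin and the file name debug.conf are made up for the example, and the second argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that adds the debug environment
# variable to a unit without copying the whole unit file.
#   $1 = unit name, e.g. httpd.service
#   $2 = systemd config root (defaults to /etc/systemd/system)
write_debug_dropin() {
    unit=$1
    root=${2:-/etc/systemd/system}
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    # Section names are case-sensitive: [Service], not [service].
    printf '[Service]\nEnvironment=SYSTEMD_LOG_LEVEL=debug\n' > "$dir/debug.conf"
    # Print the path of the file that was written.
    echo "$dir/debug.conf"
}

# Typical sequence on the cluster node (as root, during downtime):
#   write_debug_dropin httpd.service
#   systemctl daemon-reload
#   pcs resource debug-start apache --full
# Clean up afterwards so the journal does not fill up:
#   rm /etc/systemd/system/httpd.service.d/debug.conf
#   systemctl daemon-reload
```

A drop-in keeps the packaged unit file untouched, so removing the one file and running daemon-reload again undoes the change.
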