On February 20, 2020 12:49:43 PM GMT+02:00, Maverick <m...@sapo.pt> wrote: > >> You really need to debug the start & stop of tthe resource . >> >> Please try the debug procedure and provide the output: >> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures >> >> Best Regards, >> Strahil Nikolov > > >Hi, > >Correct me if i'm wrong, but i think that procedure doesn't work for >systemd class resources, i don't know which OCF script is responsible >for handling systemd class resources. > >Also crm command doesn't exist in RHEL/Fedora, i've seen the crm >command >only in SUSE. > > > >On 19/02/2020 19:23, Strahil Nikolov wrote: >> On February 19, 2020 7:21:12 PM GMT+02:00, Maverick <m...@sapo.pt> >wrote: >>> How is it possible that pacemaker is reporting that takes 4.2 >minutes >>> (254930ms) to execute the start of httpd systemd unit? >>> >>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (log_execute) >>> info: >>> executing - rsc:apache action:start call_id:25 >>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (systemd_unit_exec) >>> >>> debug: Performing asynchronous start op on systemd unit httpd named >>> 'apache' >>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] >>> (systemd_unit_exec_with_unit) debug: Calling StartUnit for >apache: >>> /org/freedesktop/systemd1/unit/httpd_2eservice >>> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (action_complete) > >>> notice: Giving up on apache start (rc=0): timeout (elapsed=254930ms, >>> remaining=-154930ms) >>> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (log_finished) >>> debug: finished - rsc:apache action:monitor call_id:25 >exit-code:198 >>> exec-time:254935ms queue-time:235ms >>> >>> >>> Starting manually works fine and fast: >>> >>> # time systemctl start httpd >>> real 0m0.144s >>> user 0m0.005s >>> sys 0m0.008s >>> >>> >>> On 17/02/2020 22:47, Mvrk wrote: >>>> In attachment the pacemaker.log. On the log i can see that the >>> cluster >>>> tries to start, the start fails, then tries to stop, and the stop >>> also >>>> fails also. >>>> >>>> One more thing, my cluster was working fine on Fedora 28, i started >>>> having this problem after upgrade to Fedora 31. >>>> >>>> On 17/02/2020 21:30, Ricardo Esteves wrote: >>>>> Hi, >>>>> >>>>> Yes, i also don't understand why is trying to stop them first. >>>>> >>>>> SELinux is disabled: >>>>> >>>>> # getenforce >>>>> Disabled >>>>> >>>>> All systemd services controlled by the cluster are disabled from >>>>> starting at boot: >>>>> >>>>> # systemctl is-enabled httpd >>>>> disabled >>>>> >>>>> # systemctl is-enabled openvpn-server@01-server >>>>> disabled >>>>> >>>>> >>>>> On 17/02/2020 20:28, Ken Gaillot wrote: >>>>>> On Mon, 2020-02-17 at 17:35 +0000, Maverick wrote: >>>>>>> Hi, >>>>>>> >>>>>>> When i start my cluster, most of my systemd resources won't >start: >>>>>>> >>>>>>> Failed Resource Actions: >>>>>>> * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82, >>>>>>> status='Timed Out', exitreason='', last-rc-change='1970-01-01 >>>>>>> 01:00:54 +01:00', queued=29ms, exec=197799ms >>>>>>> * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61, >>>>>>> status='Timed Out', exitreason='', last-rc-change='1970-01-01 >>>>>>> 01:00:54 +01:00', queued=1805ms, exec=198841ms >>>>>> These show that attempts to stop failed, rather than start. >>>>>> >>>>>>> So everytime i reboot my node, i need to start the resources >>> manually >>>>>>> using systemd, for example: >>>>>>> >>>>>>> systemd start apache >>>>>>> >>>>>>> and then pcs resource cleanup >>>>>>> >>>>>>> Resources configuration: >>>>>>> >>>>>>> Clone: apache-clone >>>>>>> Meta Attrs: maintenance=false >>>>>>> Resource: apache (class=systemd type=httpd) >>>>>>> Meta Attrs: maintenance=false >>>>>>> Operations: monitor interval=60 timeout=100 (apache-monitor- >>>>>>> interval-60) >>>>>>> start interval=0s timeout=100 >>> (apache-start-interval- >>>>>>> 0s) >>>>>>> stop interval=0s timeout=100 >>> (apache-stop-interval-0s) >>>>>>> >>>>>>> >>>>>>> Resource: openvpn (class=systemd type=openvpn-server@01-server) >>>>>>> Meta Attrs: maintenance=false >>>>>>> Operations: monitor interval=60 timeout=100 (openvpn-monitor- >>>>>>> interval-60) >>>>>>> start interval=0s timeout=100 >>> (openvpn-start-interval- >>>>>>> 0s) >>>>>>> stop interval=0s timeout=100 >>> (openvpn-stop-interval- >>>>>>> 0s) >>>>>>> >>>>>>> >>>>>>> >>>>>>> Btw, if i try a debug-start / debug-stop the mentioned resources >>>>>>> start and stop ok. >>>>>> Based on that, my first guess would be SELinux. Check the SELinux >>> logs >>>>>> for denials. >>>>>> >>>>>> Also, make sure your systemd services are not enabled in systemd >>> itself >>>>>> (e.g. via systemctl enable). Clustered systemd services should be >>>>>> managed by the cluster only. >>> _______________________________________________ >>> Manage your subscription: >>> https://lists.clusterlabs.org/mailman/listinfo/users >>> >>> ClusterLabs home: https://www.clusterlabs.org/ >> You really need to debug the start & stop of tthe resource . >> >> Please try the debug procedure and provide the output: >> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures >> >> Best Regards, >> Strahil Nikolov
Hi Maverick, you can replace 'crm resource stop' with 'pcs resource disable'. The rest is working, but sadly not for systemd. You can try to: 'pcs resource debug-start <resource> --full' Another approach is to: 1. Copy service to /etc/systemd/system 2. In '[service]' section add this: Environment=SYSTEMD_LOG_LEVEL=debug 3. Reload systemd: systemctl daemon_reload Note: I assume you got downtime for debugging the issue 4. Use 'debug-start --full' Note: Don't forget to remove the debug, or your journal will get full. Best Regards, Strahil Nikolov _______________________________________________ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/