On February 20, 2020 10:29:54 PM GMT+02:00, Maverick <m...@sapo.pt> wrote: > >> Hi Maverick, >> >> >> According this thread: >> >https://lists.clusterlabs.org/pipermail/users/2016-December/021053.html >> >> You have 'startup-fencing' is set to false. >> >> Check it out - maybe this is your reason. >> >> Best Regards, >> Strahil Nikolov > >Yes, i have stonith disabled, because as soon as the resources startup >fail on boot, node was rebooted. > > >Anyway, i was checking the pacemaker logs and the journal log, and i >see >that the service actually starts ok but for some reason pacemaker >thinks >it has timeout and then because of that tries to stop and also thinks >it >has timeout but actually stops it: > >pacemaker.log: > >Feb 20 19:39:52 boss1 pacemaker-execd [1499] (log_execute) info: >executing - rsc:apache action:start call_id:25 >Feb 20 19:39:52 boss1 pacemaker-execd [1499] (systemd_unit_exec) >debug: Performing asynchronous start op on systemd unit httpd named >'apache' >Feb 20 19:39:52 boss1 pacemaker-execd [1499] >(systemd_unit_exec_with_unit) debug: Calling StartUnit for apache: >/org/freedesktop/systemd1/unit/httpd_2eservice >Feb 20 19:39:52 boss1 pacemaker-execd [1499] (action_complete) >notice: Giving up on apache start (rc=0): timeout (elapsed=248199ms, >remaining=-148199ms) >Feb 20 19:39:52 boss1 pacemaker-execd [1499] (log_finished) >debug: finished - rsc:apache action:monitor call_id:25 exit-code:198 >exec-time:248205ms queue-time:216ms > >Feb 20 19:40:00 boss1 pacemaker-execd [1499] (log_execute) info: >executing - rsc:apache action:stop call_id:81 >Feb 20 19:40:00 boss1 pacemaker-execd [1499] (systemd_unit_exec) >debug: Performing asynchronous stop op on systemd unit httpd named >'apache' >Feb 20 19:40:00 boss1 pacemaker-execd [1499] >(systemd_unit_exec_with_unit) debug: Calling StopUnit for apache: >/org/freedesktop/systemd1/unit/httpd_2eservice >Feb 20 19:40:01 boss1 pacemaker-execd [1499] (action_complete) >notice: Giving up on apache stop (rc=0): timeout (elapsed=304539ms, >remaining=-204539ms) >Feb 20 19:40:01 boss1 pacemaker-execd [1499] (log_finished) >debug: finished - rsc:apache action:monitor call_id:81 exit-code:198 >exec-time:304545ms queue-time:240ms > > >system journal: > >Feb 20 19:39:52 boss1 systemd[1]: Starting Cluster Controlled httpd... >Feb 20 19:39:53 boss1 systemd[1]: Started Cluster Controlled httpd. >Feb 20 19:39:53 boss1 httpd[2145]: Server configured, listening on: >port >443, port 80 > >Feb 20 19:40:01 boss1 systemd[1]: Stopping The Apache HTTP Server... >Feb 20 19:40:02 boss1 systemd[1]: httpd.service: Succeeded. >Feb 20 19:40:02 boss1 systemd[1]: Stopped The Apache HTTP Server. > > > > >On 20/02/2020 21:02, Strahil Nikolov wrote: >> On February 20, 2020 9:35:07 PM GMT+02:00, Maverick <m...@sapo.pt> >wrote: >>> Manually it starts ok, no problems: >>> >>> pcs resource debug-start apache --full >>> (unpack_config) warning: Blind faith: not fencing unseen nodes >>> Operation start for apache (systemd::httpd) returned: 'ok' (0) >>> >>> >>> On 20/02/2020 16:46, Strahil Nikolov wrote: >>>> On February 20, 2020 12:49:43 PM GMT+02:00, Maverick <m...@sapo.pt> >>> wrote: >>>>>> You really need to debug the start & stop of tthe resource . >>>>>> >>>>>> Please try the debug procedure and provide the output: >>>>>> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures >>>>>> >>>>>> Best Regards, >>>>>> Strahil Nikolov >>>>> Hi, >>>>> >>>>> Correct me if i'm wrong, but i think that procedure doesn't work >for >>>>> systemd class resources, i don't know which OCF script is >>> responsible >>>>> for handling systemd class resources. >>>>> >>>>> Also crm command doesn't exist in RHEL/Fedora, i've seen the crm >>>>> command >>>>> only in SUSE. >>>>> >>>>> >>>>> >>>>> On 19/02/2020 19:23, Strahil Nikolov wrote: >>>>>> On February 19, 2020 7:21:12 PM GMT+02:00, Maverick ><m...@sapo.pt> >>>>> wrote: >>>>>>> How is it possible that pacemaker is reporting that takes 4.2 >>>>> minutes >>>>>>> (254930ms) to execute the start of httpd systemd unit? >>>>>>> >>>>>>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (log_execute) > >>>>>>> info: >>>>>>> executing - rsc:apache action:start call_id:25 >>>>>>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] >>> (systemd_unit_exec) >>>>>>> >>>>>>> debug: Performing asynchronous start op on systemd unit httpd >>> named >>>>>>> 'apache' >>>>>>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] >>>>>>> (systemd_unit_exec_with_unit) debug: Calling StartUnit for >>>>> apache: >>>>>>> /org/freedesktop/systemd1/unit/httpd_2eservice >>>>>>> Feb 19 17:04:10 boss1 pacemaker-execd [1514] >(action_complete) >>>>> >>>>>>> notice: Giving up on apache start (rc=0): timeout >>> (elapsed=254930ms, >>>>>>> remaining=-154930ms) >>>>>>> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (log_finished) >>> >>>>>>> debug: finished - rsc:apache action:monitor call_id:25 >>>>> exit-code:198 >>>>>>> exec-time:254935ms queue-time:235ms >>>>>>> >>>>>>> >>>>>>> Starting manually works fine and fast: >>>>>>> >>>>>>> # time systemctl start httpd >>>>>>> real 0m0.144s >>>>>>> user 0m0.005s >>>>>>> sys 0m0.008s >>>>>>> >>>>>>> >>>>>>> On 17/02/2020 22:47, Mvrk wrote: >>>>>>>> In attachment the pacemaker.log. On the log i can see that the >>>>>>> cluster >>>>>>>> tries to start, the start fails, then tries to stop, and the >stop >>>>>>> also >>>>>>>> fails also. >>>>>>>> >>>>>>>> One more thing, my cluster was working fine on Fedora 28, i >>> started >>>>>>>> having this problem after upgrade to Fedora 31. >>>>>>>> >>>>>>>> On 17/02/2020 21:30, Ricardo Esteves wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Yes, i also don't understand why is trying to stop them first. >>>>>>>>> >>>>>>>>> SELinux is disabled: >>>>>>>>> >>>>>>>>> # getenforce >>>>>>>>> Disabled >>>>>>>>> >>>>>>>>> All systemd services controlled by the cluster are disabled >from >>>>>>>>> starting at boot: >>>>>>>>> >>>>>>>>> # systemctl is-enabled httpd >>>>>>>>> disabled >>>>>>>>> >>>>>>>>> # systemctl is-enabled openvpn-server@01-server >>>>>>>>> disabled >>>>>>>>> >>>>>>>>> >>>>>>>>> On 17/02/2020 20:28, Ken Gaillot wrote: >>>>>>>>>> On Mon, 2020-02-17 at 17:35 +0000, Maverick wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> When i start my cluster, most of my systemd resources won't >>>>> start: >>>>>>>>>>> Failed Resource Actions: >>>>>>>>>>> * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82, >>>>>>>>>>> status='Timed Out', exitreason='', >last-rc-change='1970-01-01 >>>>>>>>>>> 01:00:54 +01:00', queued=29ms, exec=197799ms >>>>>>>>>>> * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61, >>>>>>>>>>> status='Timed Out', exitreason='', >last-rc-change='1970-01-01 >>>>>>>>>>> 01:00:54 +01:00', queued=1805ms, exec=198841ms >>>>>>>>>> These show that attempts to stop failed, rather than start. >>>>>>>>>> >>>>>>>>>>> So everytime i reboot my node, i need to start the resources >>>>>>> manually >>>>>>>>>>> using systemd, for example: >>>>>>>>>>> >>>>>>>>>>> systemd start apache >>>>>>>>>>> >>>>>>>>>>> and then pcs resource cleanup >>>>>>>>>>> >>>>>>>>>>> Resources configuration: >>>>>>>>>>> >>>>>>>>>>> Clone: apache-clone >>>>>>>>>>> Meta Attrs: maintenance=false >>>>>>>>>>> Resource: apache (class=systemd type=httpd) >>>>>>>>>>> Meta Attrs: maintenance=false >>>>>>>>>>> Operations: monitor interval=60 timeout=100 >>> (apache-monitor- >>>>>>>>>>> interval-60) >>>>>>>>>>> start interval=0s timeout=100 >>>>>>> (apache-start-interval- >>>>>>>>>>> 0s) >>>>>>>>>>> stop interval=0s timeout=100 >>>>>>> (apache-stop-interval-0s) >>>>>>>>>>> Resource: openvpn (class=systemd >>> type=openvpn-server@01-server) >>>>>>>>>>> Meta Attrs: maintenance=false >>>>>>>>>>> Operations: monitor interval=60 timeout=100 >>> (openvpn-monitor- >>>>>>>>>>> interval-60) >>>>>>>>>>> start interval=0s timeout=100 >>>>>>> (openvpn-start-interval- >>>>>>>>>>> 0s) >>>>>>>>>>> stop interval=0s timeout=100 >>>>>>> (openvpn-stop-interval- >>>>>>>>>>> 0s) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Btw, if i try a debug-start / debug-stop the mentioned >>> resources >>>>>>>>>>> start and stop ok. >>>>>>>>>> Based on that, my first guess would be SELinux. Check the >>> SELinux >>>>>>> logs >>>>>>>>>> for denials. >>>>>>>>>> >>>>>>>>>> Also, make sure your systemd services are not enabled in >>> systemd >>>>>>> itself >>>>>>>>>> (e.g. via systemctl enable). Clustered systemd services >should >>> be >>>>>>>>>> managed by the cluster only. >>>>>>> _______________________________________________ >>>>>>> Manage your subscription: >>>>>>> https://lists.clusterlabs.org/mailman/listinfo/users >>>>>>> >>>>>>> ClusterLabs home: https://www.clusterlabs.org/ >>>>>> You really need to debug the start & stop of tthe resource . >>>>>> >>>>>> Please try the debug procedure and provide the output: >>>>>> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures >>>>>> >>>>>> Best Regards, >>>>>> Strahil Nikolov >>>> Hi Maverick, >>>> >>>> >>>> you can replace 'crm resource stop' with 'pcs resource disable'. >>>> The rest is working, but sadly not for systemd. >>>> >>>> You can try to: >>>> 'pcs resource debug-start <resource> --full' >>>> Another approach is to: >>>> 1. Copy service to /etc/systemd/system >>>> 2. In '[service]' section add this: >>>> Environment=SYSTEMD_LOG_LEVEL=debug >>>> 3. Reload systemd: >>>> systemctl daemon_reload >>>> Note: I assume you got downtime for debugging the issue >>>> 4. Use 'debug-start --full' >>>> >>>> Note: Don't forget to remove the debug, or your journal will get >>> full. >>>> Best Regards, >>>> Strahil Nikolov >> Hi Maverick, >> >> >> According this thread: >> >https://lists.clusterlabs.org/pipermail/users/2016-December/021053.html >> >> You have 'startup-fencing' is set to false. >> >> Check it out - maybe this is your reason. >> >> Best Regards, >> Strahil Nikolov
Hi Maverick, Can you share your systemd service ? What distribution are you using and what is the reason for using systemd instead of the ocf resource for apache ? Could you enable the DEBUG for the systemd service ? Best Regards, Strahil Nikolov _______________________________________________ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/