On 2023-01-11 15:55, Reid Wahl wrote:
On Wed, Jan 11, 2023 at 12:48 PM Madison Kelly <mke...@alteeve.com> wrote:

On 2023-01-11 01:59, Reid Wahl wrote:
On Tue, Jan 10, 2023 at 10:14 PM Vladislav Bogdanov <bub...@hoster-ok.com> wrote:

I suspect that the validate action is run as a non-root user.

As far as I know, both the direct command and crm_resource **should**
be running the agent as the same user, as long as Madison is running
both commands as the same user.

For what it's worth, I copied your test script to my machine (Fedora
36 using the current upstream main of Pacemaker) and it worked fine
both directly and via crm_resource. At the moment I'm not able to dig
very deeply, but I do wonder if it's either a bug that's since been
fixed, or perhaps an environment issue.

To try to rule out the former, do you have a test environment where
you can try to reproduce it on the latest Pacemaker from upstream?

I built the pacemaker source RPM from Fedora 37, then realized I'm
already running 2.1.5 on CS8, so I'm already on the latest release.
Looking at git, 2.1.5 is the latest tagged release... Are you running
newer than that?

I'm running on the current main, which contains commits that came
after the 2.1.5 release. I don't really expect this to be a Pacemaker
bug, especially with how recent your version is, but I would like to
rule that out if possible.

Would you have either the src.rpm or the ./configure options you used on hand?

On 11 January 2023 07:06:55, Madison Kelly <mke...@alteeve.com> wrote:

On 2023-01-11 00:21, Madison Kelly wrote:

On 2023-01-11 00:14, Madison Kelly wrote:

Hi all,

Edit: My last message was in HTML format, sorry about that.

     I've got a hell of a weird problem, and I am absolutely stumped on
what's going on.

     The short of it is: if my RA is called from the command line, it's
fine. If a resource already exists, monitor, enable, disable, all of
that works just fine. If I try to create a resource, it hangs at the
validate stage. Specifically, it hangs when 'pcs' calls:

crm_resource --validate --output-as xml --class ocf --agent server --provider alteeve --option name=<resource_name>

     More precisely, it hangs when it tries to make a shell call (to
virsh, as it happens, but that doesn't matter). So to debug, I started
stripping my RA down, making it simpler and simpler, until I was left
with the most basic program possible:

https://pastebin.com/VtSpkwMr

     That is literally the simplest program I could write that made the
shell call. The 'open()' call is where it hangs.
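
The RA's pattern is to run a shell command with stderr folded into stdout (`2>&1`) and read the result through a pipe. That pattern can be reproduced outside Pacemaker with an explicit timeout, so a hang shows up as a reported timeout instead of a stuck process. This is a minimal sketch in Python. The helper name `run_shell_call` and the 10-second default are hypothetical. Redirecting stdin from /dev/null is a choice made for this sketch and is not taken from the original script.

```python
import subprocess
from typing import Optional, Tuple


def run_shell_call(command: str, timeout: float = 10.0) -> Tuple[Optional[int], str]:
    """Hypothetical helper: run `command` via /bin/sh with stderr merged into stdout.

    Returns (return_code, output). If the command does not finish within
    `timeout` seconds, the shell is killed, return_code is None and output
    holds whatever was captured before the timeout.
    """
    try:
        result = subprocess.run(
            command,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # same effect as 2>&1
            stdin=subprocess.DEVNULL,   # sketch's choice: don't inherit the caller's stdin
            timeout=timeout,
            text=True,
        )
    except subprocess.TimeoutExpired as exc:
        partial = exc.output or ""
        if isinstance(partial, bytes):  # partial output may arrive undecoded
            partial = partial.decode(errors="replace")
        return None, partial
    return result.returncode, result.stdout
```

Calling `run_shell_call("/usr/bin/virsh list --all")` from a wrapper started both directly and through `crm_resource --validate` would show whether the command returns or times out in each case.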

When I call it directly:

time /usr/lib/ocf/resource.d/alteeve/server --validate-all --server srv04-test; echo rc:$?

====
real    0m0.061s
user    0m0.037s
sys    0m0.014s
rc:0
====

It's just fine. I can also see the output of the 'virsh' call in the
log. However, when I call it via crm_resource:

time crm_resource --validate --output-as xml --class ocf --agent server --provider alteeve --option name=srv04-test; echo rc:$?

====
<pacemaker-result api-version="2.25" request="crm_resource --validate --output-as xml --class ocf --agent server --provider alteeve --option name=srv04-test">
  <resource-agent-action action="validate" class="ocf" type="server" provider="alteeve">
    <overrides/>
    <agent-status code="1" message="error" execution_code="2" execution_message="Timed Out" reason="Resource agent did not exit within specified timeout"/>
  </resource-agent-action>
  <status code="1" message="Error occurred">
    <errors>
      <error>crm_resource: Error performing operation: Error occurred</error>
    </errors>
  </status>
</pacemaker-result>

real    0m20.521s
user    0m0.022s
sys    0m0.010s
rc:1
====

In the log file, I see (from line 20 of the super-simple-test-script):

====
Calling: [/usr/bin/virsh dumpxml --inactive srv04-test 2>&1; /usr/bin/echo return_code:0 |]
====

Then nothing else.

The strace output is: https://pastebin.com/raw/UCEUdBeP

Environment:

* selinux is permissive
* Pacemaker 2.1.5-4.el8
* pcs 0.10.15
* Kernel 4.18.0-408.el8.x86_64
* CentOS Stream release 8

Any help is appreciated, I am stumped. :/


After sending this, I tried having my "RA" call 'hostname', and that
worked fine. I switched back to 'virsh list --all', and that hangs. So
it seems to be related to calling 'virsh' specifically.


OK, so more info... Now that I know the problem is specific to the
virsh call (and only when validating; monitor, enable and disable on
existing VMs all work fine, and they call virsh repeatedly), I noticed
a few things.

First, I see in the logs:

====
Jan 11 00:30:43 mk-a07n02.digimer.ca libvirtd[2937]: Cannot recv data: Connection reset by peer
====

So with this, I further simplified my test script to this:

https://pastebin.com/Ey8FdL1t

Then when I ran my test script directly, the strace output is:

Good: https://pastebin.com/Trbq67ub

When my script is called via crm_resource, the strace is this:

Bad: https://pastebin.com/jtbzHrUM

The first difference I can see is around line 929 of the good paste:
the line "futex(0x7f48b0001ca0, FUTEX_WAKE_PRIVATE, 1) = 0" is there,
but it does not appear in the bad paste. Shortly after, I start seeing:

====
line: [write(4, "\1\0\0\0\0\0\0\0", 8)         = 8]
line: [brk(NULL)                               = 0x562b7877d000]
line: [brk(0x562b787aa000)                     = 0x562b787aa000]
line: [write(4, "\1\0\0\0\0\0\0\0", 8)         = 8]
====

That is around line 959 of the bad paste. There are more brk() lines,
and not long after that the output stops.
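
Finding the first point where the good and bad traces diverge can be automated if pointer values, which change from run to run, are masked first. This is a minimal sketch in Python. The names `normalize` and `first_divergence` are hypothetical. Masking only hex addresses is also an assumption, since PIDs, file descriptor numbers or timestamps may need masking too, depending on how strace was invoked. Because threads interleave differently between runs, the first mismatch shows where the traces stop lining up, which is not necessarily the cause of the hang.

```python
import re
from itertools import zip_longest
from typing import Iterable, Optional, Tuple

_HEX = re.compile(r"0x[0-9a-fA-F]+")


def normalize(line: str) -> str:
    """Mask pointer values so the same syscall compares equal across runs."""
    return _HEX.sub("0xADDR", line.rstrip("\n"))


def first_divergence(good: Iterable[str], bad: Iterable[str]) -> Optional[Tuple[int, str, str]]:
    """Return (1-based line number, good line, bad line) of the first mismatch, or None."""
    # zip_longest pads the shorter trace with "" so a truncated trace is reported.
    for lineno, (g, b) in enumerate(zip_longest(good, bad, fillvalue=""), start=1):
        if normalize(g) != normalize(b):
            return lineno, g.rstrip("\n"), b.rstrip("\n")
    return None
```

With the two traces saved to files (the file names are illustrative), `first_divergence(open("good.txt"), open("bad.txt"))` would return the line number and the two lines where they first differ.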

--
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

