On Fri, 24 Mar 2017 at 1:03pm, Reuti wrote:
Is this expected behavior? Or is something wonky with the cgroups
here? Thanks for any insights.
And the mystery deepens. After changing execd_params to turn off
"USE_CGROUPS", I tried restarting the exec daemons on the compute nodes
(just to make sure the change was propagated, which I see now from the man
page isn't necessary). However, the daemons failed to restart on some of
the nodes that aren't also admin hosts (do they have to be now?). When
the testing showed that the commands generated output now on the nodes
with restarted exec daemons, I turned "USE_CGROUPS" back on and restarted
the daemons again... and the commands *still* work. So it seems to be
restarting the daemons that "fixed" the issue, not the cgroups change.
Color me even more confused.
You can try using `strace` to run the two applications in question;
maybe it will give some hints about their behavior.
Good idea. One result of the above shenanigans is that I currently have
nodes where these commands work, and ones where they don't (because those
exec daemons never got restarted). Comparing the strace output from the
two, this is the only difference that looks relevant.
From a node that doesn't work:
fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7ffe56529040) = -1 ENOTTY (Inappropriate ioctl for device)
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ab7c6f9d000
write(1, "cin-id2\n", 8) = 8
exit_group(0) = ?
From a node that does work:
fstat(1, {st_mode=S_IFREG|0644, st_size=45077, ...}) = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2adced199000
write(1, "qb3-id2\n", 8) = 8
exit_group(0) = ?
I'm getting progressively more confused.
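
One thing that might help narrow it down: in the fstat lines above, stdout
on the failing node is a character device with st_rdev=makedev(1, 3), which
on Linux is /dev/null, while on the working node it is a regular file. That
would explain why write() returns 8 in both cases but no output appears on
the failing node. Below is a rough Python sketch for checking what fd 1
points at from inside a job without strace. The function name describe_fd
is made up for illustration, and the 1:3 check assumes Linux device
numbering.

```python
import os
import stat
import sys


def describe_fd(fd=1):
    """Return a short description of what file descriptor `fd` points at."""
    st = os.fstat(fd)
    mode = st.st_mode
    if stat.S_ISCHR(mode):
        dev = (os.major(st.st_rdev), os.minor(st.st_rdev))
        # Assumption: Linux numbering, where character device 1:3 is /dev/null.
        if dev == (1, 3):
            return "character device 1:3 (/dev/null on Linux)"
        return "character device %d:%d" % dev
    if stat.S_ISREG(mode):
        return "regular file, %d bytes" % st.st_size
    if stat.S_ISFIFO(mode):
        return "pipe/FIFO"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "other (mode %o)" % mode


if __name__ == "__main__":
    # Report on stderr so the answer is not lost if stdout is /dev/null.
    sys.stderr.write("stdout is: %s\n" % describe_fd(1))
```

Run as a job on both kinds of node and compare the result in the job's
stderr file. If stderr is being discarded as well, the same string could be
written to a file in a known location instead.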
--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF