On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais wrote:
> Hi all,
>
> In the PostgreSQL Automatic Failover (PAF) project, one of the most frequent
> pieces of negative feedback we get is how difficult it is to experiment with
> it because fencing occurs way too frequently. I am currently hunting down
> this kind of useless fencing to make life easier.
>
> It occurs to me that a frequent cause of fencing is that, during the stop
> action, we check the status of the PostgreSQL instance using our monitor
> function before trying to stop the resource. If the function does not return
> OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we just raise an error,
> leading to fencing. See:
> https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301
>
> I am considering adding a check to determine whether the instance is stopped
> even if the monitor action returns an error. The idea would be to parse
> **all** the local processes looking for at least one pair of
> "/proc/<PID>/{comm,cwd}" related to the PostgreSQL instance we want to stop.
> If none are found, we consider the instance is not running. Gracefully or
> not, we just know it is down and we can return OCF_SUCCESS.
>
> Just for completeness, the piece of code would be:
>
>   my @pids;
>   foreach my $f (glob "/proc/[0-9]*") {
>       push @pids => basename($f)
>           if -r $f
>               and basename( readlink( "$f/exe" ) ) eq "postgres"
>               and readlink( "$f/cwd" ) eq $pgdata;
>   }
>
> It feels safe enough to me. The only risk I could think of is in a shared
> disk cluster with multiple nodes accessing the same data in RW (such a setup
> can fail in so many ways :)). However, PAF is not supposed to work in such a
> context, so I can live with this.
>
> Do you guys have any advice? Do you see any drawbacks? Hazards?

Isn't that the wrong place to "fix" it?
Why did your _monitor return something "weird"?
What did it return?
Should you not fix it there?

Just thinking out loud.

Cheers,
Lars