Hi All,

I'm using the pgsql resource agent (resource-agents-3.9.5-9) on Fedora 20.

I'm using it to test various failure patterns in a replicated PostgreSQL cluster.

I think that if the master PostgreSQL process is suspended for a long time,
both the resource monitor and the subsequent demote operation time out, and
the cluster cannot fail over until the process is resumed.

-----Cluster status after the master demotion timed out-----
Online: [ server1 server2 ]

 Master/Slave Set: msPostgresql [pgsql]
     pgsql      (ocf::heartbeat:pgsql): FAILED server2 
     Stopped: [ server1 ]
 Clone Set: ping-gw-rsc-clone [ping-gw-rsc]
     Started: [ server1 server2 ]

Node Attributes:
* Node server1:
    + master-pgsql                      : -INFINITY 
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : STOP      
    + ping-gw1                          : 100       
* Node server2:
    + master-pgsql                      : -INFINITY 
    + pgsql-data-status                 : LATEST    
    + pgsql-status                      : PRI       
    + ping-gw1                          : 100       

Migration summary:
* Node server1: 
* Node server2: 
   pgsql: migration-threshold=1 fail-count=2 last-failure='Fri Apr 11 14:07:43 2014'

Failed actions:
    pgsql_demote_0 on server2 'unknown error' (1): call=77, status=Timed Out, last-rc-change='Fri Apr 11 14:06:43 2014', queued=1ms, exec=60001ms
-------------------------------------------------------

I think pgsql_real_stop() should send SIGKILL to PostgreSQL when the
immediate shutdown command (-m i) has timed out.
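
To illustrate the idea, here is a minimal shell sketch. It is not the actual
pgsql RA code: the function name stop_with_kill_fallback and its two
arguments are hypothetical, and it assumes the caller already knows the
postmaster PID. It relies on the fact that SIGKILL is delivered even to a
process suspended with SIGSTOP, while the SIGQUIT used by an immediate
shutdown stays pending until the process is resumed.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the pgsql resource agent.
# Ask a process to quit, then escalate to SIGKILL if it is still
# alive after a timeout.
stop_with_kill_fallback() {
    pid=$1        # PID of the postmaster (assumed to be known by the caller)
    timeout=$2    # seconds to wait before escalating (assumed value)

    # Same signal that "pg_ctl stop -m immediate" sends to the postmaster.
    # A suspended (SIGSTOP'ed) process will not act on it until resumed.
    kill -QUIT "$pid" 2>/dev/null

    elapsed=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            # Still there after the timeout: SIGKILL cannot be blocked
            # and also terminates a suspended process.
            kill -KILL "$pid" 2>/dev/null
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, stop_with_kill_fallback "$pid" 10 would wait up to 10 seconds
after SIGQUIT before sending SIGKILL. In the RA, the wait would have to be
shorter than the demote/stop operation timeout so that the escalation happens
before the operation itself is reported as timed out.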

What do you think about this idea?

Regards,

Naoya

---
Naoya Anzai
Engineering Department
NEC Solution Innovators, Ltd.
E-Mail: anzai-na...@mxu.nes.nec.co.jp
---

