[Desktop-packages] [Bug 1634513] [NEW] Postgres cannot startup after crashing

Pavel Tue, 18 Oct 2016 06:51:19 -0700

Public bug reported:

Ubuntu 15.10
Postgresql 9.5+175.pgdg15.10+1
postgresql-common 175.pgdg15.10+1


# How to reproduce

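Run 'echo b > /proc/sysrq-trigger' while PostgreSQL is under load. This
reboots the machine immediately, without syncing or unmounting disks,
so the server crashes in the middle of its workload.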

After the machine restarts, systemd tries to start the cluster through
pg_ctlcluster, and the start fails.

Log messages:

2016-10-18 15:22:50 MSK [5513-1] LOG:  database system was interrupted; last known up at: 2016-10-18 15:08:50 MSK
2016-10-18 15:22:50 MSK [5513-2] LOG:  database system was not properly shut down; automatic recovery in progress
2016-10-18 15:22:50 MSK [5513-3] LOG:  redo starts at A/ED186BA0
2016-10-18 15:22:50 MSK [5530-1] [n/a]@[n/a] LOG:  incomplete startup packet
2016-10-18 15:22:51 MSK [5547-1] postgres@postgres FATAL:  the database system is starting up
2016-10-18 15:22:51 MSK [5550-1] postgres@postgres FATAL:  the database system is starting up
2016-10-18 15:22:52 MSK [5553-1] postgres@postgres FATAL:  the database system is starting up
2016-10-18 15:22:52 MSK [5556-1] postgres@postgres FATAL:  the database system is starting up
2016-10-18 15:22:53 MSK [5559-1] postgres@postgres FATAL:  the database system is starting up
2016-10-18 15:22:53 MSK [5562-1] postgres@postgres FATAL:  the database system is starting up
2016-10-18 15:22:54 MSK [5565-1] postgres@postgres FATAL:  the database system is starting up
2016-10-18 15:22:54 MSK [5570-1] postgres@postgres FATAL:  the database system is starting up
2016-10-18 15:22:55 MSK [5573-1] postgres@postgres FATAL:  the database system is starting up
2016-10-18 15:22:55 MSK [5576-1] postgres@postgres FATAL:  the database system is starting up
2016-10-18 15:22:56 MSK [5579-1] postgres@postgres FATAL:  the database system is starting up
2016-10-18 15:22:56 MSK [5508-1] LOG:  received smart shutdown request
2016-10-18 15:22:56 MSK [5580-1] LOG:  shutting down
2016-10-18 15:22:56 MSK [5580-2] LOG:  database system is shut down


# Why it happens

pg_ctlcluster checks whether the cluster is running by connecting to it
with psql. The check is done in the function cluster_port_ready:

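  while ($n < ($result ? 10 : 3)) {
      # wait 0.5 s between attempts
      select undef, undef, undef, 0.5;
      # list databases; stderr is captured in $out, stdout is discarded
      $out = `$psql -h '$sd' --port $p -l 2>&1 > /dev/null`;

      print STDERR "PSQL res: $out $?\n";

      # count consecutive attempts with the same exit status; the loop
      # ends after 10 identical failures or 3 identical successes
      if ($? == $result) {
          $n++;
      } else {
          $n = 0;
      }
      $result = $?;
  }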

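The function only looks at psql's exit status. It polls every 0.5 s and
gives up once it has seen the same failing status 10 times in a row, so
the postmaster has roughly 5 seconds to finish crash recovery. After
that, pg_ctlcluster exits with status 1 and systemd sends SIGTERM to the
postmaster, which PostgreSQL handles as a smart shutdown (the "received
smart shutdown request" line in the log above).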


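However, the postmaster rejects every connection attempt while crash
recovery is in progress:

postmaster.c:2164
  case CAC_STARTUP:
          /* server has not finished starting up (this includes crash
           * recovery): reject the client with a FATAL error */
          ereport(FATAL,
                  (errcode(ERRCODE_CANNOT_CONNECT_NOW),
                   errmsg("the database system is starting up")));
          break;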


# How to fix

Increase the timeout?

Check the message returned on connect (FATAL:  the database system is
starting up) and keep waiting while it appears?

Determine whether recovery is in progress and wait until it is done?

** Affects: postgresql-common (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to postgresql-common in Ubuntu.
https://bugs.launchpad.net/bugs/1634513
