Context:

We run Slony 2.0 with Postgres 8.4 on two CentOS 6 servers: one master and one 
slave. Our database is about 30GB in size, which isn't unusual, but we do have 
a couple of tables that are more than 5GB each.

Recently, we needed to re-build our Slony cluster. I turned off Slony, restored 
identical database snapshots on the master and the slave, set up my slony.conf 
and slon_tools.conf, started the slons, ran slonik_init_cluster | slonik, then 
slonik_create_set 1 | slonik (we only have one replication set), and finally 
slonik_subscribe_set 1 2 | slonik. Everything looked good, and I was able to 
watch subscription progress in the logs.
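
For reference, the rebuild sequence above can be written down as a small 
script. This is only a sketch: it assumes Python 3 is available on the box, 
that the altperl tools and slonik are on the PATH, and that the two arguments 
to slonik_subscribe_set are the set number and the subscribing node number. 
The function names are made up for illustration.

```python
import subprocess


def rebuild_commands(set_id=1, subscriber_node=2):
    """Return the shell pipelines from the post, in the order they were run."""
    return [
        "slonik_init_cluster | slonik",
        "slonik_create_set {0} | slonik".format(set_id),
        "slonik_subscribe_set {0} {1} | slonik".format(set_id, subscriber_node),
    ]


def run_all(commands, dry_run=True):
    """Print each pipeline; execute it too unless dry_run is set."""
    for cmd in commands:
        print(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if the last command in the
            # pipeline (slonik) exits non-zero, so later steps are not run.
            subprocess.run(cmd, shell=True, check=True)


if __name__ == "__main__":
    # Dry run by default: only shows what would be executed.
    run_all(rebuild_commands(), dry_run=True)
```

Running it with dry_run=True just prints the three pipelines, which is handy 
for checking the order before doing it for real.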

Then the server stopped responding. I rebooted it, and saw "Kernel panic - not 
syncing: Out of memory and no killable processes" after it had killed 
everything it could.

The crash happens during the subscription process, when our first "large" (3GB) 
table is encountered. The slon logs report "so and so bytes copied for table" 
for the table in question, then a few dozen queued SYNC events, and then they 
detect a child process crash and log a watchdog-initiated restart.
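
To pin down which table was being handled right before the crash, the log can 
be scanned for the last "bytes copied for table" message and the lines that 
follow it. A rough sketch, assuming only that the phrase quoted above appears 
verbatim in the slon log; nothing else about the log format is assumed, and 
the helper names are hypothetical.

```python
def last_copied_line(log_lines, phrase="bytes copied for table"):
    """Return (index, line) for the last line containing phrase, or None."""
    hit = None
    for i, line in enumerate(log_lines):
        if phrase in line:
            hit = (i, line.rstrip("\n"))
    return hit


def lines_after(log_lines, index, limit=50):
    """Return up to `limit` lines that follow position `index`."""
    start = index + 1
    return [line.rstrip("\n") for line in log_lines[start:start + limit]]


def crash_context(log_lines, limit=50):
    """Return the last copy message plus what was logged after it."""
    hit = last_copied_line(log_lines)
    if hit is None:
        return None, []
    index, line = hit
    return line, lines_after(log_lines, index, limit)
```

Feeding it the output of open(path).readlines() for the slon log on the slave 
(the path depends on the local setup) gives the last table copied and the 
SYNC/restart messages after it in one place.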



What I've tried:

First I blew away the database completely, re-ran initdb, and then restored the 
identical snapshots again. Same kernel panic. Then I blew it away, uninstalled 
Postgres and Slony, and reinstalled them. I double-checked all of our 
memory-based settings in postgresql.conf, and they are all at stock/recommended 
levels (e.g. shared_buffers is at 1/4 of RAM). I ran a VACUUM FULL ANALYZE on 
the database before initializing the Slony cluster. As a last-ditch effort, I 
completely reinstalled the OS and all base software on the db servers and 
started from scratch. Same result every time: kernel panic, out of memory. It 
happens on the slave server, which runs both of the slons.
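
The "shared_buffers is at 1/4 of RAM" check can be done mechanically rather 
than by eye. A minimal sketch, assuming a Linux /proc/meminfo with a MemTotal 
line in kB and a shared_buffers value written with an explicit kB, MB or GB 
unit (a bare number, which Postgres counts in blocks, is deliberately 
rejected). The function names are hypothetical.

```python
import re

# Postgres memory units expressed in kB, matching /proc/meminfo's unit.
_UNITS_KB = {"kB": 1, "MB": 1024, "GB": 1024 * 1024}


def pg_size_to_kb(value):
    """Convert a setting such as '4GB' or '512MB' to kB."""
    match = re.match(r"^\s*(\d+)\s*(kB|MB|GB)\s*$", value)
    if match is None:
        raise ValueError("expected a number with kB/MB/GB unit: %r" % value)
    return int(match.group(1)) * _UNITS_KB[match.group(2)]


def mem_total_kb(meminfo_text):
    """Extract MemTotal (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def shared_buffers_fraction(shared_buffers, meminfo_text):
    """Return shared_buffers as a fraction of total RAM."""
    return pg_size_to_kb(shared_buffers) / float(mem_total_kb(meminfo_text))
```

For example, shared_buffers_fraction("4GB", open("/proc/meminfo").read()) on a 
16GB machine comes out close to 0.25, slightly above it because MemTotal 
excludes memory reserved by the kernel.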



Question:

Why is this happening?

Our database has grown fairly linearly over the past few months (at the 
beginning of the year it was about 23GB, now it's 30GB), and every other time I 
have had to re-initialize the Slony cluster on these same servers, it has 
worked fine.


Zac Bentley
Systems Administrator
Corporate Reimbursement Services, Inc.
http://www.crsinc.com/
617-467-1949

_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general