Hi all,
We recently realized that the two boxes in our 1 master -> 1 slave slony
cluster had different database encodings, so we attempted to fix that.
Our master had an encoding of LATIN1 and our slave had SQL_ASCII (they
were initialized so long ago that we don't know who did it or why it was
done that way).
Slony worked with this setup, but due to some other problems we wanted
to fix it by moving the slave from SQL_ASCII to LATIN1.
So we brought down the slon daemons, brought down the slave database,
and rebooted the physical machine the slave is on (we had commented out
dozens of cron jobs and wanted to verify they were all dead).
When the machine came back up, we brought the slave postgres cluster
online and performed a pg_dump of the entire database (including the
_slony schema). Then we brought down the postgres cluster, ran initdb to
create a new one with LATIN1 encoding, brought the new cluster online,
and ran a pg_restore on it with the dump file we created before.
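For reference, the sequence we ran was roughly the following (host,
database name, and data directory below are placeholders, not our exact
values; we used a custom-format dump so pg_restore could load it):

```shell
# Dump the entire slave database, _slony schema included
# (custom format so it can be fed to pg_restore)
pg_dump -Fc -U postgres -f /tmp/slave.dump mydb

# Shut down the old cluster and re-initialize it with LATIN1 encoding
pg_ctl -D /var/lib/pgsql/data stop
initdb -D /var/lib/pgsql/data -E LATIN1

# Bring the new cluster online, recreate the database, and restore
pg_ctl -D /var/lib/pgsql/data start
createdb -U postgres -E LATIN1 mydb
pg_restore -U postgres -d mydb /tmp/slave.dump
```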
After that we restarted our cron jobs, which also started up the two
slon daemons. We began monitoring the slave and noticed that no updates
were being applied. We're running the slon daemons with -s 60000 (force
a sync every 60 seconds) and a -x flag to get some slony logs for log
shipping. These slony logs generated with -x are empty (they have the
slony header and footer, but no insert data).
On the master, if I do a # select * from _slony.sl_status; I get
back that there are anywhere between 0 and 2 outstanding events, and a
lag time no greater than 3 minutes. Monitoring the slave's slony log
output also verifies that events are being received and processed
without error every minute.
Again on the master, # select count(*) from _slony.sl_log_1;
returns 12,000+ rows, and it continually grows. So from what I can
tell, the master is getting events queued up but is not pushing the
data in those events to the slave; each event is completely void of
data, and it looks like sl_log_1 just keeps building up.
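In case anyone wants to reproduce the checks above, these are
essentially the queries I'm running via psql on the master (database
name is a placeholder; _slony is our local cluster schema name):

```shell
# How far behind does the master think the slave is?
psql -d mydb -c "select st_lag_num_events, st_lag_time from _slony.sl_status;"

# How much unreplicated row data has piled up?
psql -d mydb -c "select count(*) from _slony.sl_log_1;"
```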
One theory is that even though we have an exact data dump of the old
slave cluster restored to the new slave cluster, since the encoding
has changed perhaps the master doesn't recognize the slave as the same
slave it had before. If that's the case, is there any way we can get it
to recognize it without having to rebuild the slony cluster? (Rebuilding
the cluster would mean a few days of work, if not weeks.)
Other than that, I'm unsure what to make of this. I've restarted the
daemons, and neither the master nor the slave daemon reports any errors
in its logs. I verified that the triggers exist on the master as they
should (we never touched the master anyway, but we're still checking
everything), and the path to the slave remained the same as for the
previous slave (same dbname, host, port, and user).
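For completeness, this is roughly how I verified the paths and node
registrations on the master (database name is a placeholder):

```shell
# The stored conninfo paths between nodes
psql -d mydb -c "select pa_server, pa_client, pa_conninfo from _slony.sl_path;"

# Both nodes should still be registered
psql -d mydb -c "select no_id, no_active, no_comment from _slony.sl_node;"
```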
Any thoughts or things I can check would be appreciated. And if my
theory about the master not recognizing the new slave cluster as the
old one is correct, any way to fix that would be great.
thanks in advance,
Brian F
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general