Jaime Casanova wrote:
> On Wed, May 26, 2010 at 4:13 PM, "Stéphane A. Schildknecht"
> <[email protected]> wrote:
>
>> Could you check that this table is in sl_tables, and in which set it is ?
>>
>> Maybe this set isn't subscribed.
>>
>>
>
>
Those all look fine: the table exists in the replication set on both
machines, and the tab_relname matches the correct table in
pg_catalog.pg_class.

Just a little status update on the problem. The master and slave
databases still do not match, as the slave is missing a small chunk of
data. Yet replication is still taking place: any new data inserted into
the master ends up on the slave. After looking into it further, all the
data that is missing from the slave was added to the master in a certain
window of time. We're looking into what happened during that period via
logs and whatnot.

Our daemons are started with the -a option, and I have a copy of every
archive log from the slony slave from the point that table was added to
replication until now. I got a list of the IDs of all the rows that are
missing from the slony slave, and wrote up a little script to search
for each of those row IDs in each of the slony archive logs. None of
them were present. So I think we can conclude that the data was not
deleted from the slave underneath slony by a user, but rather was
never replicated to the slave in the first place.
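
A search like that could look roughly as follows. This is only a minimal
sketch, not the actual script: it assumes the archive logs are plain-text
files ending in .sql, that the row IDs are integers, and the function
name and directory layout are made up for illustration.

```python
import re
from pathlib import Path


def find_ids_in_archives(archive_dir, missing_ids):
    """Return {row_id: [file names]} for each ID found in any archive file.

    archive_dir and the "*.sql" glob are assumptions for illustration.
    """
    # Match each ID as a whole number so that 42 does not match 1420.
    patterns = {
        row_id: re.compile(r"(?<!\d)" + re.escape(str(row_id)) + r"(?!\d)")
        for row_id in missing_ids
    }
    hits = {}
    for path in sorted(Path(archive_dir).glob("*.sql")):
        text = path.read_text(errors="replace")
        for row_id, pattern in patterns.items():
            if pattern.search(text):
                hits.setdefault(row_id, []).append(path.name)
    return hits
```

An empty result means none of the IDs appear in any archive file. Note
that this matches the number anywhere in a file, so a hit only says the
ID deserves a closer look, not that the row itself was replicated.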

One other thing: we had a daemon that would attempt to start the slon
daemons once every minute if they were not already running. Due to a
bug, it ended up starting a new set of daemons every minute. This was
happening before, during, and after the period in which the missing
data was generated. Each minute it produced an error message saying
"duplicate key value violates unique constraint "sl_nodelock-pkey"",
which points to the new daemon realizing there is already a daemon
running and then exiting. No other errors pertaining to the replicated
table in question were present in the postgres logs at this time.
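
For illustration, the "only start it if it is not already running" check
in such a watchdog might be sketched as below. This is not the daemon
described above: it assumes a POSIX system and that each slon writes its
PID to a file, and the function name and PID file path are hypothetical.

```python
import os


def slon_is_running(pid_file):
    """Return True if pid_file names a live process, False otherwise."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # No PID file, or its contents are not a number: treat as not running.
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks existence
    except ProcessLookupError:
        return False  # stale PID file, the process is gone
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A watchdog would call this once per interval and start a new slon only
when it returns False. A PID can be reused by an unrelated process, so
this is a cheap first check rather than a guarantee.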

At this point we will probably remove the table from replication,
then add it again and let it sync up.

A question: I'm still a little unfamiliar with a couple of aspects of
slony, but from my understanding (correct me if I'm wrong), when adding
a table to replication, slonik modifies the table by adding a trigger,
so that whenever an insert, update, or delete happens, the trigger
alerts slony to the existence of data that needs to be sent to the slave
nodes. I guess my question is: is there a way to insert data into the
table without that trigger firing? And if it is possible, could that
cause the situation of "missing data" that slony itself doesn't even
know about (since it's reporting that everything is in sync)? If this is
possible, then I may have a situation where a user is inserting data in
an odd way that keeps the inserted data from being replicated.
Thanks,
Brian Fehrle
> or maybe the table has the wrong tab_reloid in the slave. you can
> probe that with this simple query (the same for sequences):
> select * from _cluster_name.sl_table where tab_reloid <> (tab_nspname
> || '.' || tab_relname)::regclass;
>
>