Hello,

Thank you, Christopher, for the very professional answer. I believe it
was the right solution, but it came too late; I recreated the node and
now it is working.

Why does slon try to process all the SYNCs within a single timeout?
Wouldn't it be better to work through them one by one?

Thanks for the help,
Lukas

> Lukas wrote:
>> Hello,
>>
>> No, there were no schema changes at all. The Postgres logfile does not
>> show anything interesting; on 01-24 it looks like there was a power
>> failure, but Postgres started successfully after that.
>>
>>
>> One more thing: from time to time I get this from slon:
>> ERROR  remoteListenThread_1: timeout for event selection
>> That is the only error; there are no other errors at all.
>> I am using slon version 1.2.0
>>
>> Any ideas?
>>
>>
> Yes, that was exactly the error message I was expecting...
>
> The problem here is that the node has been disconnected for WAY WAY too
> long.
>
> When the slon managing that node connects, it tries reading through
> sl_event to determine the list of relevant events that need to be applied.
>
> After 9-odd days, this has evidently grown to 350K events, and reading
> through them takes more than the 300 seconds that the code in
> src/slon/remote_listener.c allows.
>
> You could, in principle, alter src/slon/remote_listener.c to change this
> time:
>
> At about line 830:
> time(&timeout);
> timeout += 300;
>
> You might change 300, which is 5 minutes, to something higher; 30000
> would doubtless be enough time to let the slon get through the query on
> sl_event.
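>
> For instance, the edited snippet would look something like this (an
> untested sketch; it simply applies the change described above to the
> code quoted earlier):
>
> /* give the initial catch-up scan of sl_event far more time */
> time(&timeout);
> timeout += 30000;   /* was 300, i.e. 5 minutes */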
>
> If that works out, you'd want to set sync grouping (-g) to some
> Relatively Large Number; 10000 would probably be good...
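>
> For example, the subscriber's slon could be started with something like
> this (a sketch only; the cluster name and conninfo are placeholders):
>
> slon -g 10000 mycluster 'dbname=mydb host=subscriber-host user=slony'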
>
> Alternatively, you'll need to treat the node as failed, and
> drop/recreate it.
>
> In future, you need to have some sort of monitoring in place so that it
> doesn't take a week to notice that the node isn't working.
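>
> A periodic check against the sl_status view is one option; for example
> (a sketch; replace _mycluster with your cluster schema and pick
> thresholds that make sense for you):
>
> SELECT st_origin, st_received, st_lag_num_events, st_lag_time
>   FROM _mycluster.sl_status
>  WHERE st_lag_num_events > 1000
>     OR st_lag_time > interval '10 minutes';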
>
>>> Lukas wrote:
>>>
>>>> ...and nothing is changing.
>>>> On the master side, sl_event has 350,000 records, sl_log_1 has 3,800,
>>>> sl_log_2 has 900, and sl_seqlog has 70,000 records.
>>>>
>>>> Where could the problem be? What can we do?
>>>>
>>> Have you checked the slon logs for all nodes, as well as the
>>> PostgreSQL logs for all nodes?
>>>
>>> Using the information available to determine when this started, did you
>>> make any data model or schema changes about that time?
>>>




_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general
