Can you paste some logs?

On Thu, Jul 2, 2015 at 2:23 PM, Philippe Laflamme <phili...@hopper.com>
wrote:

> Ok, that's reasonable, but I'm not sure why it would successfully
> re-register with the master if it's not supposed to in the first place. I
> think changing the resources (for example) will dump the old configuration
> in the logs and tell you why recovery is bailing out. It's not doing that
> in this case.
>
> It looks as though this doesn't work only because the master can't ping the
> slave on the old port, because the whole recovery process was successful
> otherwise.
>
> I'm not sure if the slave could have picked up its configuration change
> and failed the recovery early, but that would definitely be a better
> experience.
>
> Philippe
>
> On Thu, Jul 2, 2015 at 5:15 PM, Vinod Kone <vinodk...@gmail.com> wrote:
>
>> For slave recovery to work, the slave is expected not to change its config.
>>
>> On Thu, Jul 2, 2015 at 2:10 PM, Philippe Laflamme <phili...@hopper.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to roll out an upgrade from 0.20.0 to 0.21.0 with slaves
>>> configured with checkpointing and with "reconnect" recovery.
>>>
>>> I was investigating why the slaves would successfully re-register with
>>> the master and recover, but would subsequently be asked to shut down
>>> ("health check timeout").
>>>
>>> It turns out that our slaves had been unintentionally configured to use
>>> port 5050 in the previous configuration. We decided to fix that during the
>>> upgrade and have them use the default 5051 port.
>>>
>>> This change seems to make the health checks fail and eventually kills
>>> the slave due to inactivity.
>>>
>>> I've confirmed that leaving the port as it was in the previous
>>> configuration lets the slave re-register successfully, and it is not
>>> asked to shut down later on.
>>>
>>> Is this a known issue? I haven't been able to find a JIRA ticket for
>>> this. Maybe it's the expected behaviour? Should I create a ticket?
>>>
>>> Thanks,
>>> Philippe
>>>
>>
>>
>
