Hi Janne,

It's not clear whether the issue was that it was taking a long time to
restart (i.e. replaying WALs) or whether you also ended up having to
re-replicate a bunch of tablets from host to host in the cluster. There
were some bugs in earlier versions of Kudu (e.g. KUDU-2125, KUDU-2020) which
could make this process rather slow to stabilize.

If this issue happens again, running 'kudu cluster ksck' during the
unstable period can often yield more information to help understand what is
happening.

What version are you running?

Todd


On Wed, Nov 1, 2017 at 1:16 AM, Janne Keskitalo <janne.keskit...@paf.com>
wrote:

> Hi
>
> Our Kudu test environment became unresponsive yesterday for an unknown
> reason. It has three tablet servers and one master. It's running in AWS on
> quite small host machines, so maybe some node ran out of memory or
> something. It has happened before with this setup. Anyway, after we
> restarted the Kudu service, we couldn't do any selects. From the tablet
> server UI I could see it was initializing and bootstrapping tablets. It took
> many hours until all tablets were in the RUNNING state.
>
> My question is: where can I find information about these background
> operations? I want to understand what happens when a node goes offline
> and then comes back up after a while. What are tablet initialization and
> bootstrapping, etc.?
>
> --
> Br.
> Janne Keskitalo,
> Database Architect, PAF.COM
> For support: dbdsupp...@paf.com
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
