Re: Question about redistributing tablets on failure of a tserver.

2017-04-12 Thread Todd Lipcon
On Wed, Apr 12, 2017 at 9:45 PM, Jason Heo  wrote:

> Hi Dan.
>
> I'm very happy to hear from you. Kudu is REALLY GREAT!
>
>
Thanks for the excitement! It's always great to hear when people are happy
with the project.


> About Q2:
>
> There are 14 tservers in my test cluster; each node held 3TB before
> re-replication, evenly distributed. Network bandwidth is 1Gbps.
>
> I have another question.
>
> Is it possible to cancel re-replication if the failed tserver rejoins
> while re-replication is in progress? The rejoined tserver already has all
> the data, so I think re-replication is unnecessary and a waste of time
> and resources. (This is how Elasticsearch behaves.)
>

Yes, that's definitely something we'd like to do in the near future.

Right now our design is that when the leader notices a bad replica, it
ejects it from the Raft configuration, so we have a 2-node configuration.
We then immediately add a new replica and start making a tablet copy to it,
which may take some time with large tablets. During that time, if the old
node comes back, it is no longer part of the configuration and can't rejoin.

Mike Percy has started looking into changing the design to do something
more like:

- Original 3 nodes: A, B, C = VOTER
- node C dies
- add node D as a NON_VOTER/PRE_VOTER, and start the tablet copy
- if node C comes back up, remove D and cancel the tablet copy
- if node C is still down by the time D's copy completes, evict C and
convert D to VOTER

Implementation hasn't begun yet, but hopefully we can get this done in the
next couple of months (e.g., the 1.4 or 1.5 release timeline).
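
To make the flow concrete, here is roughly what those steps would look like
if driven by hand with the 'kudu tablet change_config' subcommands. Note
these subcommands only exist in later releases (not 1.2), and the master
address, tablet ID, and tserver UUIDs below are placeholders:

# add node D as a NON_VOTER; this kicks off the tablet copy
kudu tablet change_config add_replica <master> <tablet-id> <D-uuid> NON_VOTER

# if node C comes back, drop D again and abandon the in-flight copy
kudu tablet change_config remove_replica <master> <tablet-id> <D-uuid>

# otherwise, once D's copy completes, evict C and promote D
kudu tablet change_config remove_replica <master> <tablet-id> <C-uuid>
kudu tablet change_config change_replica_type <master> <tablet-id> <D-uuid> VOTER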

-Todd


>
> 2017-04-13 3:47 GMT+09:00 Dan Burkert :
>
>> Hi Jason, answers inline:
>>
>> On Wed, Apr 12, 2017 at 5:53 AM, Jason Heo 
>> wrote:
>>
>>>
>>> Q1. Can I disable redistributing tablets on failure of a tserver? The
>>> reason I need this is described in the Background section of my mail.
>>>
>>
>> We don't have any kind of built-in maintenance mode that would prevent
>> this, but it can be achieved by setting a flag on each of the tablet
>> servers.  The goal is not to disable re-replicating tablets, but instead to
>> avoid kicking the failed replica out of the tablet groups to begin with.
>> There is a config flag to control exactly that: 'evict_failed_followers'.
>> This isn't considered a stable or supported flag, but it should have the
>> effect you are looking for, if you set it to false on each of the tablet
>> servers, by running:
>>
>> kudu tserver set-flag <tserver-address> evict_failed_followers false
>> --force
>>
>> for each tablet server.  When you are done, set it back to the default
>> 'true' value.  This isn't something we routinely test (especially setting
>> it without restarting the server), so please test before trying this on a
>> production cluster.
>>
>>> Q2. Redistribution continues even if the failed tserver reconnects to
>>> the cluster. In my test cluster, it took 2 hours to redistribute when a
>>> tserver holding 3TB of data was killed.
>>>
>>
>> This seems slow.  What's the speed of your network?  How many nodes?  How
>> many tablet replicas were on the failed tserver, and were the replica sizes
>> evenly balanced?  Next time this happens, you might try monitoring with
>> 'kudu ksck' to ensure there aren't additional problems in the cluster
>> (see the ksck section of the Kudu administration guide).
>>
>>
>>> Q3. Can `--follower_unavailable_considered_failed_sec` be changed
>>> without restarting the cluster?
>>>
>>
>> The flag can be changed, but it comes with the same caveats as above:
>>
>> 'kudu tserver set-flag <tserver-address>
>> follower_unavailable_considered_failed_sec 900 --force'
>>
>>
>> - Dan
>>
>>
>


-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Physical Tablet Data size is larger than size in Chart Library.

2017-04-12 Thread Jason Heo
Hi Dan.

Thank you for your kind reply.

My Kudu runs on CentOS 7.2 with xfs.

I'll try `kudu fs check`.

Thanks,

Jason

2017-04-13 5:47 GMT+09:00 Dan Burkert :

> Adar has told me it's fine to run the new 'kudu fs check' tool against a
> Kudu 1.2 server.  It will require building locally, though.
>
> - Dan
>
> On Wed, Apr 12, 2017 at 10:59 AM, Dan Burkert 
> wrote:
>
>> Hi Jason,
>>
>> First question: what filesystem and OS are you running?
>>
>> This has been an ongoing area of work; we fixed a few major issues in
>> 1.2, and a few more major issues in 1.3, and have a new tool ('kudu fs
>> check') that will be released in 1.4 to diagnose and fix further issues.
>> In some cases we are underestimating the true size of the data, and in some
>> cases we are keeping around data that could be cleaned up.  I've included a
>> list of relevant JIRAs below if you are interested in specifics.  It should
>> be possible to get early access to the 'kudu fs check' tool by compiling
>> Kudu locally, but I'm going to defer to Adar on that, since he's the
>> resident expert on the subject.
>>
>> KUDU-1755 <https://issues.apache.org/jira/browse/KUDU-1755>
>> KUDU-1853 <https://issues.apache.org/jira/browse/KUDU-1853>
>> KUDU-1856 <https://issues.apache.org/jira/browse/KUDU-1856>
>> KUDU-1769 <https://issues.apache.org/jira/browse/KUDU-1769>
>>
>>
>>
>>
>> On Wed, Apr 12, 2017 at 5:02 AM, Jason Heo 
>> wrote:
>>
>>> Hello.
>>>
>>> I'm using Apache Kudu 1.2 on CDH 5.10.
>>>
>>> I'm estimating how many servers are needed to store my data.
>>>
>>> After loading my test data sets,
>>> total_kudu_on_disk_size_across_kudu_replicas in the CDH chart library
>>> is 27.9TB, whereas the sum of `du -sh /path/to/tablet_data/data` across
>>> nodes is 39.9TB, which is 43% bigger than the chart library value.
>>>
>>> I also observed the same difference on another Kudu test cluster.
>>>
>>> I'm curious whether this is normal, and I'd like to know if there is a
>>> way to reduce the physical file size.
>>>
>>> Thanks,
>>>
>>> Jason.
>>>
>>>
>>>
>>>
>>>
>>
>


Re: Physical Tablet Data size is larger than size in Chart Library.

2017-04-12 Thread Dan Burkert
Adar has told me it's fine to run the new 'kudu fs check' tool against a
Kudu 1.2 server.  It will require building locally, though.

- Dan

On Wed, Apr 12, 2017 at 10:59 AM, Dan Burkert  wrote:

> Hi Jason,
>
> First question: what filesystem and OS are you running?
>
> This has been an ongoing area of work; we fixed a few major issues in 1.2,
> and a few more major issues in 1.3, and have a new tool ('kudu fs check')
> that will be released in 1.4 to diagnose and fix further issues.  In some
> cases we are underestimating the true size of the data, and in some cases
> we are keeping around data that could be cleaned up.  I've included a list
> of relevant JIRAs below if you are interested in specifics.  It should be
> possible to get early access to the 'kudu fs check' tool by compiling Kudu
> locally, but I'm going to defer to Adar on that, since he's the resident
> expert on the subject.
>
> KUDU-1755 <https://issues.apache.org/jira/browse/KUDU-1755>
> KUDU-1853 <https://issues.apache.org/jira/browse/KUDU-1853>
> KUDU-1856 <https://issues.apache.org/jira/browse/KUDU-1856>
> KUDU-1769 <https://issues.apache.org/jira/browse/KUDU-1769>
>
>
>
>
> On Wed, Apr 12, 2017 at 5:02 AM, Jason Heo 
> wrote:
>
>> Hello.
>>
>> I'm using Apache Kudu 1.2 on CDH 5.10.
>>
>> I'm estimating how many servers are needed to store my data.
>>
>> After loading my test data sets,
>> total_kudu_on_disk_size_across_kudu_replicas in the CDH chart library is
>> 27.9TB, whereas the sum of `du -sh /path/to/tablet_data/data` across
>> nodes is 39.9TB, which is 43% bigger than the chart library value.
>>
>> I also observed the same difference on another Kudu test cluster.
>>
>> I'm curious whether this is normal, and I'd like to know if there is a
>> way to reduce the physical file size.
>>
>> Thanks,
>>
>> Jason.
>>
>>
>>
>>
>>
>


Re: Question about redistributing tablets on failure of a tserver.

2017-04-12 Thread Dan Burkert
Hi Jason, answers inline:

On Wed, Apr 12, 2017 at 5:53 AM, Jason Heo  wrote:

>
> Q1. Can I disable redistributing tablets on failure of a tserver? The
> reason I need this is described in the Background section of my mail.
>

We don't have any kind of built-in maintenance mode that would prevent
this, but it can be achieved by setting a flag on each of the tablet
servers.  The goal is not to disable re-replicating tablets, but instead to
avoid kicking the failed replica out of the tablet groups to begin with.
There is a config flag to control exactly that: 'evict_failed_followers'.
This isn't considered a stable or supported flag, but it should have the
effect you are looking for, if you set it to false on each of the tablet
servers, by running:

kudu tserver set-flag <tserver-address> evict_failed_followers false
--force

for each tablet server.  When you are done, set it back to the default
'true' value.  This isn't something we routinely test (especially setting
it without restarting the server), so please test before trying this on a
production cluster.
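
If it helps, here is a minimal sketch of scripting that across all tservers
(the host names and the default 7050 RPC port below are placeholders for
your cluster):

# apply the flag to every tserver in the cluster
for ts in tserver-01 tserver-02 tserver-03; do
  kudu tserver set-flag ${ts}:7050 evict_failed_followers false --force
done

# ... do the maintenance, then restore the default
for ts in tserver-01 tserver-02 tserver-03; do
  kudu tserver set-flag ${ts}:7050 evict_failed_followers true --force
done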

> Q2. Redistribution continues even if the failed tserver reconnects to the
> cluster. In my test cluster, it took 2 hours to redistribute when a
> tserver holding 3TB of data was killed.
>

This seems slow.  What's the speed of your network?  How many nodes?  How
many tablet replicas were on the failed tserver, and were the replica sizes
evenly balanced?  Next time this happens, you might try monitoring with
'kudu ksck' to ensure there aren't additional problems in the cluster (see
the ksck section of the Kudu administration guide).
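
For example, pointing ksck at your masters (the addresses below are
placeholders; 7051 is the default master RPC port):

kudu cluster ksck master-01:7051,master-02:7051,master-03:7051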


> Q3. Can `--follower_unavailable_considered_failed_sec` be changed without
> restarting the cluster?
>

The flag can be changed, but it comes with the same caveats as above:

'kudu tserver set-flag <tserver-address>
follower_unavailable_considered_failed_sec 900 --force'
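
Note that the default for this flag is 300 seconds (the 5 minutes you
mentioned), so to revert after maintenance you would run, for example:

kudu tserver set-flag <tserver-address>
follower_unavailable_considered_failed_sec 300 --force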


- Dan


Re: Physical Tablet Data size is larger than size in Chart Library.

2017-04-12 Thread Dan Burkert
Hi Jason,

First question: what filesystem and OS are you running?

This has been an ongoing area of work; we fixed a few major issues in 1.2,
and a few more major issues in 1.3, and have a new tool ('kudu fs check')
that will be released in 1.4 to diagnose and fix further issues.  In some
cases we are underestimating the true size of the data, and in some cases
we are keeping around data that could be cleaned up.  I've included a list
of relevant JIRAs below if you are interested in specifics.  It should be
possible to get early access to the 'kudu fs check' tool by compiling Kudu
locally, but I'm going to defer to Adar on that, since he's the resident
expert on the subject.

KUDU-1755 <https://issues.apache.org/jira/browse/KUDU-1755>
KUDU-1853 <https://issues.apache.org/jira/browse/KUDU-1853>
KUDU-1856 <https://issues.apache.org/jira/browse/KUDU-1856>
KUDU-1769 <https://issues.apache.org/jira/browse/KUDU-1769>
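
Once you have a locally built binary, the check runs offline against a
server's directories. A minimal sketch, with placeholder paths (double
check the tool's --help output, since it hasn't been released yet):

# stop the tserver first, then check its data directories
kudu fs check --fs_wal_dir=/data/kudu/wal --fs_data_dirs=/data/kudu/data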




On Wed, Apr 12, 2017 at 5:02 AM, Jason Heo  wrote:

> Hello.
>
> I'm using Apache Kudu 1.2 on CDH 5.10.
>
> I'm estimating how many servers are needed to store my data.
>
> After loading my test data sets,
> total_kudu_on_disk_size_across_kudu_replicas in the CDH chart library is
> 27.9TB, whereas the sum of `du -sh /path/to/tablet_data/data` across
> nodes is 39.9TB, which is 43% bigger than the chart library value.
>
> I also observed the same difference on another Kudu test cluster.
>
> I'm curious whether this is normal, and I'd like to know if there is a
> way to reduce the physical file size.
>
> Thanks,
>
> Jason.
>
>
>
>
>




Question about redistributing tablets on failure of a tserver.

2017-04-12 Thread Jason Heo
Hello.

I'm using Apache Kudu 1.2 on CDH 5.10.

Background
---

I'm currently using Elasticsearch to serve a web analytics service. An
Elasticsearch cluster is very easy to manage. One nice feature of ES is
that I can intentionally disable shard allocation (a shard is similar to a
Kudu tablet) so that I can restart one of the physical servers without
rebalancing shards, or rolling-restart the entire cluster. (I do this
through the cluster settings API, sketched below.)
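
For context, a minimal sketch of how I do this in ES (localhost:9200 is a
placeholder for one of my ES nodes):

# disable shard allocation before maintenance
curl -XPUT 'localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'

# ... restart the server, then re-enable allocation
curl -XPUT 'localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'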

I hope Apache Kudu has similar features.

Question
---

I found that if a tserver doesn't respond for 5 minutes, the cluster starts
re-replicating its tablets, and the 5-minute threshold is configurable. So
far so good.

Q1. Can I disable redistributing tablets on failure of a tserver? The
reason I need this is described in the Background section above.
Q2. Redistribution continues even if the failed tserver reconnects to the
cluster. In my test cluster, it took 2 hours to redistribute when a tserver
holding 3TB of data was killed.
Q3. Can `--follower_unavailable_considered_failed_sec` be changed without
restarting the cluster?

If any part of this email is unclear, please ask and I'll gladly clarify.

Thanks,

Jason.


Physical Tablet Data size is larger than size in Chart Library.

2017-04-12 Thread Jason Heo
Hello.

I'm using Apache Kudu 1.2 on CDH 5.10.

I'm estimating how many servers are needed to store my data.

After loading my test data sets,
total_kudu_on_disk_size_across_kudu_replicas in the CDH chart library is
27.9TB, whereas the sum of `du -sh /path/to/tablet_data/data` on each node
is 39.9TB, which is 43% bigger than the chart library value.
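
(I summed the per-node numbers with a small script like this; the host
names are placeholders for my actual nodes:)

# sum on-disk tablet data across all nodes, in KiB
total=0
for host in node-01 node-02 node-03; do
  kb=$(ssh "$host" du -sk /path/to/tablet_data/data | awk '{print $1}')
  total=$((total + kb))
done
echo "total: $((total / 1024 / 1024)) GiB"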

I also observed the same difference on another Kudu test cluster.

I'm curious whether this is normal, and I'd like to know if there is a way
to reduce the physical file size.

Thanks,

Jason.