Re: Error message: 'Tried to update clock beyond the max. error.'

2017-11-01 Thread Franco Venturi
From 'tablet_bootstrap.cc': 1030 14:29:37.324306 60682 tablet_bootstrap.cc:884] Check failed: _s.ok() Bad status: Invalid argument: Tried to update clock beyond the max. error. Franco

Re: Error message: 'Tried to update clock beyond the max. error.'

2017-11-01 Thread Todd Lipcon
Actually I think I understand the root cause of this. I think at some point NTP can switch the clock from a microseconds-based mode to a nanoseconds-based mode, at which point Kudu starts interpreting the results of the ntp_gettime system call incorrectly, resulting in incorrect error estimates
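For context, here is a minimal sketch (not Kudu's actual code) of reading the kernel clock and its maximum error via ntp_adjtime(2) on Linux, normalizing for the nanosecond mode Todd mentions: when STA_NANO is set, the sub-second field of the returned time holds nanoseconds rather than microseconds, while maxerror stays in microseconds.

    // Minimal sketch, not Kudu's implementation: query the kernel NTP state
    // and normalize the timestamp to microseconds. When STA_NANO is set,
    // timex.time.tv_usec holds nanoseconds; treating it as microseconds
    // skews the value by ~1000x. maxerror is microseconds in either mode.
    #include <sys/timex.h>
    #include <cstdint>
    #include <cstdio>

    int main() {
      struct timex tx = {};  // modes == 0 means "read only, adjust nothing"
      int rc = ntp_adjtime(&tx);
      if (rc == -1) {
        std::perror("ntp_adjtime");
        return 1;
      }
      if (rc == TIME_ERROR) {
        std::fprintf(stderr, "kernel clock is unsynchronized\n");
        return 1;
      }

      // Normalize the sub-second field depending on the kernel's mode.
      int64_t frac_usec = (tx.status & STA_NANO) ? tx.time.tv_usec / 1000
                                                 : tx.time.tv_usec;
      int64_t now_usec =
          static_cast<int64_t>(tx.time.tv_sec) * 1000000 + frac_usec;

      std::printf("now = %lld us, max clock error = %ld us\n",
                  static_cast<long long>(now_usec),
                  static_cast<long>(tx.maxerror));
      return 0;
    }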

Re: Error message: 'Tried to update clock beyond the max. error.'

2017-11-01 Thread Todd Lipcon
What's the full log line where you're seeing this crash? Is it coming from tablet_bootstrap.cc, raft_consensus.cc, or elsewhere? -Todd 2017-11-01 15:45 GMT-07:00 Franco Venturi: > Our version is kudu 1.5.0-cdh5.13.0. > Franco

Re: Error message: 'Tried to update clock beyond the max. error.'

2017-11-01 Thread Franco Venturi
Our version is kudu 1.5.0-cdh5.13.0. Franco

Re: Low ingestion rate from Kafka

2017-11-01 Thread Todd Lipcon
On Wed, Nov 1, 2017 at 2:10 PM, Chao Sun wrote: > > Great. Keep in mind that, since you have a UUID component at the front of your key, you are doing something like a random-write workload. So, as your data grows, if your PK column (and its bloom filters) ends up being

Re: Low ingestion rate from Kafka

2017-11-01 Thread Chao Sun
> Great. Keep in mind that, since you have a UUID component at the front of your key, you are doing something like a random-write workload. So, as your data grows, if your PK column (and its bloom filters) ends up being larger than the available RAM for caching, each write may generate a disk seek
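To make the key-ordering point concrete, here is an illustrative sketch with the Kudu C++ client (the column names event_time, uuid, and payload are assumptions, not the poster's schema): the order of the primary key columns determines sort order within a tablet, so a UUID-leading key scatters inserts across the whole key space, while a timestamp-leading key keeps concurrent inserts clustered.

    // Illustrative only; this schema is an assumption, not the poster's.
    #include <kudu/client/client.h>
    #include <kudu/client/schema.h>

    #include <string>
    #include <vector>

    using kudu::client::KuduColumnSchema;
    using kudu::client::KuduSchema;
    using kudu::client::KuduSchemaBuilder;

    kudu::Status BuildExampleSchema(KuduSchema* schema) {
      KuduSchemaBuilder b;
      b.AddColumn("event_time")
          ->Type(KuduColumnSchema::UNIXTIME_MICROS)->NotNull();
      b.AddColumn("uuid")->Type(KuduColumnSchema::STRING)->NotNull();
      b.AddColumn("payload")->Type(KuduColumnSchema::STRING);
      // Key column order is what matters here: leading with event_time gives
      // mostly-sequential inserts; leading with uuid gives random inserts.
      b.SetPrimaryKey(std::vector<std::string>({"event_time", "uuid"}));
      return b.Build(schema);
    }

When the key leads with a timestamp, hash-partitioning on the uuid column is the usual way to spread writes across tablets and avoid hotspotting.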

Re: Kudu background tasks

2017-11-01 Thread Todd Lipcon
Hi Janne, It's not clear whether the issue was that it was taking a long time to restart (i.e., replaying WALs) or if somehow you also ended up having to re-replicate a bunch of tablets from host to host in the cluster. There were some bugs in earlier versions of Kudu (e.g., KUDU-2125, KUDU-2020)

Re: Low ingestion rate from Kafka

2017-11-01 Thread Todd Lipcon
On Wed, Nov 1, 2017 at 1:23 PM, Chao Sun wrote: > Thanks Todd! I improved my code to use multiple Kudu clients for processing the Kafka messages and was able to improve the number to 250K - 300K per sec. Pretty happy with this now. Great. Keep in mind that, since you have
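Since the thread doesn't show the actual ingestion code, here is a hedged sketch of the "one writer per consumer" pattern with the Kudu C++ client (the poster may well be using the Java client); each consumer thread gets its own session in AUTO_FLUSH_BACKGROUND mode so writes are batched and flushed asynchronously. Table and column names are illustrative assumptions.

    // Hedged sketch: one KuduSession per Kafka consumer thread, batching
    // writes client-side via AUTO_FLUSH_BACKGROUND. Names are illustrative.
    #include <kudu/client/client.h>
    #include <kudu/util/status.h>

    #include <memory>
    #include <string>

    using kudu::client::KuduClient;
    using kudu::client::KuduClientBuilder;
    using kudu::client::KuduInsert;
    using kudu::client::KuduSession;
    using kudu::client::KuduTable;
    using kudu::client::sp::shared_ptr;

    kudu::Status MakeWriter(const std::string& master_addr,
                            const std::string& table_name,
                            shared_ptr<KuduClient>* client,
                            shared_ptr<KuduTable>* table,
                            shared_ptr<KuduSession>* session) {
      KUDU_RETURN_NOT_OK(KuduClientBuilder()
                             .add_master_server_addr(master_addr)
                             .Build(client));
      KUDU_RETURN_NOT_OK((*client)->OpenTable(table_name, table));
      *session = (*client)->NewSession();
      // Queue writes locally and let the client flush them in the background.
      return (*session)->SetFlushMode(KuduSession::AUTO_FLUSH_BACKGROUND);
    }

    kudu::Status WriteRecord(KuduTable* table, KuduSession* session,
                             const std::string& uuid,
                             const std::string& payload) {
      std::unique_ptr<KuduInsert> insert(table->NewInsert());
      KUDU_RETURN_NOT_OK(insert->mutable_row()->SetString("uuid", uuid));
      KUDU_RETURN_NOT_OK(insert->mutable_row()->SetString("payload", payload));
      // Apply() only enqueues; the background flusher sends the batch.
      return session->Apply(insert.release());
    }

Note that in AUTO_FLUSH_BACKGROUND mode, flush failures surface through the session's pending-errors API rather than Apply()'s return value, so production code needs to check those as well.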

Re: Low ingestion rate from Kafka

2017-11-01 Thread Todd Lipcon
On Tue, Oct 31, 2017 at 11:56 PM, Chao Sun wrote: > > Sure, but increasing the number of consumers can increase the throughput (without increasing the number of Kudu tablet servers). > I see. Makes sense. I'll test that later. > > Currently, if you run 'top' on the TS

Re: Low ingestion rate from Kafka

2017-11-01 Thread Chao Sun
> Sure, but increasing the number of consumers can increase the throughput (without increasing the number of Kudu tablet servers). I see. Makes sense. I'll test that later. > Currently, if you run 'top' on the TS nodes, do you see them using a high amount of CPU? Similar question for 'iostat -dxm

Re: Low ingestion rate from Kafka

2017-11-01 Thread Chao Sun
Thanks Zhen and Todd. Yes, increasing the # of consumers will definitely help, but we also want to test the best throughput we can get from Kudu. I think the default batch size is 1000 rows? I tested with a few different options between 1000 and 20, but always got some number between 15K to
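The "1000 rows" default mentioned above appears to correspond to the Java client's mutation buffer, which is sized in operations. For reference, the C++ client exposes analogous per-session batching knobs, sketched below with purely illustrative values (not recommendations from this thread).

    // Hedged sketch of per-session batching knobs in the Kudu C++ client.
    // The concrete numbers are illustrative assumptions.
    #include <kudu/client/client.h>
    #include <kudu/util/status.h>

    using kudu::client::KuduSession;
    using kudu::client::sp::shared_ptr;

    kudu::Status TuneSession(const shared_ptr<KuduSession>& session) {
      // Batch writes client-side and flush them asynchronously.
      KUDU_RETURN_NOT_OK(
          session->SetFlushMode(KuduSession::AUTO_FLUSH_BACKGROUND));
      // Client-side buffer size in bytes (illustrative: 8 MiB).
      KUDU_RETURN_NOT_OK(session->SetMutationBufferSpace(8 * 1024 * 1024));
      // Flush a partly filled buffer at least this often (illustrative: 500 ms).
      KUDU_RETURN_NOT_OK(session->SetMutationBufferFlushInterval(500));
      return kudu::Status::OK();
    }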