Re: Kafka with RAID 5 on. busy cluster.

2020-03-23 Thread Vishal Santoshi
<< In RAID 5 one can loose more than only one disk RAID here will be data corruption. >> should read: In RAID 5, if one loses more than one disk, there will be data corruption. On Mon, Mar 23, 2020 at 11:27 PM Vishal Santoshi wrote: > One obvious issue is disk failure toleration . As in if RF =3

Re: Kafka with RAID 5 on. busy cluster.

2020-03-23 Thread Vishal Santoshi
One obvious issue is disk failure tolerance. With RF = 3 on normal JBOD, the disk failure tolerance is 2. In RAID 5, if one loses more than one disk, there will be data corruption, effectively making the broker unusable, thus reducing our drive failure tolerance to 2 drives ON 2
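The failure-tolerance comparison in this reply can be checked against a concrete replica layout with a small model. This is a minimal sketch under stated assumptions, not anything taken from Kafka: `Replica`, `replica_alive` and `lost_partitions` are hypothetical names; it assumes each partition's replicas sit on distinct brokers, that under JBOD a replica is lost only when its own disk fails, and that a RAID 5 array with two or more failed disks takes the whole broker down (as the reply describes). It ignores rebuild windows, ISR and `min.insync.replicas`, so it only answers "does any copy of the data survive?".

```python
from dataclasses import dataclass

# Hypothetical model: a disk is identified by (broker_id, disk_index).
Disk = tuple[int, int]


@dataclass(frozen=True)
class Replica:
    broker: int
    disk: int  # disk index inside the broker (one JBOD log dir per disk)


def replica_alive(r: Replica, failed: set[Disk], mode: str) -> bool:
    """Return True if this replica survives the given set of failed disks."""
    if mode == "jbod":
        # Assumption: under JBOD only the replica's own disk matters.
        return (r.broker, r.disk) not in failed
    if mode == "raid5":
        # Assumption: the array survives one failed disk; two or more failed
        # disks in the same broker corrupt it and take the whole broker down.
        failed_in_broker = sum(1 for b, _ in failed if b == r.broker)
        return failed_in_broker < 2
    raise ValueError(f"unknown mode: {mode}")


def lost_partitions(placement: dict[str, list[Replica]],
                    failed: set[Disk], mode: str) -> list[str]:
    """Partitions with no surviving replica, sorted by name."""
    return sorted(
        p for p, replicas in placement.items()
        if not any(replica_alive(r, failed, mode) for r in replicas)
    )


# Usage: RF = 3 across brokers 0, 1 and 2 (hypothetical layout).
placement = {
    "t-0": [Replica(0, 0), Replica(1, 0), Replica(2, 0)],
    "t-1": [Replica(0, 1), Replica(1, 1), Replica(2, 1)],
}
print(lost_partitions(placement, {(0, 0), (1, 0)}, "jbod"))          # []
print(lost_partitions(placement, {(0, 0), (1, 0), (2, 0)}, "jbod"))  # ['t-0']
print(lost_partitions(placement, {(0, 0), (0, 1)}, "raid5"))         # [] (broker 0 down)
```

Varying `placement` and the failed set makes it easy to compare worst-case and typical failure patterns for a given cluster layout before deciding between JBOD and RAID 5.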

Kafka with RAID 5 on. busy cluster.

2020-03-23 Thread Vishal Santoshi
We have a pretty busy Kafka cluster with SSDs and plain JBOD. We are planning, or at least considering, using RAID 5 (hardware RAID on 6-drive SSD brokers) instead of JBOD for various reasons. Has anyone used RAID 5? (We know that there is a write overhead from the parity bits on blocks and from recreating a dead drive.)
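The write overhead and dead-drive rebuild mentioned here both come from RAID 5's XOR parity. The sketch below only illustrates that arithmetic, assuming one parity block per stripe; real hardware controllers rotate parity across drives and do this in firmware, and the helper names (`xor_blocks`, `parity`, `rebuild`, `update_parity`) are hypothetical.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of a non-empty list of equally sized blocks."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and equally sized")
    out = bytearray(blocks[0])
    for blk in blocks[1:]:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


def parity(data_blocks: list[bytes]) -> bytes:
    """Parity block for one stripe: XOR of all data blocks."""
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks: list[bytes]) -> bytes:
    """Recreate the single missing block (data or parity) of a stripe:
    it is the XOR of every surviving block. A second loss is unrecoverable."""
    return xor_blocks(surviving_blocks)


def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Small-write path: new P = old P ^ old D ^ new D.
    This is the write overhead: read old D and old P, write new D and new P."""
    return xor_blocks([old_parity, old_data, new_data])


# Usage: a stripe of 3 data blocks; the middle block's drive "dies".
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = parity(stripe)
assert rebuild([stripe[0], stripe[2], p]) == stripe[1]
```

With 6 drives per broker, each stripe spends one drive's worth of space on parity, so usable capacity is 5/6 of raw, and every rebuild has to read all surviving drives in the array.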

Max poll interval and timeouts

2020-03-23 Thread Ryan Schachte
Hey guys, I'm getting a bit overwhelmed by the different configuration variables involved in batching. I have some custom batching logic that processes a batch when either N records have been buffered or my max timeout has been hit. It was working decently well, but I hit this error: *This means that
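The "N records or max timeout" batching described here can be sketched independently of any Kafka client. This is a hypothetical helper (`SizeOrTimeoutBatcher`, `add`, `tick` and `flush` are illustrative names, not Kafka APIs) with an injectable clock so it can be tested deterministically. One design point: `tick()` has to run after every `poll()`, including empty ones, or a partial batch waits until the next record arrives. Separately, the consumer setting `max.poll.interval.ms` bounds the delay between `poll()` calls, so any flush work done inline in the poll loop counts against that limit.

```python
import time
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class SizeOrTimeoutBatcher(Generic[T]):
    """Hypothetical helper: buffers records and flushes when either
    max_records are buffered or max_wait_s has elapsed since the first
    record of the current batch was buffered."""

    def __init__(self, max_records: int, max_wait_s: float,
                 flush: Callable[[List[T]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_records < 1:
            raise ValueError("max_records must be >= 1")
        self.max_records = max_records
        self.max_wait_s = max_wait_s
        self._flush = flush
        self._clock = clock
        self._buf: List[T] = []
        self._first_at: Optional[float] = None

    def add(self, record: T) -> None:
        """Buffer one record; flush immediately if the size limit is hit."""
        if not self._buf:
            self._first_at = self._clock()
        self._buf.append(record)
        if len(self._buf) >= self.max_records:
            self.flush()

    def tick(self) -> None:
        """Call after every poll(), even an empty one, so the timeout
        can fire without new records arriving."""
        if self._buf and self._first_at is not None \
                and self._clock() - self._first_at >= self.max_wait_s:
            self.flush()

    def flush(self) -> None:
        """Hand the current batch to the callback and start a new one."""
        if self._buf:
            batch, self._buf, self._first_at = self._buf, [], None
            self._flush(batch)


# Usage with a fake clock instead of real time.
now = [0.0]
batches: List[List[str]] = []
b = SizeOrTimeoutBatcher(3, 5.0, batches.append, clock=lambda: now[0])
for r in "abcd":
    b.add(r)      # the third record triggers a size-based flush
now[0] = 6.0
b.tick()          # 'd' has waited 6 s >= 5 s, so it is flushed on its own
print(batches)    # [['a', 'b', 'c'], ['d']]
```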

Re: Kafka Streams - partition assignment for the input topic

2020-03-23 Thread Sophie Blee-Goldman
I don't think it has anything to do with your specific topology, but it might be that the "stickiness" is overriding the "data parallelism balance" in the current assignment algorithm. There are a lot of different factors to optimize for, so we end up making tradeoffs with a rough hierarchy of

Re: MirrorMaker2 - uneven loadbalancing

2020-03-23 Thread Ryanne Dolan
Thanks Peter for running this experiment. That looks sorta normal. It looks like Connect is deciding to use 10 total tasks and doesn't care which ones do what. Ideally you'd see the MirrorSourceConnector tasks evenly divided, since they do the bulk of the work -- but that doesn't seem to be the

Re: MirrorMaker2 - uneven loadbalancing

2020-03-23 Thread Péter Sinóros-Szabó
So I ran some tests with tasks.max = 4.
With 2 instances:
- instance 1: 4 MirrorSourceConnector, 1 MirrorHeartbeatConnector tasks
- instance 2: 4 MirrorCheckpointConnector, 1 MirrorHeartbeatConnector tasks
With 3 instances:
- instance 1: 3 MirrorCheckpointConnector tasks
- instance 2: 3

Re: Kafka Streams - partition assignment for the input topic

2020-03-23 Thread Stephen Young
Thanks for your help Sophie and Matthias. In my cloud environment I'm using kafka version 2.2.1. I've tested this locally with 2.4.1 and I can see the same issue with 3 local instances. As I add more local instances I start to see better balancing. I was wondering if the issue could be because