Our 2 node kafka cluster has become unhealthy. We're running zookeeper as a 3 node system, which very light load.
What seems to be happening is in the controller log we get a ZK session expire message, and in the process of re-assigning the leader for the partitions (if I'm understanding this right, please correct me), the broker goes offline and it interrupts our applications that are publishing messages. We don't see this in production, and kafka has been stable for months, since september. I've searched a lot and found some similiar complaints but no real solutions. I'm running 0.8.2 and JVM 1.6.X on ubuntu. Thanks for any ideas at all.
