No, just a bare CentOS 6.5 on an EC2 instance.

On Sep 11, 2014 1:39 AM, "Jun Rao" <jun...@gmail.com> wrote:
> I meant whether you start the broker in service containers like jetty or
> tomcat.
>
> Thanks,
>
> Jun
>
> On Wed, Sep 10, 2014 at 12:28 AM, Shlomi Hazan <shl...@viber.com> wrote:
>
> > Hi, sorry, what do you mean by 'container'? I use bare EC2 instances...
> > Shlomi
> >
> > On Wed, Sep 10, 2014 at 1:41 AM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Are you starting the broker in some container? You want to make sure that
> > > the container doesn't overwrite the open file handler limit.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Sep 9, 2014 at 12:05 AM, Shlomi Hazan <shl...@viber.com> wrote:
> > >
> > > > Hi,
> > > > it's probably beyond that. it may be an issue with the number of files
> > > > Kafka can have opened concurrently.
> > > > A previous conversation with Joe about (build failes for latest stable
> > > > source tgz (kafka_2.9.2-0.8.1.1)) turned out to discuss this (Q's by Joe,
> > > > A's by me):
> > > >
> > > > 1. what else on the logs? [*see below*]
> > > > 2. other broker failure reason? [*"*]
> > > > 3. other broker failure after taking leadership? [*how can I be sure? ask
> > > > another to describe topic?*]
> > > > 4. how do I measure number of connections? [*ls -l /proc/<pid>/fd | grep
> > > > socket | wc -l, also did watch on that*]
> > > > 5. is that number equals the number of {new Producer}? [*yes*]
> > > > 6. how many topics? [*1*] how many partitions [*504*]
> > > > 7. Are u using a partition key? [*yes, I use the python client with*]
> > > >
> > > > class ProducerIdPartitioner(Partitioner):
> > > >     """
> > > >     Implements a partitioner which selects the target partition
> > > >     based on the sending producer ID
> > > >     """
> > > >     def partition(self, key, partitions):
> > > >         size = len(partitions)
> > > >         prod_id = int(key)
> > > >         idx = prod_id % size
> > > >         return partitions[idx]
> > > >
> > > > 8. maybe running into over partitioned topic?
> > > > [*producer instances is 6 machines * 84 procs * 24 threads, but never
> > > > got to start them all, b/c of errors*]
> > > > 9. r u running anything else? [*yes, zookeeper*]
> > > >
> > > >
> > > > answer to 1,2:
> > > > the errors I see on the python client are first timeouts and then message
> > > > send failures, using sync send.
> > > >
> > > > on the controller log:
> > > >
> > > > controller.log.2014-08-26-13:[2014-08-26 13:40:44,317] ERROR [Controller-1-to-broker-3-send-thread], Controller 1 epoch 3 failed to send StopReplica request with correlation id 519 to broker id:3,host:shlomi-kafka-broker-3,port:9092. Reconnecting to broker. (kafka.controller.RequestSendThread)
> > > > controller.log.2014-08-26-13:[2014-08-26 13:40:44,319] ERROR [Controller-1-to-broker-3-send-thread], Controller 1's connection to broker id:3,host:shlomi-kafka-broker-3,port:9092 was unsuccessful (kafka.controller.RequestSendThread)
> > > >
> > > > on the server log (selected greps):
> > > > ...
> > > > server.log.2014-08-27-01:[2014-08-27 01:44:23,143] ERROR [ReplicaFetcherThread-4-2], Error for partition [vpq_android_gcm_h,270] to broker 2:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> > > > ...
> > > > server.log.2014-08-27-12:[2014-08-27 12:08:34,638] ERROR Closing socket for /10.184.150.54 because of error (kafka.network.Processor)
> > > >
> > > > ...
> > > > server.log.2014-08-28-07:[2014-08-28 07:57:35,944] ERROR [KafkaApi-1] Error when processing fetch request for partition [vpq_android_gcm_h,184] offset 8798 from follower with correlation id 0 (kafka.server.KafkaApis)
> > > > ...
> > > > server.log.2014-09-03-15:[2014-09-03 15:46:18,220] ERROR [ReplicaFetcherThread-2-3], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 177593; ClientId: ReplicaFetcherThread-2-3; ReplicaId: 1; MaxWait: 1000 ms; MinBytes: 1 bytes; RequestInfo:
> > > > [vpq_android_gcm_h,196] -> PartitionFetchInfo(65283,8388608),
> > > > [vpq_android_gcm_h,76] -> PartitionFetchInfo(262787,8388608),
> > > > [vpq_android_gcm_h,460] -> PartitionFetchInfo(285709,8388608),
> > > > [vpq_android_gcm_h,100] -> PartitionFetchInfo(199405,8388608),
> > > > [vpq_android_gcm_h,148] -> PartitionFetchInfo(339032,8388608),
> > > > [vpq_android_gcm_h,436] -> PartitionFetchInfo(0,8388608),
> > > > [vpq_android_gcm_h,124] -> PartitionFetchInfo(484447,8388608),
> > > > [vpq_android_gcm_h,484] -> PartitionFetchInfo(105945,8388608),
> > > > [vpq_android_gcm_h,340] -> PartitionFetchInfo(0,8388608),
> > > > [vpq_android_gcm_h,388] -> PartitionFetchInfo(9,8388608),
> > > > [vpq_android_gcm_h,316] -> PartitionFetchInfo(194766,8388608),
> > > > [vpq_android_gcm_h,364] -> PartitionFetchInfo(139897,8388608),
> > > > [vpq_android_gcm_h,292] -> PartitionFetchInfo(195408,8388608),
> > > > [vpq_android_gcm_h,28] -> PartitionFetchInfo(329961,8388608),
> > > > [vpq_android_gcm_h,172] -> PartitionFetchInfo(436959,8388608),
> > > > [vpq_android_gcm_h,268] -> PartitionFetchInfo(59827,8388608),
> > > > [vpq_android_gcm_h,244] -> PartitionFetchInfo(259731,8388608),
> > > > [vpq_android_gcm_h,220] -> PartitionFetchInfo(61669,8388608),
> > > > [vpq_android_gcm_h,412] -> PartitionFetchInfo(563609,8388608),
> > > > [vpq_android_gcm_h,4] -> PartitionFetchInfo(360336,8388608),
> > > > [vpq_android_gcm_h,52] -> PartitionFetchInfo(378533,8388608)
> > > > (kafka.server.ReplicaFetcherThread)
> > > > ...
> > > > server.log.2014-09-03-14:[2014-09-03 14:04:18,548] ERROR Error in acceptor (kafka.network.Acceptor)
> > > > ...
> > > >
> > > >
> > > > and these may not be all (other logs may have some more of that)....
> > > >
> > > >
> > > > Joe said to just lower the number of connections but I still can't see the
> > > > exact problem.
> > > > Is there a Kafka limit to the number of concurrent open files? 'cause the
> > > > process was not limited...
> > > >
> > > > Thanks,
> > > > Shlomi
> > > >
> > > > On Tue, Sep 9, 2014 at 7:12 AM, Jun Rao <jun...@gmail.com> wrote:
> > > >
> > > > > What type of error did you see? You may need to configure a larger open
> > > > > file handler limit.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Wed, Sep 3, 2014 at 12:01 PM, Shlomi Hazan <hzshl...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am trying to load a cluster with more than 10K connections, and bumped
> > > > > > into the error in the subject.
> > > > > > Is there any limitation on Kafka's side? If so, is it configurable? How?
> > > > > > On first look, it looks like the selector accepting the connection is
> > > > > > overflowing...
> > > > > >
> > > > > > Thanks.
> > > > > > --
> > > > > > Shlomi
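
The thread turns on two numbers for the broker process: the open-file limit that actually applies to it, and how many descriptors (sockets in particular) it holds. On Linux both can be read from /proc. What follows is a minimal sketch, assuming a Linux host and permission to read the target process's /proc entries. The function names are illustrative only and are not part of Kafka or of any client library.

```python
import os


def nofile_limit(pid="self"):
    """Return the (soft, hard) 'Max open files' limit of a process as strings.

    Reads /proc/<pid>/limits, so it reports the limit the running process
    really has, which may differ from what `ulimit -n` shows in a shell.
    Values are strings because the kernel may report 'unlimited'.
    """
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                soft, hard = line.split()[3:5]
                return soft, hard
    return None


def count_fds(pid="self"):
    """Return (total, sockets): open descriptors under /proc/<pid>/fd.

    Roughly the same measurement as `ls -l /proc/<pid>/fd | grep socket | wc -l`
    from the thread, plus the overall descriptor count.
    """
    fd_dir = f"/proc/{pid}/fd"
    total = sockets = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        total += 1
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets


if __name__ == "__main__":
    print("limit (soft, hard):", nofile_limit())
    print("open (total, sockets):", count_fds())
```

To inspect a broker rather than the current process, pass its PID, for example `nofile_limit(12345)` and `count_fds(12345)` (the PID here is a placeholder). Comparing the total against the soft limit shows how close the process is to running out of descriptors.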