No, just a bare CentOS 6.5 on an EC2 instance.
On Sep 11, 2014 1:39 AM, "Jun Rao" <jun...@gmail.com> wrote:

> I meant whether you start the broker in service containers like jetty or
> tomcat.
>
> Thanks,
>
> Jun
>
> On Wed, Sep 10, 2014 at 12:28 AM, Shlomi Hazan <shl...@viber.com> wrote:
>
> > Hi, sorry, what do you mean by 'container'? I use bare EC2 instances...
> > Shlomi
> >
> > On Wed, Sep 10, 2014 at 1:41 AM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Are you starting the broker in some container? You want to make sure that
> > > the container doesn't overwrite the open file handler limit.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Sep 9, 2014 at 12:05 AM, Shlomi Hazan <shl...@viber.com> wrote:
> > >
> > > > Hi,
> > > > it's probably beyond that. It may be an issue with the number of files
> > > > Kafka can have open concurrently.
> > > > A previous conversation with Joe (subject: "build failes for latest stable
> > > > source tgz (kafka_2.9.2-0.8.1.1)") turned out to discuss this (Q's by Joe,
> > > > A's by me):
> > > >
> > > > 1. what else on the logs? [*see below*]
> > > > 2. other broker failure reason? [*"*]
> > > > 3. other broker failure after taking leadership? [*how can I be sure? ask
> > > > another to describe topic?*]
> > > > 4. how do I measure number of connections? [*ls -l /proc/<pid>/fd | grep
> > > > socket | wc -l, also did watch on that*]
> > > > 5. does that number equal the number of {new Producer}? [*yes*]
> > > > 6. how many topics? [*1*] how many partitions [*504*]
> > > > 7. Are u using a partition key? [*yes, I use the python client with*]
> > > >
> > > > class ProducerIdPartitioner(Partitioner):
> > > >     """
> > > >     Implements a partitioner which selects the target partition based on
> > > >     the sending producer ID
> > > >     """
> > > >     def partition(self, key, partitions):
> > > >         size = len(partitions)
> > > >         prod_id = int(key)
> > > >         idx = prod_id % size
> > > >         return partitions[idx]
> > > >
> > > > 8. maybe running into over partitioned topic? [*producer instances is 6
> > > > machines * 84 procs * 24 threads, but never got to start them all, b/c of
> > > > errors*]
> > > > 9. r u running anything else? [*yes, zookeeper*]
> > > >
> > > >
> > > > answer to 1,2:
> > > > the errors I see on the python client are first timeouts and then message
> > > > send failures, using sync send.
> > > >
> > > > on the controller log:
> > > >
> > > > controller.log.2014-08-26-13:[2014-08-26 13:40:44,317] ERROR
> > > > [Controller-1-to-broker-3-send-thread], Controller 1 epoch 3 failed to send
> > > > StopReplica request with correlation id 519 to broker
> > > > id:3,host:shlomi-kafka-broker-3,port:9092. Reconnecting to broker.
> > > > (kafka.controller.RequestSendThread)
> > > > controller.log.2014-08-26-13:[2014-08-26 13:40:44,319] ERROR
> > > > [Controller-1-to-broker-3-send-thread], Controller 1's connection to broker
> > > > id:3,host:shlomi-kafka-broker-3,port:9092 was unsuccessful
> > > > (kafka.controller.RequestSendThread)
> > > >
> > > > on the server log (selected greps):
> > > > ...
> > > > server.log.2014-08-27-01:[2014-08-27 01:44:23,143] ERROR
> > > > [ReplicaFetcherThread-4-2], Error for partition [vpq_android_gcm_h,270] to
> > > > broker 2:class kafka.common.NotLeaderForPartitionException
> > > > (kafka.server.ReplicaFetcherThread)
> > > > ...
> > > > server.log.2014-08-27-12:[2014-08-27 12:08:34,638] ERROR Closing socket for
> > > > /10.184.150.54 because of error (kafka.network.Processor)
> > > >
> > > > ...
> > > > server.log.2014-08-28-07:[2014-08-28 07:57:35,944] ERROR [KafkaApi-1] Error
> > > > when processing fetch request for partition [vpq_android_gcm_h,184] offset
> > > > 8798 from follower with correlation id 0 (kafka.server.KafkaApis)
> > > > ...
> > > > server.log.2014-09-03-15:[2014-09-03 15:46:18,220] ERROR
> > > > [ReplicaFetcherThread-2-3], Error in fetch Name: FetchRequest; Version: 0;
> > > > CorrelationId: 177593; ClientId: ReplicaFetcherThread-2-3; ReplicaId: 1;
> > > > MaxWait: 1000 ms; MinBytes: 1 bytes; RequestInfo: [vpq_android_gcm_h,196]
> > > > -> PartitionFetchInfo(65283,8388608),[vpq_android_gcm_h,76] ->
> > > > PartitionFetchInfo(262787,8388608),[vpq_android_gcm_h,460] ->
> > > > PartitionFetchInfo(285709,8388608),[vpq_android_gcm_h,100] ->
> > > > PartitionFetchInfo(199405,8388608),[vpq_android_gcm_h,148] ->
> > > > PartitionFetchInfo(339032,8388608),[vpq_android_gcm_h,436] ->
> > > > PartitionFetchInfo(0,8388608),[vpq_android_gcm_h,124] ->
> > > > PartitionFetchInfo(484447,8388608),[vpq_android_gcm_h,484] ->
> > > > PartitionFetchInfo(105945,8388608),[vpq_android_gcm_h,340] ->
> > > > PartitionFetchInfo(0,8388608),[vpq_android_gcm_h,388] ->
> > > > PartitionFetchInfo(9,8388608),[vpq_android_gcm_h,316] ->
> > > > PartitionFetchInfo(194766,8388608),[vpq_android_gcm_h,364] ->
> > > > PartitionFetchInfo(139897,8388608),[vpq_android_gcm_h,292] ->
> > > > PartitionFetchInfo(195408,8388608),[vpq_android_gcm_h,28] ->
> > > > PartitionFetchInfo(329961,8388608),[vpq_android_gcm_h,172] ->
> > > > PartitionFetchInfo(436959,8388608),[vpq_android_gcm_h,268] ->
> > > > PartitionFetchInfo(59827,8388608),[vpq_android_gcm_h,244] ->
> > > > PartitionFetchInfo(259731,8388608),[vpq_android_gcm_h,220] ->
> > > > PartitionFetchInfo(61669,8388608),[vpq_android_gcm_h,412] ->
> > > > PartitionFetchInfo(563609,8388608),[vpq_android_gcm_h,4] ->
> > > > PartitionFetchInfo(360336,8388608),[vpq_android_gcm_h,52] ->
> > > > PartitionFetchInfo(378533,8388608) (kafka.server.ReplicaFetcherThread)
> > > > ...
> > > > server.log.2014-09-03-14:[2014-09-03 14:04:18,548] ERROR Error in acceptor
> > > > (kafka.network.Acceptor)
> > > > ...
> > > >
> > > >
> > > > and these may not be all (other logs may have some more of that)....
> > > >
> > > >
> > > > Joe said to just lower the number of connections, but I still can't see the
> > > > exact problem.
> > > > Is there a Kafka limit on the number of concurrently open files? Because the
> > > > process was not limited...
> > > >
> > > > Thanks,
> > > > Shlomi
> > > >
> > > > On Tue, Sep 9, 2014 at 7:12 AM, Jun Rao <jun...@gmail.com> wrote:
> > > >
> > > > > What type of error did you see? You may need to configure a larger open
> > > > > file handler limit.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Wed, Sep 3, 2014 at 12:01 PM, Shlomi Hazan <hzshl...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am trying to load a cluster with more than 10K connections, and bumped
> > > > > > into the error in the subject.
> > > > > > Is there any limitation on Kafka's side? If so, is it configurable? How?
> > > > > > At first look, it looks like the selector accepting the connection is
> > > > > > overflowing...
> > > > > >
> > > > > > Thanks.
> > > > > > --
> > > > > > Shlomi
> > > > > >
> > > > >
> > > >
> > >
> >
>
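
The thread measures broker connections with `ls -l /proc/<pid>/fd | grep socket | wc -l` and asks whether the process is limited in open files. The planned 6 machines * 84 procs * 24 threads comes to 12,096 producers, and each connection takes one file descriptor on the broker. A minimal Python sketch of both checks follows, assuming a Linux /proc layout. The function names are hypothetical and are not part of Kafka or of the Python client.

```python
import os


def count_socket_fds(fd_dir):
    """Count entries in fd_dir whose symlink target starts with 'socket:'.

    fd_dir is expected to look like /proc/<pid>/fd on Linux.
    """
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor may have been closed since listdir ran.
            continue
        if target.startswith("socket:"):
            count += 1
    return count


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because either one may be 'unlimited'.
    Returns None if the row is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', soft, hard, 'files']
            return fields[3], fields[4]
    return None


def report(pid):
    """Print the socket count and open-file limits for one process."""
    sockets = count_socket_fds("/proc/%s/fd" % pid)
    with open("/proc/%s/limits" % pid) as f:
        limits = parse_max_open_files(f.read())
    print("sockets=%d max_open_files(soft, hard)=%s" % (sockets, limits))
```

Calling `report(<broker pid>)` as the user that owns the broker process (or as root) shows the limit the running JVM actually has, which can differ from what `ulimit -n` reports in an interactive shell.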
