Re: Regarding Kafka

Abhit Kalsotra Sun, 09 Oct 2016 03:59:01 -0700

I did that, but I am getting confusing results.

For example:


I have created 4 Kafka consumer threads for data analytics. These threads
just wait for Kafka messages to consume. I provide the key when I produce,
which means that all messages with the same key should go to one single
partition. Ref:
http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
"*On the consumer side, Kafka always gives a single partition’s data to
one consumer thread.*"
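
A minimal sketch of that expectation, for illustration only. `partitionForUser`
is a hypothetical helper, and the plain modulo is an assumption standing in for
the real partitioner, which hashes the key bytes. As far as the librdkafka C++
API goes, the key is only used for partition selection when
`RdKafka::Topic::PARTITION_UA` is passed as the partition argument to
produce(); an explicit partition number takes precedence over the key.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a key-based partitioner: the same user ID
// always maps to the same partition as long as numPartitions is unchanged.
// (Assumption: plain modulo; librdkafka's own partitioner hashes the key.)
inline std::int32_t partitionForUser(std::uint64_t userId,
                                     std::int32_t numPartitions) {
    assert(numPartitions > 0);  // a topic has at least one partition
    return static_cast<std::int32_t>(
        userId % static_cast<std::uint64_t>(numPartitions));
}
```

With 50 partitions, `partitionForUser(4495, 50)` is 45 on every call and
`partitionForUser(4496, 50)` is 46, so each user's messages would stay within
one partition and keep that partition's order.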

If you look at my application logs below, all 4 of my Kafka consumer
application threads are calling consume(). Shouldn't all messages for a
particular ID be consumed by one Kafka application thread?

[2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 74 ][ID ID date:2016-09-28 20:07:32.000 ]
[2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4496 offset: 80 ][ID ID date: 2016-09-28 20:07:39.000 ]
[2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4495 offset: 77 ][ID date: 2016-09-28 20:07:35.000 ]
[2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 76][ID date: 2016-09-28 20:07:34.000 ]
[2016-10-08 23:37:07.498]AxThreadId 9540 ->ID:4495 offset: 75 ][ID date: 2016-09-28 20:07:33.000 ]
[2016-10-08 23:37:07.499]AxThreadId 23516 ->ID:4495 offset: 78 ][ID date: 2016-09-28 20:07:36.000 ]
[2016-10-08 23:37:07.499]AxThreadId 2208 ->ID:4495 offset: 79 ][ID date: 2016-09-28 20:07:37.000 ]
[2016-10-08 23:37:07.499]AxThreadId 9540 ->ID:4495 offset: 80 ][ID date: 2016-09-28 20:07:38.000 ]
[2016-10-08 23:37:07.500]AxThreadId 23516 ->ID:4495 offset: 81][ID date: 2016-09-28 20:07:39.000 ]
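
The log excerpt above can be checked mechanically. The sketch below hard-codes
the thread ID, message ID, and offset from each of the nine entries and groups
them by message ID. `LogEntry`, `sampleLog`, and `threadsPerId` are
hypothetical names introduced only for this illustration; nothing here calls
librdkafka.

```cpp
#include <map>
#include <set>
#include <vector>

// One parsed log entry: which application thread consumed which message.
struct LogEntry {
    int threadId;
    int messageId;
    long offset;
};

// The nine entries from the log excerpt, in the order they were printed.
inline std::vector<LogEntry> sampleLog() {
    return {
        {23516, 4495, 74}, {2208, 4496, 80}, {2208, 4495, 77},
        {23516, 4495, 76}, {9540, 4495, 75}, {23516, 4495, 78},
        {2208, 4495, 79},  {9540, 4495, 80}, {23516, 4495, 81},
    };
}

// Collect, for every message ID, the set of threads that consumed it.
inline std::map<int, std::set<int>> threadsPerId(
        const std::vector<LogEntry>& entries) {
    std::map<int, std::set<int>> result;
    for (const LogEntry& e : entries) {
        result[e.messageId].insert(e.threadId);
    }
    return result;
}
```

For this excerpt, `threadsPerId(sampleLog())` maps ID 4495 to three threads
(2208, 9540, and 23516) and ID 4496 to one thread (2208), which is exactly the
behavior being questioned.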




On Sun, Oct 9, 2016 at 1:31 PM, Hans Jespersen <h...@confluent.io> wrote:

> Then publish with the user ID as the key and all messages for the same key
> will be guaranteed to go to the same partition and therefore be in order
> for whichever consumer gets that partition.
>
>
> //h...@confluent.io
> -------- Original message --------
> From: Abhit Kalsotra <abhit...@gmail.com>
> Date: 10/9/16  12:39 AM  (GMT-08:00)
> To: users@kafka.apache.org
> Subject: Re: Regarding Kafka
> What about the order in which messages are received if I don't specify the
> partition?
>
> Let's say I have user ID 4456 and I have to do some analytics at the
> Kafka consumer end. If the messages are not consumed in the order I sent
> them, my analytics will go haywire.
>
> Abhi
>
> On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <h...@confluent.io> wrote:
>
> > You don't even have to do that because the default partitioner will spread
> > the data you publish to the topic over the available partitions for you.
> > Just try it out to see. Publish multiple messages to the topic without
> > using keys, and without specifying a partition, and observe that they are
> > automatically distributed out over the available partitions.
> >
> >
> > //h...@confluent.io
> > -------- Original message --------
> > From: Abhit Kalsotra <abhit...@gmail.com>
> > Date: 10/8/16  11:19 PM  (GMT-08:00)
> > To: users@kafka.apache.org
> > Subject: Re: Regarding Kafka
> > Hans
> >
> > Thanks for the response. Yes, you could say I am treating topics like
> > partitions, because my current logic for producing to a respective topic
> > goes something like this:
> >
> > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic],
> >                                                    partition,
> >                                                    RdKafka::Producer::RK_MSG_COPY,
> >                                                    ptr,
> >                                                    size,
> >                                                    &partitionKey,
> >                                                    NULL);
> > where partitionKey is a unique number or user ID. What I am currently doing
> > is computing partitionKey % 10 and sending the message to the topic that
> > corresponds to the remainder.
> >
> > But as per your suggestion, let me create close to 40-50 partitions for a
> > single topic, and when I am producing I will do something like this:
> >
> > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic,
> >                                                    partition%(50),
> >                                                    RdKafka::Producer::RK_MSG_COPY,
> >                                                    ptr,
> >                                                    size,
> >                                                    &partitionKey,
> >                                                    NULL);
> >
> > Abhi
> >
> > On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen <h...@confluent.io> wrote:
> >
> > > Why do you have 10 topics?  It seems like you are treating topics like
> > > partitions and it's unclear why you don't just have 1 topic with 10, 20, or
> > > even 30 partitions. Ordering is only guaranteed at a partition level.
> > >
> > > In general if you want to capacity plan for partitions you benchmark a
> > > single partition and then divide your peak estimated throughput by the
> > > throughput measured for that single partition.
> > >
> > > If you expect the peak throughput to increase over time then double your
> > > partition count to allow room to grow the number of consumers without
> > > having to repartition.
> > >
> > > Sizing can be a bit more tricky if you are using keys but it doesn't sound
> > > like you are if today you are publishing to topics the way you describe.
> > >
> > > -hans
> > >
> > > > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra <abhit...@gmail.com> wrote:
> > > >
> > > > Guys, any views?
> > > >
> > > > Abhi
> > > >
> > > >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra <abhit...@gmail.com> wrote:
> > > >>
> > > >> Hello
> > > >>
> > > >> I am using the librdkafka C++ library for my application.
> > > >>
> > > >> *My Kafka Cluster Setup*
> > > >> 2 ZooKeeper nodes running on 2 different instances
> > > >> 7 Kafka brokers, 4 running on one machine and 3 running on the other
> > > >> 10 topics in total; partition count is 3 with a replication factor of 3.
> > > >>
> > > >> Now in my case I need to be very specific about the *message order* when I
> > > >> am consuming the messages. I know that if all the messages are produced to
> > > >> the same partition, they are always consumed in the same order.
> > > >>
> > > >> I need expert opinions on the ideal partition count I should consider
> > > >> without affecting performance (I am looking for close to 100,000
> > > >> messages per second).
> > > >> The topics are numbered 0 to 9, and when I am producing messages I compute
> > > >> uniqueUserId % 10 and then produce to the respective topic, such as
> > > >> 0, 1, 2, etc.
> > > >>
> > > >> Abhi
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> If you can't succeed, call it version 1.0
> > > >>



-- 
If you can't succeed, call it version 1.0
