The number is a byte position in the log. So really, the offset is
the number of bytes into the log at which to begin reading. 0 means read the
log from the beginning. The second message starts at 0 + the size of the first
message. So the message "ids" are really just the sum of the sizes of the
preceding messages.

For example, if I have three messages of 10 bytes each and set the
consumer offset to 0, I'll read everything. If I set the offset to 10,
I'll read the second and third messages, and so on.
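
To make that concrete, here is a minimal Python sketch of the byte-offset
model described above. It is not Kafka code: the function names are made up
for illustration, and it assumes each message's offset is simply the running
total of the sizes of the messages before it. (That is the byte-based model;
as far as I know, Kafka 0.8 and later assign logical per-partition sequence
numbers instead, so check which version you are running.)

```python
def byte_offsets(message_sizes):
    """Return the starting byte offset of each message in an append-only log.

    Hypothetical helper: the offset of a message is the sum of the sizes
    of all messages written before it.
    """
    offsets = []
    position = 0
    for size in message_sizes:
        offsets.append(position)
        position += size
    return offsets


def messages_from(offset, message_sizes):
    """Return the 0-based indexes of the messages that start at or after `offset`."""
    starts = byte_offsets(message_sizes)
    return [i for i, start in enumerate(starts) if start >= offset]


sizes = [10, 10, 10]              # three messages of 10 bytes each
print(byte_offsets(sizes))        # [0, 10, 20]
print(messages_from(0, sizes))    # [0, 1, 2] -> everything
print(messages_from(10, sizes))   # [1, 2]    -> second and third messages
```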

See more here:
http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
and here: http://kafka.apache.org/documentation.html#introduction

HTH!

On Wed, Feb 17, 2016 at 12:16 PM, John Bickerstaff <j...@johnbickerstaff.com
> wrote:

> *Use Case: Disaster Recovery & Re-indexing SOLR*
>
> I'm using Kafka to hold messages from a service that prepares "documents"
> for SOLR.
>
> A second micro service (a consumer) requests these messages, does any final
> processing, and fires them into SOLR.
>
> The whole thing is (in part) designed to be used for disaster recovery -
> allowing the rebuild of the SOLR index in the shortest possible time.
>
> To do this (and to be able to use it for re-indexing SOLR while testing
> relevancy) I need to be able to "play all messages from the beginning" at
> will.
>
> I find I can use the zkCli.sh tool to delete the Consumer Group Name like
> this:
>      rmr /kafka/consumers/myGroupName
>
> After which my microservice will get all the messages again when it runs.
>
> I was trying to find a way to do this programmatically without actually
> using the "low level" consumer api since the high level one is so simple
> and my code already works.  So I started playing with Zookeeper api for
> duplicating "rmr /kafka/consumers/myGroupName"
>
> *The Question: What does that offset actually represent?*
>
> It was at this point that I discovered the offset must represent something
> other than what I thought it would.  Things obviously work, but I'm
> wondering what, exactly, the offsets represent.
>
> To clarify - if I run this command on a zookeeper node, after the
> microservice has run:
>      get /kafka/consumers/myGroupName/offsets/myTopicName/0
>
> I get the following:
>
> 30024
> cZxid = 0x3600000355
> ctime = Fri Feb 12 07:27:50 MST 2016
> mZxid = 0x3600000357
> mtime = Fri Feb 12 07:29:50 MST 2016
> pZxid = 0x3600000355
> cversion = 0
> dataVersion = 2
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 5
> numChildren = 0
>
> Now - I have exactly 3500 messages in this Kafka topic.  I verify that by
> running this command:
>      bin/kafka-console-consumer.sh --zookeeper 192.168.56.5:2181/kafka --topic myTopicName --from-beginning
>
> When I hit Ctrl-C, it tells me it consumed 3500 messages.
>
> So - what does that 30024 actually represent?  If I reset that number to 1
> or 0 and re-run my consumer microservice, I get all the messages again -
> and the number again goes to 30024.  However, I'm not comfortable trusting
> that because my assumption that the number represents a simple count of
> messages that have been sent to this consumer is obviously wrong.
>
> (I reset the number like this -- to 1 -- and assume there's an API command
> that will do it too.)
>      set /kafka/consumers/myGroupName/offsets/myTopicName/0 1
>
> Can someone help me clarify or point me at a doc that explains what is
> getting counted here?  You can shoot me if you like for attempting the
> hack-ish solution of re-setting the offset through the Zookeeper API, but I
> would still like to understand what, exactly, is represented by that number
> 30024.
>
> I need to hand off to IT for the Disaster Recovery portion and saying
> "trust me, it just works" isn't going to fly very far...
>
> Thanks.
>



-- 
*Christian Posta*
twitter: @christianposta
http://www.christianposta.com/blog
http://fabric8.io
