Re: Consumer Parallelism

2014-08-12 Thread Guozhang Wang
Mingtao, We have also noticed this issue and are trying to fix it in the new producer: KAFKA-1586 Guozhang On Tue, Aug 12, 2014 at 9:41 AM, Mingtao Zhang wrote: > Great! I am in the 10 min category for sure. I do see there is NO > partition

Re: Consumer Parallelism

2014-08-12 Thread Mingtao Zhang
Great! I am in the 10 min category for sure. I do see there is NO partition key provided in our code. I feel it's too much 'customization' when Kafka provides a 'random' default partitioning strategy while another layer applies the 10 min trick to optimize socket usage. Anyway, thank you
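
The "10 min" behavior discussed here can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Kafka's actual producer code: the class name StickyPartitionSketch, its fields, and the 600000 ms interval (used here to stand in for the 10 minute default mentioned in the thread) are hypothetical. It models a producer that, when no partition key is given, picks a random partition and keeps it until the refresh interval has elapsed.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical model of "pick a random partition and stick to it for a while".
// This is NOT the Kafka producer implementation, only an illustration.
final class StickyPartitionSketch {
    private final int numPartitions;
    private final long refreshIntervalMs;
    private final Random random;
    private int current = -1;   // -1 means no partition has been chosen yet
    private long chosenAtMs;

    StickyPartitionSketch(int numPartitions, long refreshIntervalMs, long seed) {
        this.numPartitions = numPartitions;
        this.refreshIntervalMs = refreshIntervalMs;
        this.random = new Random(seed);
    }

    // Returns the partition used for a keyless message sent at time nowMs.
    int partitionFor(long nowMs) {
        if (current < 0 || nowMs - chosenAtMs >= refreshIntervalMs) {
            current = random.nextInt(numPartitions);  // re-pick only after the interval
            chosenAtMs = nowMs;
        }
        return current;
    }

    public static void main(String[] args) {
        // Assumed values: 4 partitions, 10 minute interval, one message per second.
        StickyPartitionSketch producer = new StickyPartitionSketch(4, 600_000L, 42L);
        int[] counts = new int[4];
        for (long t = 0; t < 300_000L; t += 1_000L) {
            counts[producer.partitionFor(t)]++;
        }
        // Within the first five minutes every message lands in a single partition.
        System.out.println(Arrays.toString(counts));
    }
}
```

Under this model, a consumer group with one thread per partition would show only one busy thread at a time, which matches the symptom reported in this thread.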

Re: Consumer Parallelism

2014-08-12 Thread Guozhang Wang
I see your question now. You may want to read this FAQ: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified ? On Tue, Aug 12, 2014 at 8:11 AM, Mingtao Zhang wrote: > Hi Guozhang, > > I think what I am looking

Re: Consumer Parallelism

2014-08-12 Thread Mingtao Zhang
Hi Guozhang, I think what I am looking for is real 'randomness' when the producer writes to the partitions. Based on my log, over a long time period, only one partition gets the writes, while on the other side, only one consumer is active. In my case the consumer is slow, so when it comes back for th

Re: Consumer Parallelism

2014-08-12 Thread Mingtao Zhang
Hi Guozhang, Thank you! Could I say that the consumers 'taking turns to consume' results from the corresponding partition getting the 'message write'? The problem I am facing is that my 'enrichment' (getting more data based on raw data) consumer takes too much time to complete one message consumption. To explor

Re: Consumer Parallelism

2014-08-11 Thread Guozhang Wang
Hello Mingtao, The partition will not be re-assigned to other consumers unless the current consumer fails, so the behavior you described is not expected. Guozhang On Mon, Aug 11, 2014 at 6:27 PM, Mingtao Zhang wrote: > Hi Guozhang, > > I do have another Email talking about Partitions per

Re: Consumer Parallelism

2014-08-11 Thread Mingtao Zhang
Hi Guozhang, I do have another Email talking about Partitions per topic. I have pasted it within this Email. I am expecting those consumers to work concurrently. The behavior I observed here is that consumer thread-1 works for a while, then thread-3 works, then thread-0 ...; is that normal? version is

Re: Consumer Parallelism

2014-08-11 Thread Guozhang Wang
Mingtao, How many partitions does the consumed topic have? Basically the data is distributed per-partition, and hence if the number of consumers is larger than the number of partitions, some consumers will not get any data. Guozhang On Mon, Aug 11, 2014 at 3:29 PM, Mingtao Zhang wrote: > Is it a
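
The point about consumers outnumbering partitions can be sketched as follows. This is a minimal illustration assuming a range-style split of partitions across consumer threads; the class PartitionAssignmentSketch and the "thread-N" names are hypothetical and this is not the consumer's actual rebalance code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical range-style assignment of partitions to consumer threads.
final class PartitionAssignmentSketch {

    // Splits partitions 0..numPartitions-1 into contiguous ranges, one per thread.
    static Map<String, List<Integer>> assign(int numPartitions, int numThreads) {
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        int base = numPartitions / numThreads;   // partitions every thread gets
        int extra = numPartitions % numThreads;  // first 'extra' threads get one more
        for (int t = 0; t < numThreads; t++) {
            int start = base * t + Math.min(t, extra);
            int count = base + (t < extra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int p = start; p < start + count; p++) {
                parts.add(p);
            }
            owned.put("thread-" + t, parts);
        }
        return owned;
    }

    public static void main(String[] args) {
        // With 2 partitions and 4 threads, two threads own nothing and stay idle.
        System.out.println(assign(2, 4));
        // With 4 partitions and 4 threads, each thread owns exactly one partition.
        System.out.println(assign(4, 4));
    }
}
```

In this sketch a thread with an empty list never receives data, so adding more streams than partitions does not add parallelism.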

Re: Consumer Parallelism

2014-08-11 Thread Mingtao Zhang
Is this in any way related to the issue? WARN No previously checkpointed highwatermark value found for topic RAW partition 0. Returning 0 as the highwatermark (kafka.server.HighwaterMarkCheckpoint) Mingtao

Consumer Parallelism

2014-08-11 Thread Mingtao Zhang
Hi, We are using the following method on ConsumerConnector to get multiple streams per topic, and we have multiple partitions per topic. It looks like only one of the runnables is active over a relatively long time period. Is there anything we could possibly have missed? public Map>> createMessageStr