[ 
https://issues.apache.org/jira/browse/KAFKA-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806421#comment-15806421
 ] 

Apurva Mehta edited comment on KAFKA-4558 at 1/7/17 2:00 AM:
-------------------------------------------------------------

So I had a look at the code. All 13 tests that use 
`ProduceConsumeValidate` have changed since that commit, so it would be 
unproductive to revert that change at this point.

Regarding your proposal for two metrics (partitions assigned and per-partition 
lag): this may not be what we want. In particular, in the `ProduceConsumeValidate` 
test, the producer is started after the consumer. So if the topic is originally 
empty, or if the consumer is configured to read from the end, the lag will 
always be zero. This is per my understanding of how lag is reported, viz. how 
far the consumer is from the tail of the log. So the lag metric probably won't 
be very useful in the majority of cases. 
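
To make the lag argument concrete, here is a minimal sketch. It assumes lag is computed as the log end offset minus the consumer's position, which matches my understanding above but is not taken from the Kafka code. The function name and the offsets are made up for illustration.

```python
def partition_lag(log_end_offset, consumer_position):
    """Lag as assumed here: how far the consumer is behind the tail of the log."""
    return max(log_end_offset - consumer_position, 0)


# Empty topic: the log end and the consumer position are both at offset 0.
empty_topic_lag = partition_lag(log_end_offset=0, consumer_position=0)

# Consumer configured to read from the end of a topic that already holds
# 500 messages (hypothetical number): its position starts at the log end.
read_from_end_lag = partition_lag(log_end_offset=500, consumer_position=500)
```

In both cases the lag is zero before the producer starts, regardless of whether the consumer has finished initializing, so the value cannot tell a ready consumer from one that is not.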

But waiting until partitions assigned is nonzero may be what we want. The 
tests I have seen just have a single console consumer for the entire topic, so 
there should be enough partitions to go around. (Of course, this may not be 
true in the future.) At the very least it will be better than what we have 
right now. And if there are not enough partitions to go around, the test will 
fail early (since the `wait_until` will time out), and can be diagnosed before 
checkin. 
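
A rough sketch of the wait I have in mind, under stated assumptions: `wait_until` below is a local stand-in written for this example (it only mirrors the general shape of a polling helper), and `FakeConsumer`, `assigned_partitions()` and `start_producer_when_ready()` are hypothetical names, not existing test code.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Local stand-in: poll condition() until it is true or the timeout expires."""
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError(err_msg)


class FakeConsumer:
    """Hypothetical consumer whose partition assignment arrives after a few polls."""

    def __init__(self, polls_until_assigned):
        self._polls_left = polls_until_assigned

    def assigned_partitions(self):
        # Stand-in for reading the partitions-assigned metric from the consumer.
        if self._polls_left > 0:
            self._polls_left -= 1
            return 0
        return 1


def start_producer_when_ready(consumer, timeout_sec=5):
    """Block until the consumer reports at least one assigned partition."""
    wait_until(lambda: consumer.assigned_partitions() > 0,
               timeout_sec=timeout_sec, backoff_sec=0.01,
               err_msg="Consumer was not assigned any partitions in time")
    # Only at this point would the test start the producer.
    return "producer started"
```

If the consumer never gets a partition, the wait raises instead of letting the producer run, which is the early failure described above.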

Regarding the implementation of partitions assigned alone, I thought it might 
be worth staging the implementation by first reading the metric through JMX. 
This would give us a shorter turnaround time and let us validate whether this 
approach is sufficient to fix the current issues. We could even experiment with 
different metrics more quickly if necessary. 

Finally, would adding an HttpMetricsReporter necessitate a KIP?





> throttling_test fails if the producer starts too fast.
> ------------------------------------------------------
>
>                 Key: KAFKA-4558
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4558
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>            Assignee: Apurva Mehta
>
> As described in https://issues.apache.org/jira/browse/KAFKA-4526, the 
> throttling test will fail if the producer in the produce-consume-validate 
> loop starts up before the consumer is fully initialized.
> We need to block the start of the producer until the consumer is ready to go. 
> The current plan is to poll the consumer for a particular metric (like, for 
> instance, partition assignment) which will act as a good proxy for successful 
> initialization. Currently, we just check for the existence of a process with 
> the PID, which is not a strong enough check, causing the test to fail 
> intermittently. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
