[ 
https://issues.apache.org/jira/browse/KAFKA-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-7136.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.1

> PushHttpMetricsReporter may deadlock when processing metrics changes
> --------------------------------------------------------------------
>
>                 Key: KAFKA-7136
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7136
>             Project: Kafka
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 1.1.0, 2.0.0
>            Reporter: Rajini Sivaram
>            Assignee: Rajini Sivaram
>            Priority: Blocker
>             Fix For: 2.0.0, 1.1.1
>
>
> We noticed a deadlock in {{PushHttpMetricsReporter}}. Locking for metrics was 
> changed under KAFKA-6765 to avoid {{NullPointerException}} in metrics 
> reporters caused by concurrent reads and updates. {{PushHttpMetricsReporter}} 
> acquires its own lock to process metric registration, which is invoked while 
> the sensor lock is held. When reporting, it also reads metric values, which 
> acquires the sensor lock, while holding its own lock (the inverse order). 
> This resulted in the deadlock below.
> {quote}Found one Java-level deadlock:
>  Java stack information for the threads listed above:
>  ===================================================
>  "StreamThread-7":
>  at org.apache.kafka.tools.PushHttpMetricsReporter.metricChange(PushHttpMetricsReporter.java:144)
>  - waiting to lock <0x0000000655a54310> (a java.lang.Object)
>  at org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:563)
>  - locked <0x0000000655a44a28> (a org.apache.kafka.common.metrics.Metrics)
>  at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:236)
>  - locked <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
>  at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:217)
>  at org.apache.kafka.common.network.Selector$SelectorMetrics.maybeRegisterConnectionMetrics(Selector.java:1016)
>  at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:462)
>  at org.apache.kafka.common.network.Selector.poll(Selector.java:425)
>  at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510)
>  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:271)
>  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
>  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
>  at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274)
>  at org.apache.kafka.clients.consumer.internals.Fetcher.getAllTopicMetadata(Fetcher.java:254)
>  at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1820)
>  at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1798)
>  at org.apache.kafka.streams.processor.internals.StoreChangelogReader.refreshChangelogInfo(StoreChangelogReader.java:224)
>  at org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:121)
>  at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:74)
>  at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:317)
>  at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:824)
>  at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
>  at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
>  "pool-17-thread-1":
>  at org.apache.kafka.common.metrics.KafkaMetric.measurableValue(KafkaMetric.java:82)
>  - waiting to lock <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
>  at org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:58)
>  at org.apache.kafka.tools.PushHttpMetricsReporter$HttpReporter.run(PushHttpMetricsReporter.java:177)
>  - locked <0x0000000655a54310> (a java.lang.Object)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Found 1 deadlock.
> {quote}
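
The trace above shows two locks taken in opposite orders: "StreamThread-7" holds the sensor lock and waits for the reporter's lock, while "pool-17-thread-1" holds the reporter's lock and waits for the sensor lock. The following is a minimal sketch of one common way to avoid that kind of inversion, namely copying the registered metrics while holding the reporter lock and reading their values only after releasing it. It is not necessarily the change that was applied for this issue. All names here (LockOrderSketch, FakeSensor, SnapshotReporter) are hypothetical stand-ins and are not Kafka classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not the Kafka classes.
public class LockOrderSketch {

    /** Stand-in for a sensor whose value is read under the sensor's own monitor. */
    static final class FakeSensor {
        private final SnapshotReporter reporter;
        private double total;

        FakeSensor(SnapshotReporter reporter) {
            this.reporter = reporter;
        }

        synchronized void record(double v) {
            total += v;
        }

        synchronized double value() {
            return total;
        }

        // Mirrors the trace: registration calls into the reporter
        // while the sensor lock is held (sensor lock -> reporter lock).
        synchronized void register() {
            reporter.metricChange(this);
        }
    }

    /** Stand-in for a reporter that never holds its lock while reading a sensor. */
    static final class SnapshotReporter {
        private final Object lock = new Object();
        private final List<FakeSensor> sensors = new ArrayList<>();

        void metricChange(FakeSensor sensor) {
            synchronized (lock) {
                sensors.add(sensor);
            }
        }

        List<Double> report() {
            List<FakeSensor> snapshot;
            synchronized (lock) {
                // Copy only; no sensor lock is requested inside this block.
                snapshot = new ArrayList<>(sensors);
            }
            // The reporter lock is released here, so reading a value takes
            // the sensor lock alone and the inverse ordering never occurs.
            List<Double> values = new ArrayList<>();
            for (FakeSensor sensor : snapshot) {
                values.add(sensor.value());
            }
            return values;
        }
    }

    public static void main(String[] args) {
        SnapshotReporter reporter = new SnapshotReporter();
        FakeSensor sensor = new FakeSensor(reporter);
        sensor.register();
        sensor.record(2.5);
        sensor.record(1.5);
        System.out.println(reporter.report());
    }
}
```

The trade-off in this sketch is that a report may miss a metric registered after the snapshot was taken; it would be picked up on the next reporting cycle.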



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
