I just posted a new message with all of this distilled into a simple test and log output.
On Sat, Apr 29, 2017 at 6:53 PM, Dmitry Minkovsky <dminkov...@gmail.com> wrote:

> It appears, at least according to debug logs, that the metadata request is
> sent after the metadata update times out:
>
> [... stack trace omitted ...]
>
> Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to
> update metadata after 5000 ms.
>
> [2017-04-29 18:47:24,264] DEBUG (org.apache.kafka.clients.NetworkClient:751)
> Sending metadata request (type=MetadataRequest,
> topics=session-create-requests,session-load-requests,*join-requests*) to
> node 0
>
> Note: *join-requests* is the topic I am attempting to produce to.
>
> [2017-04-29 18:47:24,272] DEBUG (org.apache.kafka.clients.Metadata:249)
> Updated cluster metadata version 4 to Cluster(id = zKIrpvAdStqc9xcfBUuRxw,
> nodes = [localhost:9092 (id: 0 rack: null)], partitions = [Partition(topic
> = session-load-requests, partition = 0, leader = 0, replicas = [0], isr =
> [0]), Partition(topic = session-create-requests, partition = 0, leader = 0,
> replicas = [0], isr = [0]), Partition(topic = join-requests, partition = 0,
> leader = 0, replicas = [0], isr = [0])])
>
> After this, sending with the producer to `join-requests` works.
>
> On Sat, Apr 29, 2017 at 6:26 PM, Dmitry Minkovsky <dminkov...@gmail.com>
> wrote:
>
>> I have a producer that fails to get metadata when it first attempts to
>> send a record to a certain topic. It fails on
>>
>> ClusterAndWaitTime clusterAndWaitTime =
>> waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs); [0]
>>
>> and yields:
>>
>> org.apache.kafka.common.errors.TimeoutException: Failed to update
>> metadata after 30000 ms.
>>
>> After the first attempt, however, the producer works fine.
>>
>> This problem does not occur when I isolate the producer in a test.
>>
>> However, when embedded in my service layer (which includes things like a
>> Kafka Streams application and a gRPC service), the above behavior occurs. I
>> played with this for a while and found that this occurs specifically when
>> the producer produces to another topic just before it produces to the one
>> that fails. If I remove the producing to the first topic, producing to the
>> second topic succeeds, even on the first try.
>>
>> Please let me know what data I could provide to help debug this. I looked
>> at logs at DEBUG level for the NetworkClient but nothing stuck out at me. I
>> am on 0.10.2.1, but this happens on 0.10.2.0 and 0.10.1.0 as well. I am not
>> using topic auto creation.
>>
>> Thank you,
>> Dmitry
>>
>> [0] https://github.com/apache/kafka/blob/0.10.2.1/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L450
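
For anyone trying to reproduce this, here is a minimal sketch of the producer settings that relate to the timeouts quoted above. It assumes the 5000 ms in the exception was set through `max.block.ms`, which bounds how long `send()` blocks while waiting for metadata, and that String serializers are in use; neither is stated in the thread. The class and method names are hypothetical. The sketch only builds the `java.util.Properties` object, so it runs without the Kafka client on the classpath.

```java
import java.util.Properties;

// Hypothetical helper; the class and method names are illustrative only.
final class ProducerConfigSketch {

    static Properties producerProps() {
        Properties props = new Properties();
        // Broker address as it appears in the quoted cluster metadata log line.
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Assumption: the "after 5000 ms" timeout was configured via max.block.ms.
        props.setProperty("max.block.ms", "5000");
        // Assumption: String keys and values; the thread does not name the serializers.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be compared against a failing setup.
        producerProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Passing the result to `new KafkaProducer<>(props)` requires the kafka-clients jar. Raising `max.block.ms` only lengthens the wait on the first send; it would not by itself explain why the metadata request goes out after the timeout.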