Hi Raghu, Dataflow throws an exception if Kafka fails now and it recovers after Kafka is available.
Regards Em sex, 19 de out de 2018 às 14:01, Raghu Angadi <[email protected]> escreveu: > On Fri, Oct 19, 2018 at 6:54 AM Eduardo Soldera < > [email protected]> wrote: > >> Hi Raghu, just a quick update. We were waiting for Spotify's Scio to >> update to Beam 2.7. We've just deployed the pipeling sucessfully. Just for >> letting you know, I tried to use the workaround code snipped, but Dataflow >> wouldn't recover after a Kafka unavailability. >> > > Thanks for the update. The workaround helps only if KafkaClient itself can > recover when try to read again. I guess some of those exceptions are are > not recoverable. > > Please let us know how the actual fix works. > > Thanks. > Raghu. > > >> >> Thanks for your help. >> >> Regards >> >> Em qua, 19 de set de 2018 às 15:37, Raghu Angadi <[email protected]> >> escreveu: >> >>> >>> >>> On Wed, Sep 19, 2018 at 11:24 AM Juan Carlos Garcia <[email protected]> >>> wrote: >>> >>>> Sorry I hit the send button to fast... The error occurs in the worker. >>>> >>> >>> Np. Just one more comment on it: it is a very important >>> design/correctness decision to for runner to decide how to handle >>> persistent errors in a streaming pipeline. Dataflow keeps failing since >>> there is no solution to restart a pipeline from scratch without losing >>> exactly-once guarantees. It lets user decide if the pipeline needs to be >>> 'upgraded'. >>> >>> Raghu. >>> >>>> >>>> Juan Carlos Garcia <[email protected]> schrieb am Mi., 19. Sep. >>>> 2018, 20:22: >>>> >>>>> Sorry for hijacking the thread, we are running Spark on top of Yarn, >>>>> yarn retries multiple times until it reachs it max attempt and then gives >>>>> up. >>>>> >>>>> Raghu Angadi <[email protected]> schrieb am Mi., 19. Sep. 2018, >>>>> 18:58: >>>>> >>>>>> On Wed, Sep 19, 2018 at 7:26 AM Juan Carlos Garcia < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> Don't know if its related, but we have seen our pipeline dying >>>>>>> (using SparkRunner) when there is problem with Kafka (network >>>>>>> interruptions), errors like: >>>>>>> org.apache.kafka.common.errors.TimeoutException: Timeout expired while >>>>>>> fetching topic metadata >>>>>>> >>>>>>> Maybe this will fix it as well, thanks Raghu for the hint about >>>>>>> *withConsumerFactoryFn.* >>>>>>> >>>>>> >>>>>> Wouldn't that be retried by the SparkRunner if it happens on the >>>>>> worker? or does it happen while launching the pipeline on the client? >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Sep 19, 2018 at 3:29 PM Eduardo Soldera < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> Hi Raghu, thank you. >>>>>>>> >>>>>>>> I'm not sure though what to pass as an argument: >>>>>>>> >>>>>>>> KafkaIO.read[String,String]() >>>>>>>> .withBootstrapServers(server) >>>>>>>> .withTopic(topic) >>>>>>>> .withKeyDeserializer(classOf[StringDeserializer]) >>>>>>>> .withValueDeserializer(classOf[StringDeserializer]) >>>>>>>> .withConsumerFactoryFn(new >>>>>>>> KafkaExecutor.ConsumerFactoryFn(????????????????)) >>>>>>>> .updateConsumerProperties(properties) >>>>>>>> .commitOffsetsInFinalize() >>>>>>>> .withoutMetadata() >>>>>>>> >>>>>>>> >>>>>>>> Regards >>>>>>>> >>>>>>>> >>>>>>>> Em ter, 18 de set de 2018 às 21:15, Raghu Angadi < >>>>>>>> [email protected]> escreveu: >>>>>>>> >>>>>>>>> Hi Eduardo, >>>>>>>>> >>>>>>>>> There another work around you can try without having to wait for >>>>>>>>> 2.7.0 release: Use a wrapper to catch exception from >>>>>>>>> KafkaConsumer#poll() >>>>>>>>> and pass the wrapper to withConsumerFactoryFn() for KafkIO reader [1]. >>>>>>>>> >>>>>>>>> Using something like (such a wrapper is used in KafkasIO tests >>>>>>>>> [2]): >>>>>>>>> private static class ConsumerFactoryFn >>>>>>>>> implements SerializableFunction<Map<String, >>>>>>>>> Object>, Consumer<byte[], byte[]>> { >>>>>>>>> @Override >>>>>>>>> public Consumer<byte[], byte[]> apply(Map<String, Object> >>>>>>>>> config) { >>>>>>>>> return new KafkaConsumer(config) { >>>>>>>>> @Override >>>>>>>>> public ConsumerRecords<K, V> poll(long timeout) { >>>>>>>>> // work around for BEAM-5375 >>>>>>>>> while (true) { >>>>>>>>> try { >>>>>>>>> return super.poll(timeout); >>>>>>>>> } catch (Exception e) { >>>>>>>>> // LOG & sleep for sec >>>>>>>>> } >>>>>>>>> } >>>>>>>>> } >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> [1]: >>>>>>>>> https://github.com/apache/beam/blob/release-2.4.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L417 >>>>>>>>> [2]: >>>>>>>>> https://github.com/apache/beam/blob/release-2.4.0/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOTest.java#L261 >>>>>>>>> >>>>>>>>> On Tue, Sep 18, 2018 at 5:49 AM Eduardo Soldera < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Hi Raghu, we're not sure how long the network was down. According >>>>>>>>>> to the logs no longer than one minute. A 30 second shutdown would >>>>>>>>>> work for >>>>>>>>>> the tests. >>>>>>>>>> >>>>>>>>>> Regards >>>>>>>>>> >>>>>>>>>> Em sex, 14 de set de 2018 às 21:41, Raghu Angadi < >>>>>>>>>> [email protected]> escreveu: >>>>>>>>>> >>>>>>>>>>> Thanks. I could repro myself as well. How long was the network >>>>>>>>>>> down? >>>>>>>>>>> >>>>>>>>>>> Trying to get the fix into 2.7 RC2. >>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 14, 2018 at 12:25 PM Eduardo Soldera < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Just to make myself clear, I'm not sure how to use the patch >>>>>>>>>>>> but if you could send us some guidance would be great. >>>>>>>>>>>> >>>>>>>>>>>> Em sex, 14 de set de 2018 às 16:24, Eduardo Soldera < >>>>>>>>>>>> [email protected]> escreveu: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Raghu, yes, it is feasible, would you do that for us? I'm >>>>>>>>>>>>> not sure how we'd use the patch. We're using SBT and Spotify's >>>>>>>>>>>>> Scio with >>>>>>>>>>>>> Scala. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> >>>>>>>>>>>>> Em sex, 14 de set de 2018 às 16:07, Raghu Angadi < >>>>>>>>>>>>> [email protected]> escreveu: >>>>>>>>>>>>> >>>>>>>>>>>>>> Is is feasible for you to verify the fix in your dev job? I >>>>>>>>>>>>>> can make a patch against Beam 2.4 branch if you like. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Raghu. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 14, 2018 at 11:14 AM Eduardo Soldera < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Raghu, thank you very much for the pull request. >>>>>>>>>>>>>>> We'll wait for the 2.7 Beam release. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Em qui, 13 de set de 2018 às 18:19, Raghu Angadi < >>>>>>>>>>>>>>> [email protected]> escreveu: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Fix: https://github.com/apache/beam/pull/6391 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Sep 12, 2018 at 3:30 PM Raghu Angadi < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Filed BEAM-5375 >>>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-5375>. I will >>>>>>>>>>>>>>>>> fix it later this week. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Sep 12, 2018 at 12:16 PM Raghu Angadi < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Sep 12, 2018 at 12:11 PM Raghu Angadi < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks for the job id, I looked at the worker logs >>>>>>>>>>>>>>>>>>> (following usual support oncall access protocol that >>>>>>>>>>>>>>>>>>> provides temporary >>>>>>>>>>>>>>>>>>> access to things like logs in GCP): >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Root issue looks like consumerPollLoop() mentioned >>>>>>>>>>>>>>>>>>> earlier needs to handle unchecked exception. In your case >>>>>>>>>>>>>>>>>>> it is clear that >>>>>>>>>>>>>>>>>>> poll thread exited with a runtime exception. The reader >>>>>>>>>>>>>>>>>>> does not check for >>>>>>>>>>>>>>>>>>> it and continues to wait for poll thread to enqueue >>>>>>>>>>>>>>>>>>> messages. A fix should >>>>>>>>>>>>>>>>>>> result in an IOException for read from the source. The >>>>>>>>>>>>>>>>>>> runners will handle >>>>>>>>>>>>>>>>>>> that appropriately after that. I will file a jira. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java#L678 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Ignore the link.. was pasted here by mistake. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> From the logs (with a comment below each one): >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - 2018-09-12 06:13:07.345 PDT Reader-0: reading from >>>>>>>>>>>>>>>>>>> kafka_topic-0 starting at offset 2 >>>>>>>>>>>>>>>>>>> - Implies the reader is initialized and poll >>>>>>>>>>>>>>>>>>> thread is started. >>>>>>>>>>>>>>>>>>> - 2018-09-12 06:13:07.780 PDT Reader-0: first record >>>>>>>>>>>>>>>>>>> offset 2 >>>>>>>>>>>>>>>>>>> - The reader actually got a message received by >>>>>>>>>>>>>>>>>>> the poll thread from Kafka. >>>>>>>>>>>>>>>>>>> - 2018-09-12 06:16:48.771 PDT Reader-0: exception >>>>>>>>>>>>>>>>>>> while fetching latest offset for partition >>>>>>>>>>>>>>>>>>> kafka_topic-0. will be retried. >>>>>>>>>>>>>>>>>>> - This must have happened around the time when >>>>>>>>>>>>>>>>>>> network was disrupted. This is from. Actual log is >>>>>>>>>>>>>>>>>>> from another periodic >>>>>>>>>>>>>>>>>>> task that fetches latest offsets for partitions. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The poll thread must have died around the time network >>>>>>>>>>>>>>>>>>> was disrupted. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The following log comes from kafka client itself and is >>>>>>>>>>>>>>>>>>> printed every second when KafkaIO fetches latest offset. >>>>>>>>>>>>>>>>>>> This log seems to >>>>>>>>>>>>>>>>>>> be added in recent versions. It is probably an >>>>>>>>>>>>>>>>>>> unintentional log. I don't >>>>>>>>>>>>>>>>>>> think there is any better to fetch latest offsets than how >>>>>>>>>>>>>>>>>>> KafkaIO does >>>>>>>>>>>>>>>>>>> now. This is logged inside consumer.position() called at >>>>>>>>>>>>>>>>>>> [1]. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - 2018-09-12 06:13:11.786 PDT [Consumer >>>>>>>>>>>>>>>>>>> clientId=consumer-2, >>>>>>>>>>>>>>>>>>> groupId=Reader-0_offset_consumer_1735388161_genericPipe] >>>>>>>>>>>>>>>>>>> Resetting offset >>>>>>>>>>>>>>>>>>> for partition com.arquivei.dataeng.andre-0 to offset 3. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> [1]: >>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java#L678 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This 'Resetting offset' is harmless, but is quite >>>>>>>>>>>>>>>>>> annoying to see in the worker logs. One way to avoid is to >>>>>>>>>>>>>>>>>> set kafka >>>>>>>>>>>>>>>>>> consumer's log level to WARNING. Ideally KafkaIO itself >>>>>>>>>>>>>>>>>> should do something >>>>>>>>>>>>>>>>>> to avoid it without user option. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Sep 12, 2018 at 10:27 AM Eduardo Soldera < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Raghu! The job_id of our dev job is >>>>>>>>>>>>>>>>>>>> 2018-09-12_06_11_48-5600553605191377866. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Em qua, 12 de set de 2018 às 14:18, Raghu Angadi < >>>>>>>>>>>>>>>>>>>> [email protected]> escreveu: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks for debugging. >>>>>>>>>>>>>>>>>>>>> Can you provide the job_id of your dev job? The >>>>>>>>>>>>>>>>>>>>> stacktrace shows that there is no thread running >>>>>>>>>>>>>>>>>>>>> 'consumerPollLoop()' which >>>>>>>>>>>>>>>>>>>>> can explain stuck reader. You will likely find a logs at >>>>>>>>>>>>>>>>>>>>> line 594 & 587 >>>>>>>>>>>>>>>>>>>>> [1]. Dataflow caches its readers and DirectRunner may >>>>>>>>>>>>>>>>>>>>> not. That can >>>>>>>>>>>>>>>>>>>>> explain DirectRunner resume reads. The expectation in >>>>>>>>>>>>>>>>>>>>> KafkaIO is that Kafka >>>>>>>>>>>>>>>>>>>>> client library takes care of retrying in case of >>>>>>>>>>>>>>>>>>>>> connection problems (as >>>>>>>>>>>>>>>>>>>>> documented). It is possible that in some cases poll() >>>>>>>>>>>>>>>>>>>>> throws and we need to >>>>>>>>>>>>>>>>>>>>> restart the client in KafkaIO. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [1]: >>>>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java#L594 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Wed, Sep 12, 2018 at 9:59 AM Eduardo Soldera < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi Raghu, thanks for your help. >>>>>>>>>>>>>>>>>>>>>> Just answering your previous question, the following >>>>>>>>>>>>>>>>>>>>>> logs were the same as before the error, as if the >>>>>>>>>>>>>>>>>>>>>> pipeline were still >>>>>>>>>>>>>>>>>>>>>> getting the messages, for example: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> (...) >>>>>>>>>>>>>>>>>>>>>> Resetting offset for partition >>>>>>>>>>>>>>>>>>>>>> com.arquivei.dataeng.andre-0 to offset 10. >>>>>>>>>>>>>>>>>>>>>> Resetting offset for partition >>>>>>>>>>>>>>>>>>>>>> com.arquivei.dataeng.andre-0 to offset 15. >>>>>>>>>>>>>>>>>>>>>> ERROR >>>>>>>>>>>>>>>>>>>>>> Resetting offset for partition >>>>>>>>>>>>>>>>>>>>>> com.arquivei.dataeng.andre-0 to offset 22. >>>>>>>>>>>>>>>>>>>>>> Resetting offset for partition >>>>>>>>>>>>>>>>>>>>>> com.arquivei.dataeng.andre-0 to offset 30. >>>>>>>>>>>>>>>>>>>>>> (...) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> But when checking the Kafka Consumer Group, the >>>>>>>>>>>>>>>>>>>>>> current offset stays at 15, the commited offset from the >>>>>>>>>>>>>>>>>>>>>> last processed >>>>>>>>>>>>>>>>>>>>>> message, before the error. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> We'll file a bug, but we could now reproduce the >>>>>>>>>>>>>>>>>>>>>> issue in a Dev scenario. >>>>>>>>>>>>>>>>>>>>>> We started the same pipeline using the direct runner, >>>>>>>>>>>>>>>>>>>>>> without Google Dataflow. We blocked the Kafka Broker >>>>>>>>>>>>>>>>>>>>>> network and the same >>>>>>>>>>>>>>>>>>>>>> error was thrown. Then we unblocked the network and the >>>>>>>>>>>>>>>>>>>>>> pipeline was able >>>>>>>>>>>>>>>>>>>>>> to successfully process the subsequent messages. >>>>>>>>>>>>>>>>>>>>>> When we started the same pipeline in the Dataflow >>>>>>>>>>>>>>>>>>>>>> runner and did the same test, the same problem from our >>>>>>>>>>>>>>>>>>>>>> production scenario >>>>>>>>>>>>>>>>>>>>>> happened, Dataflow couldn't process the new messages. >>>>>>>>>>>>>>>>>>>>>> Unfortunately, we've >>>>>>>>>>>>>>>>>>>>>> stopped the dataflow job in production, but the >>>>>>>>>>>>>>>>>>>>>> problematic dev job is >>>>>>>>>>>>>>>>>>>>>> still running and the log file of the VM is attached. >>>>>>>>>>>>>>>>>>>>>> Thank you very much. >>>>>>>>>>>>>>>>>>>>>> Best regards >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Em ter, 11 de set de 2018 às 18:28, Raghu Angadi < >>>>>>>>>>>>>>>>>>>>>> [email protected]> escreveu: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Specifically, I am interested if you have any thread >>>>>>>>>>>>>>>>>>>>>>> running 'consumerPollLoop()' [1]. There should always >>>>>>>>>>>>>>>>>>>>>>> be one (if a worker >>>>>>>>>>>>>>>>>>>>>>> is assigned one of the partitions). It is possible that >>>>>>>>>>>>>>>>>>>>>>> KafkaClient itself >>>>>>>>>>>>>>>>>>>>>>> is hasn't recovered from the group coordinator error >>>>>>>>>>>>>>>>>>>>>>> (though unlikely). >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java#L570 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 11, 2018 at 12:31 PM Raghu Angadi < >>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Hi Eduardo, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> In case of any error, the pipeline should keep on >>>>>>>>>>>>>>>>>>>>>>>> trying to fetch. I don't know about this particular >>>>>>>>>>>>>>>>>>>>>>>> error. Do you see any >>>>>>>>>>>>>>>>>>>>>>>> others afterwards in the log? >>>>>>>>>>>>>>>>>>>>>>>> Couple of things you could try if the logs are not >>>>>>>>>>>>>>>>>>>>>>>> useful : >>>>>>>>>>>>>>>>>>>>>>>> - login to one of the VMs and get stacktrace of >>>>>>>>>>>>>>>>>>>>>>>> java worker (look for a container called >>>>>>>>>>>>>>>>>>>>>>>> java-streaming) >>>>>>>>>>>>>>>>>>>>>>>> - file a support bug or stackoverflow question >>>>>>>>>>>>>>>>>>>>>>>> with jobid so that Dataflow oncall can take a look. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Raghu. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 11, 2018 at 12:10 PM Eduardo Soldera < >>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>>> We have a Apache Beam pipeline running in Google >>>>>>>>>>>>>>>>>>>>>>>>> Dataflow using KafkaIO. Suddenly the pipeline stop >>>>>>>>>>>>>>>>>>>>>>>>> fetching Kafka messages >>>>>>>>>>>>>>>>>>>>>>>>> at all, as our other workers from other pipelines >>>>>>>>>>>>>>>>>>>>>>>>> continued to get Kafka >>>>>>>>>>>>>>>>>>>>>>>>> messages. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> At the moment it stopped we got these messages: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I [Consumer clientId=consumer-1, >>>>>>>>>>>>>>>>>>>>>>>>> groupId=genericPipe] Error sending fetch request >>>>>>>>>>>>>>>>>>>>>>>>> (sessionId=1396189203, epoch=2431598) to node 3: >>>>>>>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.errors.DisconnectException. >>>>>>>>>>>>>>>>>>>>>>>>> I [Consumer clientId=consumer-1, >>>>>>>>>>>>>>>>>>>>>>>>> groupId=genericPipe] Group coordinator >>>>>>>>>>>>>>>>>>>>>>>>> 10.0.52.70:9093 (id: 2147483646 rack: null) is >>>>>>>>>>>>>>>>>>>>>>>>> unavailable or invalid, will attempt rediscovery >>>>>>>>>>>>>>>>>>>>>>>>> I [Consumer clientId=consumer-1, >>>>>>>>>>>>>>>>>>>>>>>>> groupId=genericPipe] Discovered group coordinator >>>>>>>>>>>>>>>>>>>>>>>>> 10.0.52.70:9093 (id: 2147483646 rack: null) >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> And then the pipeline stopped reading the messages. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> This is the KafkaIO setup we have: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> KafkaIO.read[String,String]() >>>>>>>>>>>>>>>>>>>>>>>>> .withBootstrapServers(server) >>>>>>>>>>>>>>>>>>>>>>>>> .withTopic(topic) >>>>>>>>>>>>>>>>>>>>>>>>> .withKeyDeserializer(classOf[StringDeserializer]) >>>>>>>>>>>>>>>>>>>>>>>>> .withValueDeserializer(classOf[StringDeserializer]) >>>>>>>>>>>>>>>>>>>>>>>>> .updateConsumerProperties(properties) >>>>>>>>>>>>>>>>>>>>>>>>> .commitOffsetsInFinalize() >>>>>>>>>>>>>>>>>>>>>>>>> .withoutMetadata() >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Any help will be much appreciated. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>>>> Eduardo Soldera Garcia >>>>>>>>>>>>>>>>>>>>>>>>> Data Engineer >>>>>>>>>>>>>>>>>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br >>>>>>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>>>>>>>>>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas >>>>>>>>>>>>>>>>>>>>>>>>> Fiscais] >>>>>>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>>>>>>>>>>>>>>>>> [image: Google seleciona Arquivei para imersão e >>>>>>>>>>>>>>>>>>>>>>>>> mentoria no Vale do Silício] >>>>>>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad> >>>>>>>>>>>>>>>>>>>>>>>>> <https://www.facebook.com/arquivei> >>>>>>>>>>>>>>>>>>>>>>>>> <https://www.linkedin.com/company/arquivei> >>>>>>>>>>>>>>>>>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>> Eduardo Soldera Garcia >>>>>>>>>>>>>>>>>>>>>> Data Engineer >>>>>>>>>>>>>>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br >>>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>>>>>>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas >>>>>>>>>>>>>>>>>>>>>> Fiscais] >>>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>>>>>>>>>>>>>> [image: Google seleciona Arquivei para imersão e >>>>>>>>>>>>>>>>>>>>>> mentoria no Vale do Silício] >>>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad> >>>>>>>>>>>>>>>>>>>>>> <https://www.facebook.com/arquivei> >>>>>>>>>>>>>>>>>>>>>> <https://www.linkedin.com/company/arquivei> >>>>>>>>>>>>>>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>> Eduardo Soldera Garcia >>>>>>>>>>>>>>>>>>>> Data Engineer >>>>>>>>>>>>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br >>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>>>>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas Fiscais] >>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>>>>>>>>>>>> [image: Google seleciona Arquivei para imersão e >>>>>>>>>>>>>>>>>>>> mentoria no Vale do Silício] >>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad> >>>>>>>>>>>>>>>>>>>> <https://www.facebook.com/arquivei> >>>>>>>>>>>>>>>>>>>> <https://www.linkedin.com/company/arquivei> >>>>>>>>>>>>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Eduardo Soldera Garcia >>>>>>>>>>>>>>> Data Engineer >>>>>>>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br >>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas Fiscais] >>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>>>>>>> [image: Google seleciona Arquivei para imersão e mentoria no >>>>>>>>>>>>>>> Vale do Silício] >>>>>>>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad> >>>>>>>>>>>>>>> <https://www.facebook.com/arquivei> >>>>>>>>>>>>>>> <https://www.linkedin.com/company/arquivei> >>>>>>>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Eduardo Soldera Garcia >>>>>>>>>>>>> Data Engineer >>>>>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br >>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas Fiscais] >>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>>>>> [image: Google seleciona Arquivei para imersão e mentoria no >>>>>>>>>>>>> Vale do Silício] >>>>>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad> >>>>>>>>>>>>> <https://www.facebook.com/arquivei> >>>>>>>>>>>>> <https://www.linkedin.com/company/arquivei> >>>>>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Eduardo Soldera Garcia >>>>>>>>>>>> Data Engineer >>>>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br >>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas Fiscais] >>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>>>> [image: Google seleciona Arquivei para imersão e mentoria no >>>>>>>>>>>> Vale do Silício] >>>>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad> >>>>>>>>>>>> <https://www.facebook.com/arquivei> >>>>>>>>>>>> <https://www.linkedin.com/company/arquivei> >>>>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Eduardo Soldera Garcia >>>>>>>>>> Data Engineer >>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br >>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas Fiscais] >>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>>>> [image: Google seleciona Arquivei para imersão e mentoria no Vale >>>>>>>>>> do Silício] >>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad> >>>>>>>>>> <https://www.facebook.com/arquivei> >>>>>>>>>> <https://www.linkedin.com/company/arquivei> >>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Eduardo Soldera Garcia >>>>>>>> Data Engineer >>>>>>>> (16) 3509-5555 | www.arquivei.com.br >>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>> [image: Arquivei.com.br – Inteligência em Notas Fiscais] >>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >>>>>>>> [image: Google seleciona Arquivei para imersão e mentoria no Vale >>>>>>>> do Silício] >>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad> >>>>>>>> <https://www.facebook.com/arquivei> >>>>>>>> <https://www.linkedin.com/company/arquivei> >>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> JC >>>>>>> >>>>>>> >> >> -- >> Eduardo Soldera Garcia >> Data Engineer >> (16) 3509-5555 | www.arquivei.com.br >> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >> [image: Arquivei.com.br – Inteligência em Notas Fiscais] >> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> >> [image: Google seleciona Arquivei para imersão e mentoria no Vale do >> Silício] >> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad> >> <https://www.facebook.com/arquivei> >> <https://www.linkedin.com/company/arquivei> >> <https://www.youtube.com/watch?v=sSUUKxbXnxk> >> > -- Eduardo Soldera Garcia Data Engineer (16) 3509-5555 | www.arquivei.com.br <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> [image: Arquivei.com.br – Inteligência em Notas Fiscais] <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura> [image: Google seleciona Arquivei para imersão e mentoria no Vale do Silício] <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad> <https://www.facebook.com/arquivei> <https://www.linkedin.com/company/arquivei> <https://www.youtube.com/watch?v=sSUUKxbXnxk>
