Hi Raghu, Dataflow throws an exception if Kafka fails now and it recovers
after Kafka is available.

Regards

Em sex, 19 de out de 2018 às 14:01, Raghu Angadi <[email protected]>
escreveu:

> On Fri, Oct 19, 2018 at 6:54 AM Eduardo Soldera <
> [email protected]> wrote:
>
>> Hi Raghu, just a quick update. We were waiting for Spotify's Scio to
>> update to Beam 2.7. We've just deployed the pipeling sucessfully. Just for
>> letting you know, I tried to use the workaround code snipped, but Dataflow
>> wouldn't recover after a Kafka unavailability.
>>
>
> Thanks for the update. The workaround helps only if KafkaClient itself can
> recover when try to read again. I guess some of those exceptions are are
> not recoverable.
>
> Please let us know how the actual fix works.
>
> Thanks.
> Raghu.
>
>
>>
>> Thanks for your help.
>>
>> Regards
>>
>> Em qua, 19 de set de 2018 às 15:37, Raghu Angadi <[email protected]>
>> escreveu:
>>
>>>
>>>
>>> On Wed, Sep 19, 2018 at 11:24 AM Juan Carlos Garcia <[email protected]>
>>> wrote:
>>>
>>>> Sorry I hit the send button to fast... The error occurs in the worker.
>>>>
>>>
>>> Np. Just one more comment on it: it is a very important
>>> design/correctness decision to for runner to decide how to handle
>>> persistent errors in a streaming pipeline. Dataflow keeps failing since
>>> there is no solution to restart a pipeline from scratch without losing
>>> exactly-once guarantees. It lets user decide if the pipeline needs to be
>>> 'upgraded'.
>>>
>>> Raghu.
>>>
>>>>
>>>> Juan Carlos Garcia <[email protected]> schrieb am Mi., 19. Sep.
>>>> 2018, 20:22:
>>>>
>>>>> Sorry for hijacking the thread, we are running Spark on top of Yarn,
>>>>> yarn retries multiple times until it reachs it max attempt and then gives
>>>>> up.
>>>>>
>>>>> Raghu Angadi <[email protected]> schrieb am Mi., 19. Sep. 2018,
>>>>> 18:58:
>>>>>
>>>>>> On Wed, Sep 19, 2018 at 7:26 AM Juan Carlos Garcia <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Don't know if its related, but we have seen our pipeline dying
>>>>>>> (using SparkRunner) when there is problem with Kafka  (network
>>>>>>> interruptions), errors like:
>>>>>>> org.apache.kafka.common.errors.TimeoutException: Timeout expired while
>>>>>>> fetching topic metadata
>>>>>>>
>>>>>>> Maybe this will fix it as well, thanks Raghu for the hint about
>>>>>>> *withConsumerFactoryFn.*
>>>>>>>
>>>>>>
>>>>>> Wouldn't that be retried by the SparkRunner if it happens on the
>>>>>> worker? or does it happen while launching the pipeline on the client?
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 19, 2018 at 3:29 PM Eduardo Soldera <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Raghu, thank you.
>>>>>>>>
>>>>>>>> I'm not sure though what to pass as an argument:
>>>>>>>>
>>>>>>>> KafkaIO.read[String,String]()
>>>>>>>>   .withBootstrapServers(server)
>>>>>>>>   .withTopic(topic)
>>>>>>>>   .withKeyDeserializer(classOf[StringDeserializer])
>>>>>>>>   .withValueDeserializer(classOf[StringDeserializer])
>>>>>>>>   .withConsumerFactoryFn(new 
>>>>>>>> KafkaExecutor.ConsumerFactoryFn(????????????????))
>>>>>>>>   .updateConsumerProperties(properties)
>>>>>>>>   .commitOffsetsInFinalize()
>>>>>>>>   .withoutMetadata()
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>>
>>>>>>>> Em ter, 18 de set de 2018 às 21:15, Raghu Angadi <
>>>>>>>> [email protected]> escreveu:
>>>>>>>>
>>>>>>>>> Hi Eduardo,
>>>>>>>>>
>>>>>>>>> There another work around you can try without having to wait for
>>>>>>>>> 2.7.0 release: Use a wrapper to catch exception from 
>>>>>>>>> KafkaConsumer#poll()
>>>>>>>>> and pass the wrapper to withConsumerFactoryFn() for KafkIO reader [1].
>>>>>>>>>
>>>>>>>>> Using something like (such a wrapper is used in KafkasIO tests
>>>>>>>>> [2]):
>>>>>>>>> private static class ConsumerFactoryFn
>>>>>>>>>                 implements SerializableFunction<Map<String,
>>>>>>>>> Object>, Consumer<byte[], byte[]>> {
>>>>>>>>>   @Override
>>>>>>>>>     public Consumer<byte[], byte[]> apply(Map<String, Object>
>>>>>>>>> config) {
>>>>>>>>>       return new KafkaConsumer(config) {
>>>>>>>>>           @Override
>>>>>>>>>           public ConsumerRecords<K, V> poll(long timeout) {
>>>>>>>>>               // work around for BEAM-5375
>>>>>>>>>               while (true) {
>>>>>>>>>                   try {
>>>>>>>>>                     return super.poll(timeout);
>>>>>>>>>                  } catch (Exception e) {
>>>>>>>>>                     // LOG & sleep for sec
>>>>>>>>>                 }
>>>>>>>>>           }
>>>>>>>>>        }
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> [1]:
>>>>>>>>> https://github.com/apache/beam/blob/release-2.4.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L417
>>>>>>>>> [2]:
>>>>>>>>> https://github.com/apache/beam/blob/release-2.4.0/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOTest.java#L261
>>>>>>>>>
>>>>>>>>> On Tue, Sep 18, 2018 at 5:49 AM Eduardo Soldera <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Raghu, we're not sure how long the network was down. According
>>>>>>>>>> to the logs no longer than one minute. A 30 second shutdown would 
>>>>>>>>>> work for
>>>>>>>>>> the tests.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> Em sex, 14 de set de 2018 às 21:41, Raghu Angadi <
>>>>>>>>>> [email protected]> escreveu:
>>>>>>>>>>
>>>>>>>>>>> Thanks. I could repro myself as well. How long was the network
>>>>>>>>>>> down?
>>>>>>>>>>>
>>>>>>>>>>> Trying to get the fix into 2.7 RC2.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 14, 2018 at 12:25 PM Eduardo Soldera <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Just to make myself clear, I'm not sure how to use the patch
>>>>>>>>>>>> but if you could send us some guidance would be great.
>>>>>>>>>>>>
>>>>>>>>>>>> Em sex, 14 de set de 2018 às 16:24, Eduardo Soldera <
>>>>>>>>>>>> [email protected]> escreveu:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Raghu, yes, it is feasible, would you do that for us? I'm
>>>>>>>>>>>>> not sure how we'd use the patch. We're using SBT and Spotify's 
>>>>>>>>>>>>> Scio with
>>>>>>>>>>>>> Scala.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> Em sex, 14 de set de 2018 às 16:07, Raghu Angadi <
>>>>>>>>>>>>> [email protected]> escreveu:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is is feasible for you to verify the fix in your dev job? I
>>>>>>>>>>>>>> can make a patch against Beam 2.4 branch if you like.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Raghu.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Sep 14, 2018 at 11:14 AM Eduardo Soldera <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Raghu, thank you very much for the pull request.
>>>>>>>>>>>>>>> We'll wait for the 2.7 Beam release.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Em qui, 13 de set de 2018 às 18:19, Raghu Angadi <
>>>>>>>>>>>>>>> [email protected]> escreveu:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Fix: https://github.com/apache/beam/pull/6391
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Sep 12, 2018 at 3:30 PM Raghu Angadi <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Filed BEAM-5375
>>>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-5375>. I will
>>>>>>>>>>>>>>>>> fix it later this week.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Sep 12, 2018 at 12:16 PM Raghu Angadi <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Sep 12, 2018 at 12:11 PM Raghu Angadi <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the job id, I looked at the worker logs
>>>>>>>>>>>>>>>>>>> (following usual support oncall access protocol that 
>>>>>>>>>>>>>>>>>>> provides temporary
>>>>>>>>>>>>>>>>>>> access to things like logs in GCP):
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Root issue looks like consumerPollLoop() mentioned
>>>>>>>>>>>>>>>>>>> earlier needs to handle unchecked exception. In your case 
>>>>>>>>>>>>>>>>>>> it is clear that
>>>>>>>>>>>>>>>>>>> poll thread exited with a runtime exception. The reader 
>>>>>>>>>>>>>>>>>>> does not check for
>>>>>>>>>>>>>>>>>>> it and continues to wait for poll thread to enqueue 
>>>>>>>>>>>>>>>>>>> messages. A fix should
>>>>>>>>>>>>>>>>>>> result in an IOException for read from the source. The 
>>>>>>>>>>>>>>>>>>> runners will handle
>>>>>>>>>>>>>>>>>>> that appropriately after that.  I will file a jira.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java#L678
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ignore the link.. was pasted here by mistake.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> From the logs (with a comment below each one):
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    - 2018-09-12 06:13:07.345 PDT Reader-0: reading from
>>>>>>>>>>>>>>>>>>>    kafka_topic-0 starting at offset 2
>>>>>>>>>>>>>>>>>>>       - Implies the reader is initialized and poll
>>>>>>>>>>>>>>>>>>>       thread is started.
>>>>>>>>>>>>>>>>>>>    - 2018-09-12 06:13:07.780 PDT Reader-0: first record
>>>>>>>>>>>>>>>>>>>    offset 2
>>>>>>>>>>>>>>>>>>>       - The reader actually got a message received by
>>>>>>>>>>>>>>>>>>>       the poll thread from Kafka.
>>>>>>>>>>>>>>>>>>>    - 2018-09-12 06:16:48.771 PDT Reader-0: exception
>>>>>>>>>>>>>>>>>>>    while fetching latest offset for partition 
>>>>>>>>>>>>>>>>>>> kafka_topic-0. will be retried.
>>>>>>>>>>>>>>>>>>>       - This must have happened around the time when
>>>>>>>>>>>>>>>>>>>       network was disrupted. This is from. Actual log is 
>>>>>>>>>>>>>>>>>>> from another periodic
>>>>>>>>>>>>>>>>>>>       task that fetches latest offsets for partitions.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The poll thread must have died around the time network
>>>>>>>>>>>>>>>>>>> was disrupted.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The following log comes from kafka client itself and is
>>>>>>>>>>>>>>>>>>> printed every second when KafkaIO fetches latest offset. 
>>>>>>>>>>>>>>>>>>> This log seems to
>>>>>>>>>>>>>>>>>>> be added in recent versions. It is probably an 
>>>>>>>>>>>>>>>>>>> unintentional log. I don't
>>>>>>>>>>>>>>>>>>> think there is any better to fetch latest offsets than how 
>>>>>>>>>>>>>>>>>>> KafkaIO does
>>>>>>>>>>>>>>>>>>> now. This is logged inside consumer.position() called at 
>>>>>>>>>>>>>>>>>>> [1].
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    - 2018-09-12 06:13:11.786 PDT [Consumer
>>>>>>>>>>>>>>>>>>>    clientId=consumer-2,
>>>>>>>>>>>>>>>>>>>    groupId=Reader-0_offset_consumer_1735388161_genericPipe] 
>>>>>>>>>>>>>>>>>>> Resetting offset
>>>>>>>>>>>>>>>>>>>    for partition com.arquivei.dataeng.andre-0 to offset 3.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [1]:
>>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java#L678
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This 'Resetting offset' is harmless, but is quite
>>>>>>>>>>>>>>>>>> annoying to see in the worker logs. One way to avoid is to 
>>>>>>>>>>>>>>>>>> set kafka
>>>>>>>>>>>>>>>>>> consumer's log level to WARNING. Ideally KafkaIO itself 
>>>>>>>>>>>>>>>>>> should do something
>>>>>>>>>>>>>>>>>> to avoid it without user option.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Sep 12, 2018 at 10:27 AM Eduardo Soldera <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Raghu! The job_id of our dev job is
>>>>>>>>>>>>>>>>>>>> 2018-09-12_06_11_48-5600553605191377866.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Em qua, 12 de set de 2018 às 14:18, Raghu Angadi <
>>>>>>>>>>>>>>>>>>>> [email protected]> escreveu:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks for debugging.
>>>>>>>>>>>>>>>>>>>>> Can you provide the job_id of your dev job? The
>>>>>>>>>>>>>>>>>>>>> stacktrace shows that there is no thread running 
>>>>>>>>>>>>>>>>>>>>> 'consumerPollLoop()' which
>>>>>>>>>>>>>>>>>>>>> can explain stuck reader. You will likely find a logs at 
>>>>>>>>>>>>>>>>>>>>> line 594 & 587
>>>>>>>>>>>>>>>>>>>>> [1].  Dataflow caches its readers and DirectRunner may 
>>>>>>>>>>>>>>>>>>>>> not. That can
>>>>>>>>>>>>>>>>>>>>> explain DirectRunner resume reads. The expectation in 
>>>>>>>>>>>>>>>>>>>>> KafkaIO is that Kafka
>>>>>>>>>>>>>>>>>>>>> client library takes care of retrying in case of 
>>>>>>>>>>>>>>>>>>>>> connection problems (as
>>>>>>>>>>>>>>>>>>>>> documented). It is possible that in some cases poll() 
>>>>>>>>>>>>>>>>>>>>> throws and we need to
>>>>>>>>>>>>>>>>>>>>> restart the client in KafkaIO.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [1]:
>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java#L594
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Sep 12, 2018 at 9:59 AM Eduardo Soldera <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Raghu, thanks for your help.
>>>>>>>>>>>>>>>>>>>>>> Just answering your previous question, the following
>>>>>>>>>>>>>>>>>>>>>> logs were the same as before the error, as if the 
>>>>>>>>>>>>>>>>>>>>>> pipeline were still
>>>>>>>>>>>>>>>>>>>>>> getting the messages, for example:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>>>>>>>>>> Resetting offset for partition
>>>>>>>>>>>>>>>>>>>>>> com.arquivei.dataeng.andre-0 to offset 10.
>>>>>>>>>>>>>>>>>>>>>> Resetting offset for partition
>>>>>>>>>>>>>>>>>>>>>> com.arquivei.dataeng.andre-0 to offset 15.
>>>>>>>>>>>>>>>>>>>>>> ERROR
>>>>>>>>>>>>>>>>>>>>>> Resetting offset for partition
>>>>>>>>>>>>>>>>>>>>>> com.arquivei.dataeng.andre-0 to offset 22.
>>>>>>>>>>>>>>>>>>>>>> Resetting offset for partition
>>>>>>>>>>>>>>>>>>>>>> com.arquivei.dataeng.andre-0 to offset 30.
>>>>>>>>>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> But when checking the Kafka Consumer Group, the
>>>>>>>>>>>>>>>>>>>>>> current offset stays at 15, the commited offset from the 
>>>>>>>>>>>>>>>>>>>>>> last processed
>>>>>>>>>>>>>>>>>>>>>> message, before the error.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We'll file a bug, but we could now reproduce the
>>>>>>>>>>>>>>>>>>>>>> issue in a Dev scenario.
>>>>>>>>>>>>>>>>>>>>>> We started the same pipeline using the direct runner,
>>>>>>>>>>>>>>>>>>>>>> without Google Dataflow. We blocked the Kafka Broker 
>>>>>>>>>>>>>>>>>>>>>> network and the same
>>>>>>>>>>>>>>>>>>>>>> error was thrown. Then we unblocked the network and the 
>>>>>>>>>>>>>>>>>>>>>> pipeline was able
>>>>>>>>>>>>>>>>>>>>>> to successfully process the subsequent messages.
>>>>>>>>>>>>>>>>>>>>>> When we started the same pipeline in the Dataflow
>>>>>>>>>>>>>>>>>>>>>> runner and did the same test, the same problem from our 
>>>>>>>>>>>>>>>>>>>>>> production scenario
>>>>>>>>>>>>>>>>>>>>>> happened, Dataflow couldn't process the new messages. 
>>>>>>>>>>>>>>>>>>>>>> Unfortunately, we've
>>>>>>>>>>>>>>>>>>>>>> stopped the dataflow job in production, but the 
>>>>>>>>>>>>>>>>>>>>>> problematic dev job is
>>>>>>>>>>>>>>>>>>>>>> still running and the log file of the VM is attached. 
>>>>>>>>>>>>>>>>>>>>>> Thank you very much.
>>>>>>>>>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Em ter, 11 de set de 2018 às 18:28, Raghu Angadi <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> escreveu:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Specifically, I am interested if you have any thread
>>>>>>>>>>>>>>>>>>>>>>> running 'consumerPollLoop()' [1]. There should always 
>>>>>>>>>>>>>>>>>>>>>>> be one (if a worker
>>>>>>>>>>>>>>>>>>>>>>> is assigned one of the partitions). It is possible that 
>>>>>>>>>>>>>>>>>>>>>>> KafkaClient itself
>>>>>>>>>>>>>>>>>>>>>>> is hasn't recovered from the group coordinator error 
>>>>>>>>>>>>>>>>>>>>>>> (though unlikely).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java#L570
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 11, 2018 at 12:31 PM Raghu Angadi <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Eduardo,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> In case of any error, the pipeline should keep on
>>>>>>>>>>>>>>>>>>>>>>>> trying to fetch. I don't know about this particular 
>>>>>>>>>>>>>>>>>>>>>>>> error. Do you see any
>>>>>>>>>>>>>>>>>>>>>>>> others afterwards in the log?
>>>>>>>>>>>>>>>>>>>>>>>> Couple of things you could try if the logs are not
>>>>>>>>>>>>>>>>>>>>>>>> useful :
>>>>>>>>>>>>>>>>>>>>>>>>  - login to one of the VMs and get stacktrace of
>>>>>>>>>>>>>>>>>>>>>>>> java worker (look for a container called 
>>>>>>>>>>>>>>>>>>>>>>>> java-streaming)
>>>>>>>>>>>>>>>>>>>>>>>>  - file a support bug or stackoverflow question
>>>>>>>>>>>>>>>>>>>>>>>> with jobid so that Dataflow oncall can take a look.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Raghu.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 11, 2018 at 12:10 PM Eduardo Soldera <
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>> We have a Apache Beam pipeline running in Google
>>>>>>>>>>>>>>>>>>>>>>>>> Dataflow using KafkaIO. Suddenly the pipeline stop 
>>>>>>>>>>>>>>>>>>>>>>>>> fetching Kafka messages
>>>>>>>>>>>>>>>>>>>>>>>>> at all, as our other workers from other pipelines 
>>>>>>>>>>>>>>>>>>>>>>>>> continued to get Kafka
>>>>>>>>>>>>>>>>>>>>>>>>> messages.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> At the moment it stopped we got these messages:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I  [Consumer clientId=consumer-1, 
>>>>>>>>>>>>>>>>>>>>>>>>> groupId=genericPipe] Error sending fetch request 
>>>>>>>>>>>>>>>>>>>>>>>>> (sessionId=1396189203, epoch=2431598) to node 3: 
>>>>>>>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.errors.DisconnectException.
>>>>>>>>>>>>>>>>>>>>>>>>> I  [Consumer clientId=consumer-1, 
>>>>>>>>>>>>>>>>>>>>>>>>> groupId=genericPipe] Group coordinator 
>>>>>>>>>>>>>>>>>>>>>>>>> 10.0.52.70:9093 (id: 2147483646 rack: null) is 
>>>>>>>>>>>>>>>>>>>>>>>>> unavailable or invalid, will attempt rediscovery
>>>>>>>>>>>>>>>>>>>>>>>>> I  [Consumer clientId=consumer-1, 
>>>>>>>>>>>>>>>>>>>>>>>>> groupId=genericPipe] Discovered group coordinator 
>>>>>>>>>>>>>>>>>>>>>>>>> 10.0.52.70:9093 (id: 2147483646 rack: null)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> And then the pipeline stopped reading the messages.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> This is the KafkaIO setup  we have:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> KafkaIO.read[String,String]()
>>>>>>>>>>>>>>>>>>>>>>>>>   .withBootstrapServers(server)
>>>>>>>>>>>>>>>>>>>>>>>>>   .withTopic(topic)
>>>>>>>>>>>>>>>>>>>>>>>>>   .withKeyDeserializer(classOf[StringDeserializer])
>>>>>>>>>>>>>>>>>>>>>>>>>   .withValueDeserializer(classOf[StringDeserializer])
>>>>>>>>>>>>>>>>>>>>>>>>>   .updateConsumerProperties(properties)
>>>>>>>>>>>>>>>>>>>>>>>>>   .commitOffsetsInFinalize()
>>>>>>>>>>>>>>>>>>>>>>>>>   .withoutMetadata()
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  Any help will be much appreciated.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>> Eduardo Soldera Garcia
>>>>>>>>>>>>>>>>>>>>>>>>> Data Engineer
>>>>>>>>>>>>>>>>>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br
>>>>>>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>>>>>>>>>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas
>>>>>>>>>>>>>>>>>>>>>>>>> Fiscais]
>>>>>>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>>>>>>>>>>>>>>>>> [image: Google seleciona Arquivei para imersão e
>>>>>>>>>>>>>>>>>>>>>>>>> mentoria no Vale do Silício]
>>>>>>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad>
>>>>>>>>>>>>>>>>>>>>>>>>> <https://www.facebook.com/arquivei>
>>>>>>>>>>>>>>>>>>>>>>>>> <https://www.linkedin.com/company/arquivei>
>>>>>>>>>>>>>>>>>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> Eduardo Soldera Garcia
>>>>>>>>>>>>>>>>>>>>>> Data Engineer
>>>>>>>>>>>>>>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br
>>>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>>>>>>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas
>>>>>>>>>>>>>>>>>>>>>> Fiscais]
>>>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>>>>>>>>>>>>>> [image: Google seleciona Arquivei para imersão e
>>>>>>>>>>>>>>>>>>>>>> mentoria no Vale do Silício]
>>>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad>
>>>>>>>>>>>>>>>>>>>>>> <https://www.facebook.com/arquivei>
>>>>>>>>>>>>>>>>>>>>>> <https://www.linkedin.com/company/arquivei>
>>>>>>>>>>>>>>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Eduardo Soldera Garcia
>>>>>>>>>>>>>>>>>>>> Data Engineer
>>>>>>>>>>>>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br
>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>>>>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas Fiscais]
>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>>>>>>>>>>>> [image: Google seleciona Arquivei para imersão e
>>>>>>>>>>>>>>>>>>>> mentoria no Vale do Silício]
>>>>>>>>>>>>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad>
>>>>>>>>>>>>>>>>>>>> <https://www.facebook.com/arquivei>
>>>>>>>>>>>>>>>>>>>> <https://www.linkedin.com/company/arquivei>
>>>>>>>>>>>>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Eduardo Soldera Garcia
>>>>>>>>>>>>>>> Data Engineer
>>>>>>>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br
>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas Fiscais]
>>>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>>>>>>> [image: Google seleciona Arquivei para imersão e mentoria no
>>>>>>>>>>>>>>> Vale do Silício]
>>>>>>>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad>
>>>>>>>>>>>>>>> <https://www.facebook.com/arquivei>
>>>>>>>>>>>>>>> <https://www.linkedin.com/company/arquivei>
>>>>>>>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Eduardo Soldera Garcia
>>>>>>>>>>>>> Data Engineer
>>>>>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br
>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas Fiscais]
>>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>>>>> [image: Google seleciona Arquivei para imersão e mentoria no
>>>>>>>>>>>>> Vale do Silício]
>>>>>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad>
>>>>>>>>>>>>> <https://www.facebook.com/arquivei>
>>>>>>>>>>>>> <https://www.linkedin.com/company/arquivei>
>>>>>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Eduardo Soldera Garcia
>>>>>>>>>>>> Data Engineer
>>>>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br
>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas Fiscais]
>>>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>>>> [image: Google seleciona Arquivei para imersão e mentoria no
>>>>>>>>>>>> Vale do Silício]
>>>>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad>
>>>>>>>>>>>> <https://www.facebook.com/arquivei>
>>>>>>>>>>>> <https://www.linkedin.com/company/arquivei>
>>>>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Eduardo Soldera Garcia
>>>>>>>>>> Data Engineer
>>>>>>>>>> (16) 3509-5555 | www.arquivei.com.br
>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas Fiscais]
>>>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>>>> [image: Google seleciona Arquivei para imersão e mentoria no Vale
>>>>>>>>>> do Silício]
>>>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad>
>>>>>>>>>> <https://www.facebook.com/arquivei>
>>>>>>>>>> <https://www.linkedin.com/company/arquivei>
>>>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Eduardo Soldera Garcia
>>>>>>>> Data Engineer
>>>>>>>> (16) 3509-5555 | www.arquivei.com.br
>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>> [image: Arquivei.com.br – Inteligência em Notas Fiscais]
>>>>>>>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>>>>>>>> [image: Google seleciona Arquivei para imersão e mentoria no Vale
>>>>>>>> do Silício]
>>>>>>>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad>
>>>>>>>> <https://www.facebook.com/arquivei>
>>>>>>>> <https://www.linkedin.com/company/arquivei>
>>>>>>>> <https://www.youtube.com/watch?v=sSUUKxbXnxk>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> JC
>>>>>>>
>>>>>>>
>>
>> --
>> Eduardo Soldera Garcia
>> Data Engineer
>> (16) 3509-5555 | www.arquivei.com.br
>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>> [image: Arquivei.com.br – Inteligência em Notas Fiscais]
>> <https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
>> [image: Google seleciona Arquivei para imersão e mentoria no Vale do
>> Silício]
>> <https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad>
>> <https://www.facebook.com/arquivei>
>> <https://www.linkedin.com/company/arquivei>
>> <https://www.youtube.com/watch?v=sSUUKxbXnxk>
>>
>

-- 
Eduardo Soldera Garcia
Data Engineer
(16) 3509-5555 | www.arquivei.com.br
<https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
[image: Arquivei.com.br – Inteligência em Notas Fiscais]
<https://arquivei.com.br/?utm_campaign=assinatura-email&utm_content=assinatura>
[image: Google seleciona Arquivei para imersão e mentoria no Vale do
Silício]
<https://arquivei.com.br/blog/google-seleciona-arquivei/?utm_campaign=assinatura-email-launchpad&utm_content=assinatura-launchpad>
<https://www.facebook.com/arquivei>
<https://www.linkedin.com/company/arquivei>
<https://www.youtube.com/watch?v=sSUUKxbXnxk>

Reply via email to