gaborgsomogyi opened a new pull request #26470: [SPARK-27042][SS] Invalidate cached Kafka producer in case of task retry
URL: https://github.com/apache/spark/pull/26470
 
 
   ### What changes were proposed in this pull request?
If a task fails due to a corrupt cached Kafka producer and is retried on the same executor, it may get the same faulty producer over and over again. After several retries the query may stop.
   
In this PR I'm invalidating the old cached producer for a specific key and opening a new one, which reduces the chance of reusing a faulty producer. It must be mentioned that if the producer under that key is still in use by another task, it won't be closed immediately. The functionality is similar to the consumer side [here](https://github.com/apache/spark/blob/d06a9cc4bdcc1fb330f50daf219e9e8d908a16c4/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala#L628).
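   For illustration only, a minimal Scala sketch of the reference-counted invalidation idea; all names here (`ProducerCacheSketch`, `CachedProducer`, `acquire`/`invalidate`/`release`) are hypothetical and not the actual PR code:
   
   ```scala
   import java.util.concurrent.ConcurrentHashMap
   import org.apache.kafka.clients.producer.KafkaProducer
   
   // Hypothetical sketch: a producer cache keyed by Kafka params where
   // invalidation evicts the entry but defers close() while in use.
   object ProducerCacheSketch {
     final class CachedProducer(
         val producer: KafkaProducer[Array[Byte], Array[Byte]]) {
       var inUseCount: Int = 0        // tasks currently holding this producer
       var markedForClose: Boolean = false  // close once the last user releases
     }
   
     private val cache =
       new ConcurrentHashMap[Seq[(String, Object)], CachedProducer]()
   
     // Get (or create) the producer for this key and count the new user.
     def acquire(
         key: Seq[(String, Object)],
         create: () => KafkaProducer[Array[Byte], Array[Byte]]): CachedProducer =
       synchronized {
         val cached = cache.computeIfAbsent(key, _ => new CachedProducer(create()))
         cached.inUseCount += 1
         cached
       }
   
     // Called on task retry: evict the entry so the next acquire opens a fresh
     // producer. If another task still uses it, only mark it for later close.
     def invalidate(key: Seq[(String, Object)]): Unit = synchronized {
       Option(cache.remove(key)).foreach { cached =>
         if (cached.inUseCount == 0) cached.producer.close()
         else cached.markedForClose = true
       }
     }
   
     // Drop one user; actually close an invalidated producer only when the
     // last user is gone.
     def release(cached: CachedProducer): Unit = synchronized {
       cached.inUseCount -= 1
       if (cached.markedForClose && cached.inUseCount == 0) {
         cached.producer.close()
       }
     }
   }
   ```
   
   Under these assumptions, a retried task would call `invalidate` before `acquire`, so it gets a freshly opened producer while any concurrent user of the old one can finish safely before it is closed.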
   
   ### Why are the changes needed?
   Increase producer-side availability.
   
   ### Does this PR introduce any user-facing change?
   No.
   
   ### How was this patch tested?
   Existing + additional unit tests.
   
