Hey all, I’ve been taking a very TDD-oriented approach to developing many of the Flink apps I’ve worked on, but recently I’ve encountered a problem that has me scratching my head.
A majority of my integration tests leverage a few external technologies, such as Kafka and typically a relational database like Postgres. I've found in-memory/embedded versions of these that have worked well in the past, allowing me to:

- send messages into a Kafka topic
- run my exact Flink job asynchronously
- verify my results/assertions in Postgres via Awaitility

Recently, I had a use case for broadcast state in a job and found that my tests would run successfully when executed individually, but when multiple tests run in sequence (in the same file), it seems that Flink fails to consume from the topics and the assertion eventually fails. I've tried several approaches, including:

- ensuring that each Flink job is passed a unique consumer.id / group.id / application.id
- ensuring each test has brand-new Kafka topics specific to it
- spinning up a new Flink cluster / Kafka cluster / Postgres instance per test

I'm not entirely sure what could be causing the problem, but it only occurs for Flink jobs that read from two topics and leverage broadcast state. All other integration tests that use Kafka/Flink/Postgres still pass and can be run in sequence.

Any advice / examples / recommendations would be helpful. I'd be happy to elaborate and provide code whenever possible as well.

Thanks,

Rion
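For context, here is a minimal sketch of the kind of two-topic broadcast wiring and per-test isolation I'm describing. It's simplified from the real job: `Event`, `ControlMessage`, `EventSchema`, `ControlSchema`, `EnrichmentFunction`, `brokers`, and `testId` are all placeholders, not the actual code.

```java
// Sketch only -- placeholder types stand in for the real job's classes.
import java.util.UUID;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.BroadcastStream;

// Descriptor for the broadcast (control) side.
MapStateDescriptor<String, ControlMessage> controlDescriptor =
    new MapStateDescriptor<>(
        "control-state", Types.STRING, TypeInformation.of(ControlMessage.class));

// Main event source: per-test topic name and group.id, reading from earliest
// so the job never depends on offsets committed by a previous test.
KafkaSource<Event> events = KafkaSource.<Event>builder()
    .setBootstrapServers(brokers)
    .setTopics("events-" + testId)
    .setGroupId("events-consumer-" + UUID.randomUUID())
    .setStartingOffsets(OffsetsInitializer.earliest())
    .setValueOnlyDeserializer(new EventSchema())
    .build();

// Control/broadcast source, isolated the same way.
KafkaSource<ControlMessage> control = KafkaSource.<ControlMessage>builder()
    .setBootstrapServers(brokers)
    .setTopics("control-" + testId)
    .setGroupId("control-consumer-" + UUID.randomUUID())
    .setStartingOffsets(OffsetsInitializer.earliest())
    .setValueOnlyDeserializer(new ControlSchema())
    .build();

BroadcastStream<ControlMessage> broadcast = env
    .fromSource(control, WatermarkStrategy.noWatermarks(), "control")
    .broadcast(controlDescriptor);

env.fromSource(events, WatermarkStrategy.noWatermarks(), "events")
    .keyBy(Event::getKey)
    .connect(broadcast)
    .process(new EnrichmentFunction()); // a KeyedBroadcastProcessFunction
```

The tests then publish to both topics and poll Postgres with Awaitility until the expected rows appear (or the timeout fails the assertion).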