Hey all,

I’ve been taking a very TDD-oriented approach to developing many of the Flink 
apps I’ve worked on, but recently I’ve encountered a problem that has me 
scratching my head.

A majority of my integration tests leverage a few external technologies such as 
Kafka and typically a relational database like Postgres. I’ve found 
in-memory/embedded versions of these that have worked well in the past to allow 
me to:

- send messages into a Kafka topic
- run my exact Flink job asynchronously
- verify my results / assertions in Postgres via Awaitility (roughly the shape 
sketched below)
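For reference, those tests follow roughly this shape (a simplified sketch, not 
my actual code; MyFlinkJob, testArgs, sendRecords, testEvents, and rowCount are 
placeholders for my real job and test helpers):

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import java.time.Duration;
    import java.util.concurrent.CompletableFuture;
    import org.awaitility.Awaitility;
    import org.junit.jupiter.api.Test;

    class EnrichmentJobIT {
        @Test
        void enrichedEventsLandInPostgres() {
            // Placeholder helper: produce fixture records to the embedded broker
            sendRecords("events-topic", testEvents());

            // Run the exact production job asynchronously against test config
            CompletableFuture.runAsync(() -> {
                try {
                    MyFlinkJob.main(testArgs());
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });

            // Poll Postgres until the expected rows appear, or time out
            Awaitility.await()
                .atMost(Duration.ofSeconds(30))
                .untilAsserted(() ->
                    assertEquals(testEvents().size(), rowCount("enriched_events")));
        }
    }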

Recently, I had a use case for broadcast state in a job and found that my 
tests would pass when executed individually, but when multiple tests ran in 
sequence (in the same class), Flink would fail to consume from the topics and 
the assertions would eventually time out.
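For context, the broadcast jobs are wired up roughly like this (a simplified 
sketch of the pattern rather than my exact code; topic names, the rules key, 
and the localhost bootstrap server are placeholders, and I'm showing the 
KafkaSource API here even though the actual connector setup may differ):

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.datastream.BroadcastStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    public class BroadcastJobSketch {
        static final MapStateDescriptor<String, String> RULES =
            new MapStateDescriptor<>("rules", Types.STRING, Types.STRING);

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<String> events = env.fromSource(
                kafkaSource("events-topic", "events-group"),
                WatermarkStrategy.noWatermarks(), "events");
            DataStream<String> rules = env.fromSource(
                kafkaSource("rules-topic", "rules-group"),
                WatermarkStrategy.noWatermarks(), "rules");

            BroadcastStream<String> broadcastRules = rules.broadcast(RULES);

            events.connect(broadcastRules)
                .process(new BroadcastProcessFunction<String, String, String>() {
                    @Override
                    public void processElement(String event, ReadOnlyContext ctx,
                                               Collector<String> out) throws Exception {
                        // enrich using whatever rules have been broadcast so far
                        String rule = ctx.getBroadcastState(RULES).get("default");
                        out.collect(rule == null ? event : event + "|" + rule);
                    }

                    @Override
                    public void processBroadcastElement(String rule, Context ctx,
                                                        Collector<String> out) throws Exception {
                        ctx.getBroadcastState(RULES).put("default", rule);
                    }
                })
                .print(); // the real job writes to Postgres via a JDBC sink

            env.execute("broadcast-sketch");
        }

        static KafkaSource<String> kafkaSource(String topic, String group) {
            return KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics(topic)
                .setGroupId(group)
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
        }
    }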

I’ve tried several approaches, including:
- ensuring that each Flink job is passed a unique consumer.id / group.id / 
application.id (see the sketch after this list)
- ensuring each test has brand new Kafka topics specific to it
- spinning up a new Flink cluster / Kafka cluster / Postgres instance per test
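The uniqueness bits look roughly like this (a sketch with simplified names; 
unique, consumerProps, and createTopics are hypothetical helpers standing in 
for my actual setup, and topics are created explicitly up front rather than 
relying on auto-creation):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;
    import java.util.UUID;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    class TestIsolation {
        static String unique(String base) {
            return base + "-" + UUID.randomUUID();
        }

        // Per-test consumer identity so no state is shared across tests
        static Properties consumerProps(String bootstrapServers) {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", bootstrapServers);
            props.setProperty("group.id", unique("it-group"));
            props.setProperty("client.id", unique("it-client"));
            props.setProperty("auto.offset.reset", "earliest");
            return props;
        }

        // Per-test topics, created before the job starts
        static void createTopics(Properties adminProps, String... topics)
                throws Exception {
            try (AdminClient admin = AdminClient.create(adminProps)) {
                List<NewTopic> newTopics = new ArrayList<>();
                for (String t : topics) {
                    newTopics.add(new NewTopic(t, 1, (short) 1));
                }
                admin.createTopics(newTopics).all().get();
            }
        }
    }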

I’m not entirely sure what could be causing the problem, but it only occurs 
for Flink jobs that read from two topics and leverage broadcast state. All 
other integration tests that use Kafka/Flink/Postgres still pass and can be 
run in sequence.

Any advice / examples / recommendations would be helpful. I’d be happy to 
elaborate and provide code whenever possible as well.

Thanks,

Rion
