Re: Avoiding data shuffling when reading pre-partitioned data from Kafka

2023-03-06 Thread Tommy May
>> site key generation is deterministic, you can do the same
>> thing on both streams, and join on the composite key.
>>
>> You’d want to cache the mapping from the real key to the synthetic value,
>> to avoid doing this calculation for every record.
>>
>> If that sounds

Re: Avoiding data shuffling when reading pre-partitioned data from Kafka

2023-03-04 Thread Tommy May
> teful join.
>
> If it’s something like a left outer join without any state TTL or need to
> keep both sides in state, then it’s pretty easy.
>
> -- Ken
>
> PS - it’s pretty easy to figure out a “-xxx” value to append to a topic
> name to get the hashCode() result you need.
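
For readers of the archive, here is a rough sketch of the idea discussed in this thread: append a synthetic "-xxx" suffix to the real key so that the composite key hashes to a chosen partition, and cache the suffix per real key so the search runs only once. Everything here is an assumption made for illustration: the class `SyntheticKeyCache`, its methods, and in particular the stand-in partitioner (`hashCode()` modulo the partition count). Flink's real key assignment goes through key groups and a murmur hash of `hashCode()`, so `partitionOf` would need to be swapped for that logic before this could work in a real job.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Flink or Kafka.
final class SyntheticKeyCache {
    // Upper bound on the brute-force search, so a bad input cannot loop forever.
    private static final int MAX_ATTEMPTS = 1_000_000;

    private final int numPartitions;
    // Real key -> suffix that was found for it. Assumes each real key always
    // maps to the same target partition, as with pre-partitioned Kafka data.
    private final Map<String, Integer> suffixByKey = new HashMap<>();

    SyntheticKeyCache(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    // Stand-in partitioner (an assumption): non-negative hashCode modulo the
    // partition count. Replace with the partitioning your runtime really uses.
    static int partitionOf(String compositeKey, int numPartitions) {
        return Math.floorMod(compositeKey.hashCode(), numPartitions);
    }

    // Returns realKey + "-" + n, where n is the smallest non-negative integer
    // for which the composite key lands on targetPartition.
    String compositeKey(String realKey, int targetPartition) {
        if (targetPartition < 0 || targetPartition >= numPartitions) {
            throw new IllegalArgumentException("targetPartition out of range");
        }
        Integer suffix = suffixByKey.get(realKey);
        if (suffix == null) {
            suffix = findSuffix(realKey, targetPartition);
            suffixByKey.put(realKey, suffix);
        }
        return realKey + "-" + suffix;
    }

    private int findSuffix(String realKey, int targetPartition) {
        for (int n = 0; n < MAX_ATTEMPTS; n++) {
            if (partitionOf(realKey + "-" + n, numPartitions) == targetPartition) {
                return n;
            }
        }
        throw new IllegalStateException("no suffix found for key " + realKey);
    }

    public static void main(String[] args) {
        SyntheticKeyCache cache = new SyntheticKeyCache(8);
        // "order-42" and partition 3 are made-up example values.
        String composite = cache.compositeKey("order-42", 3);
        System.out.println(composite + " -> partition " + partitionOf(composite, 8));
    }
}
```

Because the search is deterministic, running the same code on both input streams yields the same composite key for the same real key, which is what allows the join on the composite key. The cache is unbounded in this sketch; a job with a large key space would probably want an eviction policy.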

Avoiding data shuffling when reading pre-partitioned data from Kafka

2023-03-03 Thread Tommy May
Hello,

My team has a Flink streaming job that does a stateful join across two high-throughput Kafka topics. This results in a large amount of data ser/de and shuffling (about 1gb/s for context). We're running into a bottleneck on this shuffling step. We've attempted to optimize our Flink configura