How do I avoid unnecessary reshuffles when using Kafka as input? My Kafka message keys are roughly userId. The first few stages perform joins, usually keyed on (userId, someOtherKeyId). Since records are already partitioned by userId in Kafka, it makes sense for these joins to stay on the same machine rather than be shuffled again.
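To illustrate why I expect no shuffle to be needed, here is a minimal sketch in plain Python (not my actual pipeline). It assumes rows of the form (userId, someOtherKeyId, value), a hypothetical partition count of 4, and `zlib.crc32` as a stand-in for whatever hash the real partitioner uses. It shows that if both inputs are partitioned by userId alone, a join on (userId, someOtherKeyId) done separately inside each partition gives the same result as a global join.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; not taken from any real topic


def partition_for(user_id, num_partitions=NUM_PARTITIONS):
    # Stand-in hash for illustration only; a real partitioner may hash differently.
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def split_by_user(rows, num_partitions=NUM_PARTITIONS):
    # Group rows into partitions using only the userId (first field).
    parts = defaultdict(list)
    for row in rows:
        parts[partition_for(row[0], num_partitions)].append(row)
    return parts


def join_rows(left_rows, right_rows):
    # Inner join on (userId, someOtherKeyId); assumes unique keys on the right side.
    right_index = {(u, k): v for (u, k, v) in right_rows}
    return [
        (u, k, v, right_index[(u, k)])
        for (u, k, v) in left_rows
        if (u, k) in right_index
    ]


def local_join(left_rows, right_rows, num_partitions=NUM_PARTITIONS):
    # Join each partition independently, with no data moving between partitions.
    left_parts = split_by_user(left_rows, num_partitions)
    right_parts = split_by_user(right_rows, num_partitions)
    out = []
    for p, rows in left_parts.items():
        out.extend(join_rows(rows, right_parts.get(p, [])))
    return sorted(out)


def global_join(left_rows, right_rows):
    # Reference result: join over all rows at once.
    return sorted(join_rows(left_rows, right_rows))


LEFT = [("u1", "a", 1), ("u2", "b", 2), ("u1", "c", 3), ("u4", "z", 4)]
RIGHT = [("u1", "a", 10), ("u2", "b", 20), ("u3", "d", 30), ("u1", "c", 40)]

if __name__ == "__main__":
    print(local_join(LEFT, RIGHT) == global_join(LEFT, RIGHT))
```

The per-partition join matches the global one because every row with a given userId lands in the same partition on both sides, regardless of someOtherKeyId. That is the property I would like the engine to exploit.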
What's the best way to avoid unnecessary shuffling when using the Table SQL interface? I see PARTITION BY on TABLE, but I'm not sure how to specify the partitioning keys for a Kafka source.