I'm unblocked and my setup is working. Xiang Fu was very helpful. He asked me to post the details here.
1) The default resources in the create cluster command <https://apache-pinot.gitbook.io/apache-pinot-cookbook/getting-started/quickstart/aws-quickstart> are too small.

EKS_CLUSTER_NAME=pinot-quickstart
eksctl create cluster \
  --name ${EKS_CLUSTER_NAME} \
  --version 1.14 \
  --region us-west-2 \
  --nodegroup-name standard-workers \
  --node-type t3.small \
  --nodes 3 \
  --nodes-min 3 \
  --nodes-max 4 \
  --node-ami auto

Servers kept dying, a different one each time. The errors were not consistent and the logs weren't useful. Eventually we saw an error that pointed to OOM issues. To get t3.small to work, we dropped replicas to 1 and changed jvmOpts to settings like "-Xms256M -Xmx1G".

2) The incubator-pinot/kubernetes/helm/README.md <https://github.com/apache/incubator-pinot/blob/master/kubernetes/helm/README.md> was very useful. I'd recommend pointing the other docs <https://apache-pinot.gitbook.io/apache-pinot-cookbook/getting-started/quickstart/aws-quickstart> to it.

3) I made a typo when copying the table config from my local setup over to the AWS setup. The table didn't work and it was a little tricky to debug. I had to compare a working setup against the broken one to see where the issue was.

< "stream.kafka.broker.list": "kafka:9092",
---
> "stream.kafka.broker.list": "localhost:9092",

The errors could be improved to make this easier to debug:
a) If the TimeoutException included the address being requested, the problem would have been easier to find.
b) For some reason, the code thinks there is a topic called 'metadata'. I don't know why.
2020/03/17 19:32:16.021 WARN [PartitionCountFetcher] [grizzly-http-server-1] Could not get partition count for topic events-realtime
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
2020/03/17 19:32:16.021 ERROR [PinotTableIdealStateBuilder] [grizzly-http-server-1] Could not get partition count for events-realtime
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
2020/03/17 19:32:16.021 ERROR [PinotTableRestletResource] [grizzly-http-server-1] org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
java.lang.RuntimeException: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata

4) Once I got everything running, I created a table with AddTable and sent Kafka events to it. The events didn't actually show up in Pinot. I set up a Kafka listener and was able to listen to the topic. I restarted all of my Pinot servers and the problem was fixed. I don't know why. This is a flake. My top guess is that I started Kafka after I started Pinot and that caused issues. If I see this again, I'll try to dive deeper to see if I can get logs.
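
In case it helps anyone else hitting 3), here is a rough pre-flight check for the broker list, run before posting a table config. It is only a sketch and not part of Pinot: the function names are made up for illustration, and it looks for the "stream.kafka.broker.list" key anywhere in the JSON rather than assuming a particular config layout. It flags loopback brokers (which usually point at the pod itself when running inside Kubernetes) and can optionally try a plain TCP connection to each broker.

```python
import json
import socket

BROKER_KEY = "stream.kafka.broker.list"
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def find_broker_lists(node):
    """Recursively collect every string stored under BROKER_KEY in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == BROKER_KEY and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_broker_lists(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_broker_lists(item))
    return found


def parse_brokers(broker_list):
    """Split 'host1:9092,host2:9092' into [(host, port), ...]."""
    brokers = []
    for entry in broker_list.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        if not host:
            raise ValueError("broker entry has no port: %r" % entry)
        brokers.append((host, int(port)))
    return brokers


def loopback_brokers(config):
    """Return the (host, port) pairs in the config that use a loopback host."""
    return [
        (host, port)
        for broker_list in find_broker_lists(config)
        for host, port in parse_brokers(broker_list)
        if host in LOOPBACK_HOSTS
    ]


def is_reachable(host, port, timeout=3.0):
    """Try a plain TCP connection; True if it opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_table_config(path):
    """Print a warning for each loopback or unreachable broker in the file."""
    with open(path) as f:
        config = json.load(f)
    suspicious = set(loopback_brokers(config))
    for broker_list in find_broker_lists(config):
        for host, port in parse_brokers(broker_list):
            if (host, port) in suspicious:
                print("WARNING: loopback broker %s:%d" % (host, port))
            elif not is_reachable(host, port):
                print("WARNING: cannot connect to %s:%d" % (host, port))
```

Calling check_table_config on the config file from a machine or pod that should be able to see Kafka would print a warning for the "localhost:9092" entry above. The TCP check only shows that something is listening on that port, not that it is a healthy Kafka broker.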