I'm unblocked and my setup is working.  Xiang Fu was very helpful.  He
asked me to send the details here.

1) The default resources in the create cluster command
<https://apache-pinot.gitbook.io/apache-pinot-cookbook/getting-started/quickstart/aws-quickstart>
are too small.

EKS_CLUSTER_NAME=pinot-quickstart
eksctl create cluster \
--name ${EKS_CLUSTER_NAME} \
--version 1.14 \
--region us-west-2 \
--nodegroup-name standard-workers \
--node-type t3.small \
--nodes 3 \
--nodes-min 3 \
--nodes-max 4 \
--node-ami auto

Servers kept dying, and each time it was a different one.  The errors
were not consistent and the logs weren't useful.  Eventually we saw an
error that pointed to out-of-memory (OOM) issues.

To get t3.small to work, we dropped replicas to 1 and changed jvmOpts to
settings like "-Xms256M -Xmx1G".

2) The incubator-pinot/kubernetes/helm/README.md
<https://github.com/apache/incubator-pinot/blob/master/kubernetes/helm/README.md>
was very useful.  I'd recommend linking to it from the other docs
<https://apache-pinot.gitbook.io/apache-pinot-cookbook/getting-started/quickstart/aws-quickstart>.

3) I made a typo when copying my table config from my local setup over to
the AWS setup.  The table didn't work and it was a little tricky to debug.
I had to diff a working config against the broken one to find the issue.

<           "stream.kafka.broker.list": "kafka:9092",
---
>           "stream.kafka.broker.list": "localhost:9092",

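A comparison like the one above can be scripted instead of eyeballing two
JSON files.  This is only a rough sketch: the helper names (flatten,
diff_configs) and the nesting around the stream settings are made up for
illustration.  Only the stream.kafka.broker.list key and its two values
come from the diff above.

```python
MISSING = "<missing>"


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into {"a/b/0/c": leaf_value}.

    "/" is used as the path separator because Pinot stream keys such as
    "stream.kafka.broker.list" already contain dots.
    """
    if isinstance(value, dict) and value:
        items = list(value.items())
    elif isinstance(value, list) and value:
        items = [(str(i), v) for i, v in enumerate(value)]
    else:
        # Scalars and empty containers are treated as leaves.
        return {prefix: value}
    flat = {}
    for key, child in items:
        path = f"{prefix}/{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


def diff_configs(left, right):
    """Return a sorted list of (path, left_value, right_value) that differ."""
    a, b = flatten(left), flatten(right)
    return [
        (path, a.get(path, MISSING), b.get(path, MISSING))
        for path in sorted(a.keys() | b.keys())
        if a.get(path, MISSING) != b.get(path, MISSING)
    ]


# Illustrative fragments only; real table configs have many more fields.
first = {"tableIndexConfig": {"streamConfigs": {
    "stream.kafka.broker.list": "kafka:9092"}}}
second = {"tableIndexConfig": {"streamConfigs": {
    "stream.kafka.broker.list": "localhost:9092"}}}

for path, left_value, right_value in diff_configs(first, second):
    print(f"{path}: {left_value!r} != {right_value!r}")
```

With real files, the two dicts would come from json.load() on each config.
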
The errors could be improved to make this easier to debug:
a) If the TimeoutException could include the address being requested, it
would have been easier to find.
b) For some reason, the code thinks there is a topic called 'metadata'.  I
don't know why.

2020/03/17 19:32:16.021 WARN [PartitionCountFetcher] [grizzly-http-server-1] Could not get partition count for topic events-realtime
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
2020/03/17 19:32:16.021 ERROR [PinotTableIdealStateBuilder] [grizzly-http-server-1] Could not get partition count for events-realtime
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
2020/03/17 19:32:16.021 ERROR [PinotTableRestletResource] [grizzly-http-server-1] org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
java.lang.RuntimeException: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
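
Since the exception does not name the address, one workaround is to check
each entry of stream.kafka.broker.list by hand from the machine or pod
where Pinot runs.  A minimal sketch using only the Python standard library
follows.  The function names are hypothetical, and a successful TCP
connect only shows that the port is open, not that Kafka is healthy.

```python
import socket


def parse_broker_list(broker_list):
    """Split "host1:9092,host2:9092" into [(host, port), ...].

    Raises ValueError if an entry has no numeric port.
    """
    brokers = []
    for entry in broker_list.split(","):
        host, _, port = entry.strip().rpartition(":")
        brokers.append((host, int(port)))
    return brokers


def check_brokers(broker_list, timeout=3.0):
    """Try a plain TCP connect to every broker and report the outcome.

    Returns {"host:port": "reachable" | "unreachable (<reason>)"} so the
    failing address is always visible in the result.
    """
    results = {}
    for host, port in parse_broker_list(broker_list):
        address = f"{host}:{port}"
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[address] = "reachable"
        except OSError as exc:
            # Covers DNS failures, refused connections, and timeouts.
            results[address] = f"unreachable ({exc})"
    return results


if __name__ == "__main__":
    for address, status in check_brokers("kafka:9092,localhost:9092").items():
        print(f"{address}: {status}")
```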

4) Once I got everything running, I ran AddTable to create a table and sent
Kafka events to it.  The events never showed up in Pinot.

I set up a Kafka listener and was able to read the events, so they were
reaching Kafka.  Restarting all of my Pinot servers fixed the problem.  I
don't know why, so for now I'm treating it as a flake.  My best guess is
that I started Kafka after I started Pinot and that caused issues.  If I
see this again, I'll dig deeper and try to get logs.
