Hi All,
The issue of connecting to a pod from outside the K8s cluster is a known,
intentional limitation of K8s. K8s creates its own overlay network for pod
addresses (at least in the plain-vanilla version). Amazon EKS appears to draw
pod IPs from the same pool as the VMs, so on AWS the pods may be reachable.
Dobes makes a good point about stateful sets. However, in normal operation the
Drillbit IPs should not matter: it is ZK that is critical. Each Drillbit needs
to know the ZK addresses and registers itself with ZK. Clients consult ZK to
find Drillbits. So the Drillbit IPs themselves can change each time a Drillbit
restarts.
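To illustrate, a JDBC client names only the ZK quorum, never a Drillbit (the
host names and cluster ID below are placeholders):

    jdbc:drill:zk=zk-0.example.com:2181,zk-1.example.com:2181,zk-2.example.com:2181/drill/drillbits1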
This does mean that ZK has to be visible outside the K8s overlay network. And,
to run queries, each Drillbit IP must also be reachable from the client, though
it need not be known ahead of time: only the ZK addresses must be known to the
client in advance.
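As a rough sketch (all names and ports below are illustrative), ZK could be
exposed outside the overlay network with a NodePort service:

    # Illustrative only: exposes the ZK client port on every K8s node.
    apiVersion: v1
    kind: Service
    metadata:
      name: zk-external
    spec:
      type: NodePort
      selector:
        app: zookeeper      # assumes the ZK pods carry this label
      ports:
      - port: 2181          # ZK client port
        targetPort: 2181
        nodePort: 32181     # port clients use on any node's address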
The general solution is to put a load balancer or other gateway in front of
each ingress point. In a production environment, each ingress tends to be
secured with the firm's SSO solution. All of this is more K8s magic than a
Drill issue.
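For example, a minimal sketch of such a gateway, assuming the Drillbit pods
carry an app: drillbit label (31010 is Drill's default user port, 8047 its web
port):

    # Illustrative only: fronts the Drillbits with a cloud load balancer.
    apiVersion: v1
    kind: Service
    metadata:
      name: drill-external
    spec:
      type: LoadBalancer
      selector:
        app: drillbit
      ports:
      - name: user          # JDBC/ODBC client port
        port: 31010
        targetPort: 31010
      - name: web           # web UI / REST port
        port: 8047
        targetPort: 8047

The caveat is that a JDBC client still tries to reach the specific Drillbit
that ZK hands it, so the gateway has to preserve that addressability.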
One quick solution is to run a K8s proxy to forward the Drillbit web address
to outside nodes. It won't help for the JDBC driver, but it lets you manage the
Drill server via REST.
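For example (the pod name here is hypothetical; 8047 is Drill's default web
port):

    # Forward the Drill web UI to the local machine.
    kubectl port-forward pod/drillbit-0 8047:8047
    # Then manage Drill via http://localhost:8047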
Abhishek has been working on a K8s solution. If he is reading this, perhaps he
can offer some advice on what worked for him.
Thanks,
- Paul
On Tuesday, March 24, 2020, 9:04:35 AM PDT, Dobes Vandermeer
<[email protected]> wrote:
I was able to get Drill up and running inside a K8s cluster, but I didn't
connect to it from outside the cluster, so the DNS names were always resolvable
by the client(s).
I had to run it as a StatefulSet to ensure the DNS names are stable; otherwise
the Drillbits couldn't talk to each other, either.
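(For reference, a minimal sketch of such a setup; the names, image, and
replica count are illustrative:)

    # A headless service gives each pod a stable DNS name of the form
    # drillbit-N.drill-service.<namespace>.svc.cluster.local.
    apiVersion: v1
    kind: Service
    metadata:
      name: drill-service
    spec:
      clusterIP: None       # headless: DNS resolves to the pod IPs
      selector:
        app: drillbit
      ports:
      - port: 8047
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: drillbit
    spec:
      serviceName: drill-service
      replicas: 3
      selector:
        matchLabels:
          app: drillbit
      template:
        metadata:
          labels:
            app: drillbit
        spec:
          containers:
          - name: drillbit
            image: drill/apache-drill:1.17.0   # illustrative image/version
            ports:
            - containerPort: 8047    # web UI
            - containerPort: 31010   # user (JDBC) port
            - containerPort: 31011   # control port
            - containerPort: 31012   # data port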
On 3/24/2020 6:37:44 AM, Jaimes, Rafael - 0993 - MITLL
<[email protected]> wrote:
I’m seeing a problem with scaling the number of pod instances in the
replication controller because they aren’t reporting their hostnames properly.
This was a common problem that got fixed in scalable architectures like
ZooKeeper and Kafka (see the reference at the bottom; I think it is related).
In Drill’s case, ZooKeeper is able to see all of the drillbits. However, the
hostnames are only locally addressable within the cluster, so as soon as you
run a query it fails: the client can’t find the drillbit it was assigned
because that hostname isn’t externally addressable.
Kafka fixes this by allowing an override for advertised names. Has anyone
gotten Drill to scale in a K8s cluster?
https://issues.apache.org/jira/browse/KAFKA-1070
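For comparison, Kafka's override lives in server.properties (the host name
below is illustrative):

    # Bind locally, but advertise an externally resolvable name to clients.
    listeners=PLAINTEXT://0.0.0.0:9092
    advertised.listeners=PLAINTEXT://broker-0.example.com:9092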