Hello all,

I am using Drill to aggregate various data stores such as S3, Redshift, Hive and HDFS. I followed the documentation as written, but it could only take me so far. I need clarification on a couple of points:
- For my use case I need a distributed Drill cluster, and this setup requires ZooKeeper. But nowhere does the documentation mention that ZooKeeper itself should be distributed and *not standalone*. I initially started the ZooKeeper service on each node individually, and each one came up as a standalone service on localhost. After starting drillbit.sh, weirdly, I could see 3 out of 5 drillbits running in the UI. After a restart the number went to 4 drillbits, and the next time it showed only 2. This unreliable setup was confusing because I had done everything according to the documentation.

  Is a set of ZooKeeper connections such as HOST1:2181, HOST2:2181, etc. still called a ZooKeeper quorum? Or is it not a quorum unless the ZooKeeper services are started in distributed mode? Also, does Drill still run in distributed fashion without ZooKeeper set up in distributed mode?

- The documentation on starting Drill in distributed mode, under "Using an Ad-Hoc Connection to Drill", has this line (you can Ctrl+F the line below to find the location):

  *The following command starts the Drill shell in a cluster configured to run ZooKeeper on three nodes:*
  *bin/sqlline -u jdbc:drill:zk=cento23,zk=centos24,zk=centos26:5181*

  I think it should be:

  *bin/sqlline -u jdbc:drill:zk=cento23,centos24,centos26:5181*

  because that form worked for me and the documented one threw an error.

Let me know if you have observed this, or if my assumptions are wrong somewhere.

PS: Awesome job with Drill 1.4, Dev team! :-)

-- 
Warm Regards,
Rohit Kulkarni