Hello all,

I am using Drill to query across several data stores: S3, Redshift,
Hive and HDFS. I followed the documentation as written, but it only took me
so far. I need clarification on a couple of points:


   - For my use case, I need a distributed Drill cluster, which requires
   ZooKeeper. But nowhere does the documentation mention that it should be
   distributed ZooKeeper and *not standalone*. I initially started the
   ZooKeeper service on each node individually, so each one ran as a
   standalone service on localhost. After starting drillbit.sh, weirdly,
   I could see only 3 out of 5 drillbits running in the UI. After
   restarting, the number went to 4 drillbits, and the next time it showed
   only 2. This unreliable setup was confusing because I had done everything
   according to the documentation. A ZooKeeper connection string of
   HOST1:2181, HOST2:2181, etc. is still called a ZooKeeper quorum, isn't it?
   Or is it not a quorum unless you start the ZooKeeper services in
   distributed mode?
   Also, does Drill still run in distributed fashion without ZooKeeper set
   up in distributed mode?


   - The documentation on starting Drill in distributed mode, under "Using
   an Ad-Hoc Connection to Drill", has this line (you can Ctrl+F the line
   below to find the location):

   *The following command starts the Drill shell in a cluster configured to
   run ZooKeeper on three nodes:*

   *bin/sqlline –u jdbc:drill:zk=cento23,zk=centos24,zk=centos26:5181*

   I think it should be -
   *bin/sqlline –u jdbc:drill:zk=cento23,centos24,centos26:5181*

   because that form worked for me, while the documented one threw an error.
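
For reference, the form that worked for me can be sketched as a small
helper. This is only an illustration in Python, not part of Drill: the
function name drill_zk_url is made up here, and it assumes every ZooKeeper
node listens on the same client port, written once after the last host as
in the command above.

```python
def drill_zk_url(hosts, port):
    """Build a Drill JDBC URL with a single 'zk=' key.

    Hypothetical helper for illustration only. The hosts are joined with
    commas and the shared port is appended once, after the last host.
    """
    if not hosts:
        raise ValueError("at least one ZooKeeper host is required")
    # One 'zk=' prefix for the whole list, not one per host.
    return "jdbc:drill:zk=" + ",".join(hosts) + ":" + str(port)


# Host names and port taken from the documentation line quoted above.
url = drill_zk_url(["cento23", "centos24", "centos26"], 5181)
print(url)
```

The result is the string passed to sqlline after the u option.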

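On the first point, one way to see whether the ZooKeeper servers actually
form a single ensemble is to ask each one for its mode. The sketch below is
a minimal Python illustration, assuming ZooKeeper's "srvr" four-letter
command is reachable on the client port (newer ZooKeeper releases gate
these commands behind a whitelist setting). The helper names are made up,
and HOST1, HOST2, HOST3 are placeholders. A server started on its own
reports "Mode: standalone", while members of one ensemble report "leader"
or "follower". My understanding is that standalone servers do not share
state, so drillbits registered with different servers cannot see each
other, which would be consistent with the changing drillbit counts.

```python
import socket


def parse_mode(srvr_output):
    """Return the value of the 'Mode:' line from a 'srvr' response."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return "unknown"


def zk_mode(host, port=2181, timeout=3.0):
    """Send 'srvr' to one ZooKeeper server and return its reported mode."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


def looks_like_one_ensemble(modes):
    """True if no server is standalone and exactly one reports 'leader'."""
    return "standalone" not in modes and modes.count("leader") == 1


# Example usage (placeholder host names, requires network access):
#   modes = [zk_mode(h) for h in ("HOST1", "HOST2", "HOST3")]
#   print(modes, looks_like_one_ensemble(modes))
```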

Let me know if you have observed this, or if my assumptions are wrong
somewhere.

PS: Awesome job with Drill 1.4, Dev team! :-)

-- 
Warm Regards,
Rohit Kulkarni
