Thank you all for your suggestions! I appreciate the fast turnaround.

My setup uses Amazon ECS for our SolrCloud installation. Each ZK node is in its own container, using Route53 Service Discovery to provide the DNS name. The ZK nodes can all talk to each other, and I can communicate with each one of those nodes from my local machine and from within the solr container. Solr is one node per container, as Martijn correctly assumed. I am not using a zkRoot at present because my intention is to use ZK solely for Solr Cloud and nothing else.
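For anyone who wants to repeat the reachability check from inside the solr container without netcat, here is a minimal Python sketch. It assumes the ZK nodes answer the "ruok" four-letter command (they do in my environment) and that the ensemble string has the same shape as ZK_HOST. The function names parse_zk_host and ruok are made up for illustration and are not part of Solr or ZooKeeper.

```python
import socket


def parse_zk_host(zk_host, default_port=2181):
    """Split a ZK_HOST-style string into (host, port) pairs.

    An optional chroot suffix such as "/solr" is dropped, and entries
    without an explicit port get default_port.
    """
    servers = zk_host.split("/", 1)[0]
    pairs = []
    for entry in servers.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        pairs.append((host, int(port) if port else default_port))
    return pairs


def ruok(host, port, timeout=5.0):
    """Send "ruok" to one ZK node; return True only if it replies "imok"."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            sock.shutdown(socket.SHUT_WR)  # signal that nothing more will be sent
            reply = b""
            while True:
                chunk = sock.recv(16)
                if not chunk:
                    break
                reply += chunk
            return reply == b"imok"
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `ruok(host, port)` for each pair returned by `parse_zk_host("zk1:2181,zk2:2181,zk3:2181")` is roughly equivalent to running `echo ruok | nc zk1 2181` against every node in turn. It only shows that the port answers; it says nothing about whether Solr's own ZK client can establish a session.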
I have tried removing the "-z" option from the Dockerfile CMD and using the ZK_HOST environment variable (see below). I have even modified solr.in.sh and set the ZK_HOST variable there, all to no avail. I have tried the Dockerfile command route, and I have also logged into the solr container and run the CMD manually to see if there was a problem with the way I was using the CMD entry. All of those methods give me the same result, captured in the gist below.

The gist for my solr.log output is here:
https://gist.github.com/dkidder/2db9a6d393dedb97a39ed32e2be0c087

My Dockerfile for the solr container looks like this:

    FROM solr:8.2

    EXPOSE 8983 8999 2181

    VOLUME /app/logs
    VOLUME /app/data
    VOLUME /app/conf

    ## add our jetty configuration (increased request size!)
    COPY jetty.xml /opt/solr/server/etc/

    ## SolrCloud configuration
    ENV ZK_HOST zk1:2181,zk2:2181,zk3:2181
    ENV ZK_CLIENT_TIMEOUT 30000

    USER root
    RUN apt-get update
    RUN apt-get install -y netcat net-tools vim procps
    USER solr

    # Copy over custom solr plugins
    COPY myplugins/src/resources/* /opt/solr/server/solr/my-resources/
    COPY lib/*.jar /opt/solr/my-lib/

    # Copy over my configs
    COPY conf/ /app/conf

    #Start solr in cloud mode, connecting to zookeeper
    CMD ["solr","start","-f","-c"]

The docker command I use to run the image built from this Dockerfile is
`docker run -p 8983:8983 -p 2181:2181 --name $(APP_NAME) $(APP_NAME):latest`

Output of `ps -eflww` from within the solr container (as root):

    root@fe0ad5b40b42:/opt/solr-8.2.0# ps -eflww
    F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
    4 S solr 1 0 9 80 0 - 1043842 - 14:36 ? 00:00:07 /usr/local/openjdk-11/bin/java -server -Xms512m -Xmx512m -XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250 -XX:+UseLargePages -XX:+AlwaysPreTouch -Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=18983 -Dcom.sun.management.jmxremote.rmi.port=18983 -DzkClientTimeout=30000 -DzkHost=zk1:2181,zk2:2181,zk3:2181 -Dsolr.log.dir=/var/solr/logs -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data -Dsolr.data.home= -Dsolr.install.dir=/opt/solr -Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf -Dlog4j.configurationFile=file:/var/solr/log4j2.xml -Xss256k -Dsolr.jetty.https.port=8983 -jar start.jar --module=http
    4 S root 90 0 0 80 0 - 4988 - 14:37 pts/0 00:00:00 /bin/bash
    0 R root 95 90 0 80 0 - 9595 - 14:37 pts/0 00:00:00 ps -eflww

Output of netstat from within the solr container (as root):

    root@fe0ad5b40b42:/opt/solr-8.2.0# netstat
    Active Internet connections (w/o servers)
    Proto Recv-Q Send-Q Local Address        Foreign Address       State
    tcp        0      0 fe0ad5b40b42:43678   172.20.28.179:2181    TIME_WAIT
    tcp        0      0 fe0ad5b40b42:60164   172.20.155.241:2181   TIME_WAIT
    tcp        0      0 fe0ad5b40b42:60500   172.20.60.138:2181    TIME_WAIT
    Active UNIX domain sockets (w/o servers)
    Proto RefCnt Flags   Type     State       I-Node   Path
    unix  2      [ ]     STREAM   CONNECTED   129252
    unix  2      [ ]     STREAM   CONNECTED   129270

I'm beginning to think that ZK is not set up correctly. I haven't uploaded any configuration files to ZK yet; my understanding was that I could start up a solr cloud node with no collections and upload the configuration from there.
I was under the impression that it would try to connect to ZK and if it couldn't get config files from there it would use local config files. Do I need to upload the solr cloud configuration files to ZK before starting up the cluster? The netstat output makes it look like the solr container is indeed connected to the ZK containers, but there's no indication as to why it cannot connect to Zookeeper that I can see.

--
Drew(i...@gmail.com)
http://wyntermute.dyndns.org/blog/

-- I Drive Way Too Fast To Worry About Cholesterol.

On Fri, Oct 18, 2019 at 3:11 AM Martijn Koster <mak-luc...@greenhills.co.uk> wrote:

> > On 18 Oct 2019, at 00:25, Drew Kidder <dre...@gmail.com> wrote:
> >
> > * I'm using the following command line to start a basic solr cloud instance
> > as per the documentation: `bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181`
>
> I assume you’re just looking to run a single Solr node in a single container, right?
>
> Just set the ZK_HOST environment variable, and remove the command-line arguments.
> And you don’t need to specify the port number unless you deviate from the default.
> Have a look at this example
> https://github.com/docker-solr/docker-solr-examples/blob/master/swarm/docker-compose.yml
> <https://github.com/docker-solr/docker-solr-examples/blob/master/swarm/docker-compose.yml#L61with>
>
> The “start” command starts Solr in the background, which is typically not what you want
> when running Solr under docker.
>
> Why your command isn’t working as is, is not clear. When you say you’re using that
> command-line, how do you actually do that? In a full docker command line,
> or a compose file, or from a “docker exec”, or from some orchestrator.
> Share the exact thing you’re doing; perhaps there is mistake there.
> Also, run `ps -eflww` in the container to see what command-line arguments
> the JVM actually got started with.
> And share the full startup log somewhere (in a GitHub gist perhaps), there
> might be something of interest earlier on.
>
> >> (running `echo ruok | nc zk1 2181` returns the expected "imok" response
> >> from ZK within the docker container where Solr is located)
> >> * The netcat command mentioned above shows up in the ZK logs, but the Solr
> >> attempts to connect do not (it's like the request isn't even getting to ZK)
>
> Then it doesn’t sound like a environmental firewall/security-group/routing issue.
> Next step to debug then could be to check if you actually see Solr make tcp connections
> to port 2181, in the Solr container, using tcpdump/sysdig/netstat or some such.
> If that gives a negative result, then you know it’s an issue in your Solr
> invocation config, or name resolution.
> If that gives a positive result, then it’s environmental after all; and
> you can dig further.
>
> But try the ZK_HOST thing first; it may just fix it.
>
> -- Martijn
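
For the netstat check suggested in the quoted message, a small Python sketch can tally the state of every TCP connection whose remote port is 2181. The function name zk_connection_states is made up for illustration, and the sketch assumes netstat prints numeric ports in the usual "Proto Recv-Q Send-Q Local Foreign State" column order. The sample rows are the three TCP rows from the netstat output earlier in this message.

```python
from collections import Counter


def zk_connection_states(netstat_output, zk_port=2181):
    """Count TCP connection states for sockets whose remote port is zk_port."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UNIX-socket rows, and anything without a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign_address = fields[4]
        if foreign_address.rsplit(":", 1)[-1] == str(zk_port):
            states[fields[5]] += 1
    return dict(states)


SAMPLE = """\
tcp 0 0 fe0ad5b40b42:43678 172.20.28.179:2181 TIME_WAIT
tcp 0 0 fe0ad5b40b42:60164 172.20.155.241:2181 TIME_WAIT
tcp 0 0 fe0ad5b40b42:60500 172.20.60.138:2181 TIME_WAIT
"""

print(zk_connection_states(SAMPLE))  # {'TIME_WAIT': 3}
```

All three rows are in TIME_WAIT, which means each connection was opened and has already been closed on the Solr side. None of them is an ESTABLISHED session, so the output shows that Solr reaches port 2181 on the ZK nodes but not that it stays connected.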