Jstack is a tool that tells a running Java process to dump the current stack traces of all of its threads. It's usually included with the JDK. `kill -3 $pid` does the same. If the output isn't printed back to your shell automatically, check the stdout of the process whose pid you gave as the argument.
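A tserver dump contains many threads, so reading it by eye is tedious. As a rough aid, here is a minimal sketch of a hypothetical helper, `ScanThreadFinder` (not part of Accumulo or the JDK), that reads a dump saved with something like `jstack <pid> > tserver-dump.txt` and prints only the thread entries whose header line contains "scan" and, optionally, a client IP address. It assumes HotSpot-style output, where each thread entry begins with a line holding the quoted thread name and entries are separated by blank lines. The filter word is an assumption; if nothing matches, fall back to reading the dump directly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of Accumulo or the JDK): filter a saved jstack dump
// down to the thread entries whose header line contains every given needle.
// Assumes HotSpot-style output: each entry starts with a line holding the quoted
// thread name, and entries are separated by blank lines.
class ScanThreadFinder {

    static List<String> findThreads(String dump, String... needles) {
        List<String> matches = new ArrayList<>();
        // Normalize Windows line endings, then split the dump on blank lines.
        for (String block : dump.replace("\r\n", "\n").split("\n\\s*\n")) {
            String entry = block.trim();
            if (!entry.startsWith("\"")) {
                continue; // dump preamble or lock section, not a thread entry
            }
            // Match case-insensitively against the header line only.
            String header = entry.split("\n", 2)[0].toLowerCase(Locale.ROOT);
            boolean matchesAll = true;
            for (String needle : needles) {
                if (!header.contains(needle.toLowerCase(Locale.ROOT))) {
                    matchesAll = false;
                    break;
                }
            }
            if (matchesAll) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ScanThreadFinder tserver-dump.txt [client-ip]
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String[] needles = args.length > 1 ? new String[] {"scan", args[1]} : new String[] {"scan"};
        for (String entry : findThreads(dump, needles)) {
            System.out.println(entry);
            System.out.println();
        }
    }
}
```

Matching only the header line keeps stack frames that merely mention a scan class from producing false hits; drop that restriction if you would rather search whole entries.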
When your client is sitting waiting on data from the tabletserver, grab the stack traces from the tserver. You should be able to find a thread with scan in its name, along with your client's IP, and we can help debug exactly what the server is doing that prevents it from returning data to your client.

On Oct 8, 2014 9:43 AM, "Geoffry Roberts" <threadedb...@gmail.com> wrote:
> Thanks Josh. But what do you mean by "jstack'ing"? I'm unfamiliar with
> that term. A better question would be how can one troubleshoot such a
> thing?
>
> btw, I am the sole user on this cluster.
>
> On Tue, Oct 7, 2014 at 4:18 PM, Josh Elser <josh.el...@gmail.com> wrote:
>
>> Ok, this record:
>>
>> tcp 0 0 0.0.0.0:9997 0.0.0.0:* LISTEN
>>
>> Means that your tserver is listening on the correct port on all interfaces.
>> There shouldn't be issues connecting to the tserver. This is also
>> confirmed by the fact that you authenticated and got a Connector (this
>> does an RPC to the tserver).
>>
>> So, your tserver is up, and your client can communicate with it. The
>> real question is why the scan is hanging. Try jstack'ing the
>> tserver while your client is blocked waiting for results.
>>
>> On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts <threadedb...@gmail.com>
>> wrote:
>> > "...it's when
>> > you make a Connector, and your client will talk to a tabletserver to
>> > authenticate, that your program should hang. It would be good to
>> > verify that."
>> >
>> > My program should hang? Would you expand? That is exactly what it is
>> > doing. I am able to get a connector. But when I try to iterate the result
>> > of a scan, that's when it hangs.
>> >
>> > Here's what comes from netstat:
>> >
>> > $ netstat -na | grep 9997
>> >
>> > tcp 0 0 0.0.0.0:9997 0.0.0.0:* LISTEN
>> > tcp 0 0 204.9.140.36:35679 204.9.140.36:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:53146 204.9.140.37:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33896 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:53282 204.9.140.37:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:53188 204.9.140.37:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:35609 204.9.140.36:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33901 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:35588 204.9.140.36:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33877 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33946 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:53167 204.9.140.37:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33949 204.9.140.38:9997 ESTABLISHED
>> > tcp 0 0 204.9.140.36:35546 204.9.140.36:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33852 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:53125 204.9.140.37:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33922 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33747 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33961 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33793 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:35768 204.9.140.36:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33917 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33814 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:35567 204.9.140.36:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33444 204.9.140.38:9997 FIN_WAIT2
>> > tcp 0 0 204.9.140.36:35701 204.9.140.36:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33969 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:53258 204.9.140.37:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33831 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:53210 204.9.140.37:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:53104 204.9.140.37:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33789 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33856 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:53237 204.9.140.37:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33835 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:35651 204.9.140.36:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33938 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33041 204.9.140.36:9997 ESTABLISHED
>> > tcp 0 0 204.9.140.36:53285 204.9.140.37:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:53305 204.9.140.37:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33768 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:35630 204.9.140.36:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33754 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:35745 204.9.140.36:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:35724 204.9.140.36:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:9997 204.9.140.36:33041 ESTABLISHED
>> > tcp 0 0 204.9.140.36:53083 204.9.140.37:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:50623 204.9.140.37:9997 ESTABLISHED
>> > tcp 0 0 204.9.140.36:33772 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33732 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33874 204.9.140.38:9997 TIME_WAIT
>> > tcp 0 0 204.9.140.36:33810 204.9.140.38:9997 TIME_WAIT
>> >
>> > On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <josh.el...@gmail.com> wrote:
>> >>
>> >> Can you provide the output from netstat, lsof or /proc/$pid/fd for the
>> >> tserver?
>> >> Assuming you haven't altered tserv.port.client in
>> >> accumulo-site.xml, we want the line for port 9997.
>> >>
>> >> From my laptop running a tserver on localhost:
>> >>
>> >> $ netstat -na | grep 9997
>> >> tcp4 0 0 127.0.0.1.9997 *.* LISTEN
>> >>
>> >> Depending on the tool you use, you can grep out the pid of the tserver
>> >> or just that port itself.
>> >>
>> >> Just so you know, ZK binds to all available interfaces when it starts,
>> >> so it should work seamlessly with localhost or the FQDN for the host.
>> >> As such, it shouldn't matter what you provide to the
>> >> ZooKeeperInstance. That should connect in all cases for you; it's when
>> >> you make a Connector, and your client will talk to a tabletserver to
>> >> authenticate, that your program should hang. It would be good to
>> >> verify that.
>> >>
>> >> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <threadedb...@gmail.com>
>> >> wrote:
>> >> > All,
>> >> >
>> >> > Thanks for the responses.
>> >> >
>> >> > Is this a problem for Accumulo?
>> >> > Reverse DNS is yielding my ISP's host name. You know the drill: my IP in
>> >> > reverse followed by their domain name, as opposed to my FQDN, which is
>> >> > what I use in my config files.
>> >> >
>> >> > Running Accumulo 1.5.1
>> >> > I have only one interface.
>> >> > I have the FQDN in both master and slaves files for both Hadoop and
>> >> > Accumulo; in zoo.cfg; and in accumulo-site.xml where the ZooKeepers are
>> >> > referenced.
>> >> > Also, I am passing in all ZK FQDNs when I instantiate ZooKeeperInstance.
>> >> > Forward DNS works.
>> >> > Reverse DNS... well (see above).
>> >> >
>> >> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <afu...@apache.org> wrote:
>> >> >>
>> >> >> Accumulo tservers typically listen on a single interface. If you have a
>> >> >> server with multiple interfaces (e.g. loopback and eth0), you might
>> >> >> have a problem in which the tablet servers are not listening on
>> >> >> externally reachable interfaces. Tablet servers will list the interfaces
>> >> >> they are listening on when they boot, and you can also use tools like
>> >> >> lsof to find them.
>> >> >>
>> >> >> If that is indeed the problem, then you might just need to change your
>> >> >> conf/slaves file to use <hostname> instead of localhost, and then
>> >> >> restart.
>> >> >>
>> >> >> Adam
>> >> >>
>> >> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <threadedb...@gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> I have been happily working with Accumulo, but today things changed.
>> >> >>> No errors.
>> >> >>>
>> >> >>> Until now I ran everything server side, which meant the URL was
>> >> >>> localhost:2181, and life was good. Today I tried running some of the
>> >> >>> same code as a remote client, which means <host name>:2181. Things hang
>> >> >>> when BatchWriter tries to commit anything, and Scan hangs when it tries
>> >> >>> to iterate through a Map.
>> >> >>>
>> >> >>> Let's focus on the scan part:
>> >> >>>
>> >> >>> scan.fetchColumnFamily(new Text("colfY")); // This executes, then hangs.
>> >> >>> for(Entry<Key,Value> entry : scan) {
>> >> >>>     def row = entry.getKey().getRow();
>> >> >>>     def value = entry.getValue();
>> >> >>>     println "value=" + value;
>> >> >>> }
>> >> >>>
>> >> >>> This is what appears in the console:
>> >> >>>
>> >> >>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
>> >> >>> response for sessionid: 0x148c6f03388005e after 21ms
>> >> >>>
>> >> >>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
>> >> >>> response for sessionid: 0x148c6f03388005e after 21ms
>> >> >>>
>> >> >>> <and on and on>
>> >> >>>
>> >> >>> The only difference between success and a hang is a URL change, and
>> >> >>> of course being remote.
>> >> >>>
>> >> >>> I don't believe this is a firewall issue. I shut down the firewall.
>> >> >>>
>> >> >>> Am I missing something?
>> >> >>>
>> >> >>> Thanks all.
>> >> >>>
>> >> >>> --
>> >> >>> There are ways and there are ways,
>> >> >>>
>> >> >>> Geoffry Roberts
>> >> >
>> >> > --
>> >> > There are ways and there are ways,
>> >> >
>> >> > Geoffry Roberts
>> >
>> > --
>> > There are ways and there are ways,
>> >
>> > Geoffry Roberts
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
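Much of this thread turns on whether a remote client can reach the tserver and what forward and reverse DNS return for the host. A minimal sketch, run on the client machine, can check both at once: it resolves a host name, prints the reverse lookup for each address, and tries a TCP connection to the port (9997, the default tserv.port.client mentioned in the thread). The class name `TserverReachability` and the 5-second timeout are illustrative assumptions, not anything Accumulo provides. A successful connect only shows that the port is reachable under that name; it says nothing about why a scan blocks once connected, which is what a tserver thread dump is for.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic (not part of Accumulo): forward/reverse DNS for a host,
// plus a plain TCP connect to the tserver client port.
class TserverReachability {

    /** Returns true if a TCP connection to host:port is established within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9997;
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            // getCanonicalHostName() does a reverse lookup; it may return an
            // ISP-assigned name, or the bare IP if the lookup fails.
            System.out.printf("%s -> %s (reverse: %s)%n",
                host, addr.getHostAddress(), addr.getCanonicalHostName());
        }
        System.out.printf("TCP connect to %s:%d: %s%n", host, port,
            canConnect(host, port, 5000) ? "ok" : "failed");
    }
}
```

For example, `java TserverReachability tserver.example.com 9997`, where the host name is a placeholder for the FQDN used in your config files.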