"...it's when you make a Connector, and your client will talk to a tabletserver to authenticate, that your program should hang. It would be good to verify that."
My program should hang? Would you expand? That is exactly what it is doing.
I am able to get a connector. But when I try to iterate the result of a scan,
that's when it hangs.

Here's what comes from netstat:

$ netstat -na | grep 9997
tcp 0 0 0.0.0.0:9997 0.0.0.0:* LISTEN
tcp 0 0 204.9.140.36:35679 204.9.140.36:9997 TIME_WAIT
tcp 0 0 204.9.140.36:53146 204.9.140.37:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33896 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:53282 204.9.140.37:9997 TIME_WAIT
tcp 0 0 204.9.140.36:53188 204.9.140.37:9997 TIME_WAIT
tcp 0 0 204.9.140.36:35609 204.9.140.36:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33901 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:35588 204.9.140.36:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33877 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33946 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:53167 204.9.140.37:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33949 204.9.140.38:9997 ESTABLISHED
tcp 0 0 204.9.140.36:35546 204.9.140.36:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33852 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:53125 204.9.140.37:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33922 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33747 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33961 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33793 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:35768 204.9.140.36:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33917 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33814 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:35567 204.9.140.36:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33444 204.9.140.38:9997 FIN_WAIT2
tcp 0 0 204.9.140.36:35701 204.9.140.36:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33969 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:53258 204.9.140.37:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33831 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:53210 204.9.140.37:9997 TIME_WAIT
tcp 0 0 204.9.140.36:53104 204.9.140.37:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33789 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33856 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:53237 204.9.140.37:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33835 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:35651 204.9.140.36:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33938 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33041 204.9.140.36:9997 ESTABLISHED
tcp 0 0 204.9.140.36:53285 204.9.140.37:9997 TIME_WAIT
tcp 0 0 204.9.140.36:53305 204.9.140.37:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33768 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:35630 204.9.140.36:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33754 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:35745 204.9.140.36:9997 TIME_WAIT
tcp 0 0 204.9.140.36:35724 204.9.140.36:9997 TIME_WAIT
tcp 0 0 204.9.140.36:9997 204.9.140.36:33041 ESTABLISHED
tcp 0 0 204.9.140.36:53083 204.9.140.37:9997 TIME_WAIT
tcp 0 0 204.9.140.36:50623 204.9.140.37:9997 ESTABLISHED
tcp 0 0 204.9.140.36:33772 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33732 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33874 204.9.140.38:9997 TIME_WAIT
tcp 0 0 204.9.140.36:33810 204.9.140.38:9997 TIME_WAIT

On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <josh.el...@gmail.com> wrote:
> Can you provide the output from netstat, lsof or /proc/$pid/fd for the
> tserver? Assuming you haven't altered tserv.port.client in
> accumulo-site.xml, we want the line for port 9997.
>
> From my laptop running a tserver on localhost:
>
> $ netstat -na | grep 9997
> tcp4 0 0 127.0.0.1.9997 *.* LISTEN
>
> Depending on the tool you use, you can grep out the pid of the tserver
> or just that port itself.
>
> Just so you know, ZK binds to all available interfaces when it starts,
> so it should work seamlessly with localhost or the FQDN for the host.
> As such, it shouldn't matter what you provide to the
> ZooKeeperInstance. That should connect in all cases for you, it's when
> you make a Connector, and your client will talk to a tabletserver to
> authenticate, that your program should hang.
> It would be good to verify that.
>
> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <threadedb...@gmail.com> wrote:
> > All,
> >
> > Thanks for the responses.
> >
> > Is this a problem for Accumulo?
> > Reverse DNS is yielding my ISP's host name. You know the drill, my IP in
> > reverse followed by their domain name, as opposed to my FQDN, which is
> > what I use in my config files.
> >
> > Running Accumulo 1.5.1
> > I have only one interface.
> > I have the FQDN in both master and slaves files for both Hadoop and
> > Accumulo; in zoo.cfg; and in accumulo-site.xml where the Zookeepers are
> > referenced.
> > Also, I am passing in all Zk FQDN when I instantiate ZooKeeperInstance.
> > Forward DNS works
> > Reverse DNS... well (See above).
> >
> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <afu...@apache.org> wrote:
> >>
> >> Accumulo tservers typically listen on a single interface. If you have a
> >> server with multiple interfaces (e.g. loopback and eth0), you might have
> >> a problem in which the tablet servers are not listening on externally
> >> reachable interfaces. Tablet servers will list the interfaces that they
> >> are listening to when they boot, and you can also use tools like lsof to
> >> find them.
> >>
> >> If that is indeed the problem, then you might just need to change your
> >> conf/slaves file to use <hostname> instead of localhost, and then
> >> restart.
> >>
> >> Adam
> >>
> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <threadedb...@gmail.com> wrote:
> >>>
> >>> I have been happily working with Acc, but today things changed. No
> >>> errors
> >>>
> >>> Until now I ran everything server side, which meant the URL was
> >>> localhost:2181, and life was good. Today tried running some of the same
> >>> code as a remote client, which means <host name>:2181. Things hang when
> >>> BatchWriter tries to commit anything and Scan hangs when it tries to
> >>> iterate through a Map.
> >>>
> >>> Let's focus on the scan part:
> >>>
> >>> scan.fetchColumnFamily(new Text("colfY")); // This executes then hangs.
> >>> for(Entry<Key,Value> entry : scan) {
> >>>     def row = entry.getKey().getRow();
> >>>     def value = entry.getValue();
> >>>     println "value=" + value;
> >>> }
> >>>
> >>> This is what appears in the console:
> >>>
> >>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
> >>> response for sessionid: 0x148c6f03388005e after 21ms
> >>>
> >>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
> >>> response for sessionid: 0x148c6f03388005e after 21ms
> >>>
> >>> <and on and on>
> >>>
> >>> The only difference between success and a hang is a URL change, and of
> >>> course being remote.
> >>>
> >>> I don't believe this is a firewall issue. I shut down the firewall.
> >>>
> >>> Am I missing something?
> >>>
> >>> Thanks all.
> >>>
> >>> --
> >>> There are ways and there are ways,
> >>>
> >>> Geoffry Roberts
> >
> > --
> > There are ways and there are ways,
> >
> > Geoffry Roberts
>

--
There are ways and there are ways,

Geoffry Roberts