So start-here.sh does it. Thanks for pointing that out. I was looking all through the shell commands.
I did try start-all.sh from the master, and it worked for starting the
tserver, but I noticed that on the master it increased the number of
processes labeled "Main" from the usual five to seven.

From accumulo-site.xml, everything memory related:

  <property>
    <name>tserver.memory.maps.max</name>
    <value>256M</value>
  </property>
  <property>
    <name>tserver.memory.maps.native.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>tserver.cache.data.size</name>
    <value>50M</value>
  </property>
  <property>
    <name>tserver.cache.index.size</name>
    <value>100M</value>
  </property>
  <property>
    <name>tserver.walog.max.size</name>
    <value>512M</value>
  </property>

On Thu, Oct 9, 2014 at 10:54 AM, Josh Elser <josh.el...@gmail.com> wrote:

> You can use start-here.sh on the host in question, or `start-server.sh
> $hostname tserver`. FWIW, re-invoking start-all.sh should just ignore the
> hosts which already have processes running and start a tserver on the
> host that died.
>
> 2G should be enough to get a connector and read a table. TBH, 256M should
> be enough for that.
>
> Also, the JVM OOME doesn't include timestamps; there isn't much more to
> glean from that message other than "it died because it ran out of heap".
>
> What does your accumulo-site.xml look like?
>
> Geoffry Roberts wrote:
>
>> I found the message in tserver*.out. tserver*.err has 0 bytes in it.
>>
>> I posted last night and life was good; I sat down this morning and saw
>> that another tserver had crashed overnight, with no activity. ?? In
>> tserver*.out it again says out of heap space.
>>
>> ACCUMULO_TSERVER_OPTS="-Xmx2G -Xms1G". I would have thought that
>> sufficient.
>>
>> The fact that the log entries lack timestamps but have hash marks makes
>> me wonder if I am reading things correctly.
>>
>> # java.lang.OutOfMemoryError: Java heap space
>> # -XX:OnOutOfMemoryError="kill -9 %p"
>> # Executing /bin/sh -c "kill -9 3241"...
>>
>> Is there a way to start a particular tablet server?
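A quick back-of-the-envelope check against those settings: with native maps disabled, the in-memory map is allocated on the JVM heap, so the map, data cache, and index cache all count against -Xmx. A sketch, assuming the values quoted above and the 2G max heap from the thread:

```shell
# Sum the on-heap tserver memory consumers from the quoted accumulo-site.xml.
# Native maps are disabled, so tserver.memory.maps.max lives on the heap too.
maps_mb=256      # tserver.memory.maps.max
data_mb=50       # tserver.cache.data.size
index_mb=100     # tserver.cache.index.size
heap_mb=2048     # -Xmx2G
reserved_mb=$(( maps_mb + data_mb + index_mb ))
echo "reserved ${reserved_mb}M of ${heap_mb}M heap"
# prints: reserved 406M of 2048M heap
```

The remainder is what RPC buffers, scan batches, and other working memory have to fit in.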
>>
>> On Wed, Oct 8, 2014 at 6:55 PM, Eric Newton <eric.new...@gmail.com> wrote:
>>
>> Did you find the message in tserver*.out, tserver*.err, or on the
>> monitor page?
>>
>> (Thanks for the follow-up message.)
>>
>> On Wed, Oct 8, 2014 at 6:39 PM, Geoffry Roberts <threadedb...@gmail.com> wrote:
>>
>> Just for the record, I finally got to the bottom of things. One
>> of my tservers was running out of memory. I hadn't noticed. I
>> had my SA allocate a little more--each node now has 6G, up from
>> 2G--and things are working better.
>>
>> On Oct 8, 2014 10:09 AM, "Josh Elser" <josh.el...@gmail.com> wrote:
>>
>> jstack is a tool which can be used to tell a Java process to
>> dump the current stack traces for all of its threads. It's
>> usually included with the JDK. `kill -3 $pid` also does the
>> same. If the output isn't redirected automatically to your
>> shell, check the stdout for the process whose pid you gave as
>> an argument.
>>
>> When your client is sitting waiting on data from the
>> tabletserver, you can get the stack traces from the tserver,
>> and you should be able to find a thread with "scan" in the
>> name, along with your client's IP. Then we can help debug
>> exactly what the server is doing that is preventing it from
>> returning data to your client.
>>
>> On Oct 8, 2014 9:43 AM, "Geoffry Roberts" <threadedb...@gmail.com> wrote:
>>
>> Thanks Josh. But what do you mean by "jstack'ing"? I'm
>> unfamiliar with that term. A better question would be
>> how can one troubleshoot such a thing?
>>
>> btw
>> I am the sole user on this cluster.
>>
>> On Tue, Oct 7, 2014 at 4:18 PM, Josh Elser <josh.el...@gmail.com> wrote:
>>
>> Ok, this record:
>>
>> tcp   0   0 0.0.0.0:9997   0.0.0.0:*   LISTEN
>>
>> means that your tserver is listening on the correct port on
>> all interfaces.
>> There shouldn't be issues connecting to the tserver. This is also
>> confirmed by the fact that you authenticated and got a Connector (this
>> does an RPC to the tserver).
>>
>> So, your tserver is up, and your client can communicate with it. The
>> real question is why the scan is hanging. Perhaps try jstack'ing the
>> tserver when your client is blocked waiting for results.
>>
>> On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts <threadedb...@gmail.com> wrote:
>> > "...it's when
>> > you make a Connector, and your client will talk to a tabletserver to
>> > authenticate, that your program should hang. It would be good to
>> > verify that."
>> >
>> > My program should hang? Would you expand? That is exactly what it is
>> > doing. I am able to get a Connector, but when I try to iterate the
>> > result of a scan, that's when it hangs.
>> >
>> > Here's what comes from netstat:
>> >
>> > $ netstat -na | grep 9997
>> >
>> > tcp   0   0 0.0.0.0:9997         0.0.0.0:*            LISTEN
>> > tcp   0   0 204.9.140.36:35679   204.9.140.36:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:53146   204.9.140.37:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33896   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:53282   204.9.140.37:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:53188   204.9.140.37:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:35609   204.9.140.36:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33901   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:35588   204.9.140.36:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33877   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33946   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:53167   204.9.140.37:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33949   204.9.140.38:9997    ESTABLISHED
>> > tcp   0   0 204.9.140.36:35546   204.9.140.36:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33852   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:53125   204.9.140.37:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33922   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33747   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33961   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33793   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:35768   204.9.140.36:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33917   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33814   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:35567   204.9.140.36:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33444   204.9.140.38:9997    FIN_WAIT2
>> > tcp   0   0 204.9.140.36:35701   204.9.140.36:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33969   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:53258   204.9.140.37:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33831   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:53210   204.9.140.37:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:53104   204.9.140.37:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33789   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33856   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:53237   204.9.140.37:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33835   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:35651   204.9.140.36:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33938   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33041   204.9.140.36:9997    ESTABLISHED
>> > tcp   0   0 204.9.140.36:53285   204.9.140.37:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:53305   204.9.140.37:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33768   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:35630   204.9.140.36:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33754   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:35745   204.9.140.36:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:35724   204.9.140.36:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:9997    204.9.140.36:33041   ESTABLISHED
>> > tcp   0   0 204.9.140.36:53083   204.9.140.37:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:50623   204.9.140.37:9997    ESTABLISHED
>> > tcp   0   0 204.9.140.36:33772   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33732   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33874   204.9.140.38:9997    TIME_WAIT
>> > tcp   0   0 204.9.140.36:33810   204.9.140.38:9997    TIME_WAIT
>> >
>> > On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <josh.el...@gmail.com> wrote:
>> >>
>> >> Can you provide the output from netstat, lsof or
>> >> /proc/$pid/fd for the tserver? Assuming you haven't altered
>> >> tserver.port.client in accumulo-site.xml, we want the line for port 9997.
>> >>
>> >> From my laptop running a tserver on localhost:
>> >>
>> >> $ netstat -na | grep 9997
>> >> tcp4   0   0 127.0.0.1.9997   *.*   LISTEN
>> >>
>> >> Depending on the tool you use, you can grep out the pid of the tserver
>> >> or just that port itself.
>> >>
>> >> Just so you know, ZK binds to all available interfaces when it starts,
>> >> so it should work seamlessly with localhost or the FQDN for the host.
>> >> As such, it shouldn't matter what you provide to the
>> >> ZooKeeperInstance. That should connect in all cases for you; it's when
>> >> you make a Connector, and your client talks to a tabletserver to
>> >> authenticate, that your program should hang. It would be good to
>> >> verify that.
>> >>
>> >> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <threadedb...@gmail.com> wrote:
>> >> > All,
>> >> >
>> >> > Thanks for the responses.
>> >> >
>> >> > Is this a problem for Accumulo?
>> >> > Reverse DNS is yielding my ISP's host name. You know the drill: my IP in
>> >> > reverse followed by their domain name, as opposed to my FQDN, which is
>> >> > what I use in my config files.
>> >> >
>> >> > Running Accumulo 1.5.1.
>> >> > I have only one interface.
>> >> > I have the FQDN in both the masters and slaves files for both Hadoop and
>> >> > Accumulo; in zoo.cfg; and in accumulo-site.xml where the ZooKeepers are
>> >> > referenced.
>> >> > Also, I am passing all ZK FQDNs when I instantiate ZooKeeperInstance.
>> >> > Forward DNS works.
>> >> > Reverse DNS... well (see above).
>> >> >
>> >> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <afu...@apache.org> wrote:
>> >> >>
>> >> >> Accumulo tservers typically listen on a single interface.
>> >> >> If you have a
>> >> >> server with multiple interfaces (e.g. loopback and eth0), you might
>> >> >> have a problem in which the tablet servers are not listening on
>> >> >> externally reachable interfaces. Tablet servers will list the
>> >> >> interfaces that they are listening on when they boot, and you can
>> >> >> also use tools like lsof to find them.
>> >> >>
>> >> >> If that is indeed the problem, then you might just need to change
>> >> >> your conf/slaves file to use <hostname> instead of localhost, and
>> >> >> then restart.
>> >> >>
>> >> >> Adam
>> >> >>
>> >> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <threadedb...@gmail.com> wrote:
>> >> >>>
>> >> >>> I have been happily working with Accumulo, but today things changed.
>> >> >>> No errors.
>> >> >>>
>> >> >>> Until now I ran everything server side, which meant the URL was
>> >> >>> localhost:2181, and life was good. Today I tried running some of the
>> >> >>> same code as a remote client, which means <host name>:2181. Things
>> >> >>> hang when BatchWriter tries to commit anything, and Scanner hangs
>> >> >>> when it tries to iterate through a Map.
>> >> >>>
>> >> >>> Let's focus on the scan part:
>> >> >>>
>> >> >>> scan.fetchColumnFamily(new Text("colfY")); // This executes, then hangs.
>> >> >>> for (Entry<Key,Value> entry : scan) {
>> >> >>>     def row = entry.getKey().getRow();
>> >> >>>     def value = entry.getValue();
>> >> >>>     println "value=" + value;
>> >> >>> }
>> >> >>>
>> >> >>> This is what appears in the console:
>> >> >>>
>> >> >>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
>> >> >>> response for sessionid: 0x148c6f03388005e after 21ms
>> >> >>>
>> >> >>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
>> >> >>> response for sessionid: 0x148c6f03388005e after 21ms
>> >> >>>
>> >> >>> <and on and on>
>> >> >>>
>> >> >>> The only difference between success and a hang is a URL change, and
>> >> >>> of course being remote.
>> >> >>>
>> >> >>> I don't believe this is a firewall issue. I shut down the firewall.
>> >> >>>
>> >> >>> Am I missing something?
>> >> >>>
>> >> >>> Thanks all.
>> >> >>>
>> >> >>> --
>> >> >>> There are ways and there are ways,
>> >> >>>
>> >> >>> Geoffry Roberts

--
There are ways and there are ways,

Geoffry Roberts
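The long netstat dump earlier in the thread is easier to read summarized by state. A small awk sketch, fed here with sample lines shaped like the real dump (the IPs and ports are illustrative); on a live box you would pipe `netstat -na | grep 9997` into the awk instead:

```shell
# Summarize TCP connection states for port 9997 from netstat-style lines.
# Field 6 is the state column in `netstat -na` output on Linux.
printf '%s\n' \
  'tcp 0 0 0.0.0.0:9997       0.0.0.0:*          LISTEN' \
  'tcp 0 0 204.9.140.36:35679 204.9.140.36:9997  TIME_WAIT' \
  'tcp 0 0 204.9.140.36:53146 204.9.140.37:9997  TIME_WAIT' \
  'tcp 0 0 204.9.140.36:33949 204.9.140.38:9997  ESTABLISHED' \
  'tcp 0 0 204.9.140.36:33444 204.9.140.38:9997  FIN_WAIT2' |
awk '{ states[$6]++ } END { for (s in states) print s, states[s] }' | sort
# prints:
# ESTABLISHED 1
# FIN_WAIT2 1
# LISTEN 1
# TIME_WAIT 2
```

A pile of TIME_WAIT entries is normal client churn; what matters for the hang diagnosis is that the LISTEN line exists and that ESTABLISHED pairs to the tservers are present.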