Ok, so now that I have an Accumulo monitor, I have discovered that my Accumulo instance doesn't have any tablet servers.
Here is what I tried so far to resolve the issue:

1) Looked in the tserver_localhost.localdomain.log file, and found this FATAL message:

2013-09-12 08:09:42,273 [tabletserver.TabletServer] FATAL: Must set dfs.durable.sync OR dfs.support.append to true. Which one needs to be set depends on your version of HDFS. See ACCUMULO-623.

HADOOP RELEASE   VERSION        SYNC NAME            DEFAULT
Apache Hadoop    0.20.205       dfs.support.append   false
Apache Hadoop    0.23.x         dfs.support.append   true
Apache Hadoop    1.0.x          dfs.support.append   false
Apache Hadoop    1.1.x          dfs.durable.sync     true
Apache Hadoop    2.0.0-2.0.2    dfs.support.append   true
Cloudera CDH     3u0-3u3        ????                 true
Cloudera CDH     3u4            dfs.support.append   true
Hortonworks HDP  `1.0           dfs.support.append   false
Hortonworks HDP  `1.1           dfs.support.append   false

2013-09-12 11:54:00,752 [server.Accumulo] INFO : tserver starting
2013-09-12 11:54:00,768 [server.Accumulo] INFO : Instance d57cdc38-8ceb-4192-9da3-1ce2664df33b
2013-09-12 11:54:00,771 [server.Accumulo] INFO : Data Version 5
2013-09-12 11:54:00,771 [server.Accumulo] INFO : Attempting to talk to zookeeper
2013-09-12 11:54:00,952 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
2013-09-12 11:54:00,956 [server.Accumulo] INFO : Connected to HDFS
2013-09-12 11:54:00,969 [server.Accumulo] INFO : gc.cycle.delay = 5m
2013-09-12 11:54:00,969 [server.Accumulo] INFO : gc.cycle.start = 30s
2013-09-12 11:54:00,969 [server.Accumulo] INFO : gc.port.client = 50091
2013-09-12 11:54:00,969 [server.Accumulo] INFO : gc.threads.delete = 16
2013-09-12 11:54:00,969 [server.Accumulo] INFO : gc.trash.ignore = false

I saw this same FATAL message 8 times in the tserver_localhost.localdomain.log between blocks of INFO messages, but no other fatal or warn messages. Btw, this FATAL message also appears in my tserver_localhost.localdomain.debug.log file.
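In case anyone wants to reproduce the count, a tally like the one above can be sketched in a few lines of Python. This is only a rough sketch: the function name count_levels and the regular expression are my own assumptions, based on the "[logger] LEVEL:" layout of the log lines pasted above, and are not part of Accumulo.

```python
import re
from collections import Counter

# Matches the "[logger] LEVEL :" portion of a log line such as
# "2013-09-12 08:09:42,273 [tabletserver.TabletServer] FATAL: Must set ..."
# (pattern assumed from the log excerpts in this post).
LEVEL_RE = re.compile(r"\[[\w.]+\]\s+(FATAL|ERROR|WARN|INFO|DEBUG)\s*:")


def count_levels(lines):
    """Return a Counter mapping each log level to how many lines carry it."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling count_levels(open("tserver_localhost.localdomain.log")) from the logs directory and looking at the "FATAL" and "WARN" entries gives the per-level totals; lines without a "[logger] LEVEL:" prefix (the version table, stack trace frames) are simply skipped.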
When I googled this FATAL message I found this page:
http://mail-archives.apache.org/mod_mbox/accumulo-user/201304.mbox/%[email protected]%3E
with the same "WARN: There are no tablet servers: check that zookeeper and accumulo are running." message.

I checked http://127.0.0.1:50095/tservers, and it showed that there were no tablet servers online. I looked at http://127.0.0.1:50095/log, and saw the following messages:

FATAL: Must set dfs.durable.sync or dfs.support.append to true. Which one needs to be set depends on your version of HDFS. See Accumulo-623.
WARN: There are no tablet servers: check that zookeeper and accumulo are running.

Using the info from the page I referenced above, I checked my $ACCUMULO_HOME path and realized that I hadn't set it in conf/accumulo-env.sh. So, I set it to the following:

test -z "$ACCUMULO_HOME" && export ACCUMULO_HOME=/home/accumulo/accumulo-1.5.0

When I did an echo of $ACCUMULO_HOME it didn't return anything, so I also tried setting it in my bash profile to see if that made any difference (it didn't). I also looked in the lib directory but didn't see any stray jars.

In my tracer_localhost_localdomain.log I saw the following exception from Zookeeper:

2013-09-11 16:09:48,649 [impl.ServerClient] WARN : There are no tablet servers: check that zookeeper and accumulo are running.
2013-09-11 18:02:23,385 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/d57cdc38-8ceb-4192-9da3-1ce2664df33b/tservers
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
    at org.apache.accumulo.fate.zookeeper.ZooCache$1.run(ZooCache.java:167)
    at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:130)
    at org.apache.accumulo.fate.zookeeper.ZooCache.getChildren(ZooCache.java:178)
    at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:140)
    at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:128)
    at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:123)
    at org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:105)
    at org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:71)
    at org.apache.accumulo.core.client.impl.ConnectorImpl.<init>(ConnectorImpl.java:64)
    at org.apache.accumulo.server.client.HdfsZooInstance.getConnector(HdfsZooInstance.java:154)
    at org.apache.accumulo.server.client.HdfsZooInstance.getConnector(HdfsZooInstance.java:149)
    at org.apache.accumulo.server.trace.TraceServer.<init>(TraceServer.java:185)
    at org.apache.accumulo.server.trace.TraceServer.main(TraceServer.java:260)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.accumulo.start.Main$1.run(Main.java:101)
    at java.lang.Thread.run(Thread.java:724)
2013-09-12 08:09:44,861 [server.Accumulo] INFO : tracer starting
2013-09-12 08:09:44,926 [server.Accumulo] INFO : Instance d57cdc38-8ceb-4192-9da3-1ce2664df33b
2013-09-12 08:09:44,929 [server.Accumulo] INFO : Data Version 5
2013-09-12 08:09:44,929 [server.Accumulo] INFO : Attempting to talk to zookeeper
2013-09-12 08:09:45,114 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
2013-09-12 08:09:45,130 [server.Accumulo] INFO : Connected to HDFS
2013-09-12 08:09:45,150 [server.Accumulo] INFO : gc.cycle.delay = 5m
2013-09-12 08:09:45,150 [server.Accumulo] INFO : gc.cycle.start = 30s

but then it appeared to reconnect with Zookeeper.

2) I looked at the ACCUMULO-623 Jira ticket from the FATAL message above, i.e., https://issues.apache.org/jira/browse/ACCUMULO-623, but this ticket indicates that the issue is fixed in Accumulo 1.5.0, although it references Hadoop 1.0.3 and Zookeeper 3.3.3 (I'm using Hadoop 1.2.1 and Zookeeper 3.4.5). I noticed that a fix was added to Hadoop 1.1 for a related Hadoop Jira ticket.

3) Next, I went to the Accumulo Jira page, i.e., https://issues.apache.org/jira/browse/accumulo, to look for this issue. Besides ACCUMULO-623, the following tickets are similar but not quite the same:
- ACCUMULO-327 (but I don't have any tablet servers to begin with to be killed)
- ACCUMULO-1235 (I only have the default !METADATA table)

4) Looked again at the user manual to see if there was information about configuring the tablet server, but didn't see anything.

Any suggestions on what I should try next?

Thanks,
Pete
