Forgot this part of accumulo-env.sh:

export HADOOP_PREFIX="$HADOOP_HOME"
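For reference, a minimal sketch of how that line fits with the other exports (the test -z fallback is my addition, assuming the CDH parcel layout used below, where HADOOP_HOME is typically /opt/cloudera/parcels/CDH/lib/hadoop):

# accumulo-env.sh (excerpt)
export HADOOP_PREFIX="$HADOOP_HOME"
test -z "$HADOOP_PREFIX" && export HADOOP_PREFIX=/opt/cloudera/parcels/CDH/lib/hadoop
test -z "$HADOOP_CONF_DIR" && export HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"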
Also, in the example below I make the folder to log to in /var/log/accumulo; however, in the accumulo-env.sh example I have it pointing to $ACCUMULO_HOME/logs. Just change that to wherever your logs are stored.

Also, you'll need to run ssh-keygen on the master before pushing the ssh keys to your nodes (ssh-copy-id). And the hdfs user will probably have read-only access to its home folder; just make a .ssh folder in the hdfs user's home directory and give r/w/x permissions to hdfs on that folder (.ssh) only. Otherwise ssh-copy-id won't be able to update that folder.

In regard to optimization and write-ahead logs, that stuff is usually environment/application specific.

If anyone has questions or comments on this installation plan let me know; I would love to know what you're doing differently, and why.

From: user-return-3599-CHARLES.H.OTT=leidos....@accumulo.apache.org On Behalf Of Ott, Charles H.
Sent: Thursday, January 16, 2014 2:50 PM
To: user@accumulo.apache.org
Subject: RE: accumulo startup issue: Accumulo not initialized, there is no instance id at /accumulo/instance_id

Disclaimer: not advocating this as the best approach, just what I'm currently doing. Put this together pretty quick, but it should be mostly complete for setting up Accumulo on CDH HDFS/ZK.

I always do something like this first on CentOS:

$ yum install -y ntp openssh-clients unzip
# set up ssh and ntpd as needed
# install the JDK RPM

# bash this to set up OS specifics
echo "Disabling SELinux for optimal CDH compatibility..."
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

echo "Increasing ulimit (file descriptors) for all users..."
echo "# Adding support for CDH" >> /etc/security/limits.conf
echo "* - nofile 65536" >> /etc/security/limits.conf

echo "Disabling IPv6..."
echo "# Disable ipv6" >> /etc/sysctl.conf
echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf
echo "net.ipv6.conf.default.disable_ipv6 = 1" >> /etc/sysctl.conf

echo "Lowering swappiness to limit use of swap space..."
echo "# swappiness for accumulo" >> /etc/sysctl.conf
echo "vm.swappiness = 10" >> /etc/sysctl.conf

Reboot and test OS/services/JDK version... then I usually extract Accumulo to /opt/accumulo/accumulo-1.5.0 and make a symlink: /opt/accumulo/accumulo-current -> ./accumulo-1.5.0
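Spelled out, that extract-and-symlink step might look like this (the tarball name accumulo-1.5.0-bin.tar.gz is just an assumption, use whatever you downloaded):

mkdir -p /opt/accumulo
tar xzf accumulo-1.5.0-bin.tar.gz -C /opt/accumulo
ln -s /opt/accumulo/accumulo-1.5.0 /opt/accumulo/accumulo-current

That way your env vars and conf paths can point at accumulo-current, and an upgrade just means re-pointing one link.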
# make dirs for Accumulo logs, wherever you want them
mkdir /var/log/accumulo

# let hdfs own all your Accumulo folders
chown -R hdfs:hdfs /opt/accumulo
chown -R hdfs:hdfs /var/log/accumulo

# update the hdfs password for the next step (as root)
passwd hdfs

# set up passwordless ssh (test using hdfs afterwards; you should be able to ssh <node> w/o entering credentials)
su - hdfs
ssh-copy-id <for all tablet server nodes>

# update your iptables

# env vars
ACCUMULO_HOME=/opt/accumulo/accumulo-1.5.0
JAVA_HOME=/usr/java/default (JDK 7 in my last install worked fine)

Settings for accumulo-env.sh in /conf:

# cdh4
export HADOOP_HDFS_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-hdfs
export HADOOP_MAPREDUCE_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
test -z "$HADOOP_CONF_DIR" && export HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"
test -z "$JAVA_HOME" && export JAVA_HOME=/usr/java/default
test -z "$ZOOKEEPER_HOME" && export ZOOKEEPER_HOME=/opt/cloudera/parcels/CDH/lib/zookeeper
test -z "$ACCUMULO_LOG_DIR" && export ACCUMULO_LOG_DIR=$ACCUMULO_HOME/logs

# update all files as appropriate in /opt/accumulo/accumulo-current/conf/*:
# masters, monitor, slaves, tracers, gc, accumulo-site.xml, accumulo-env.sh

# accumulo-site.xml
<property>
  <name>general.classpaths</name>
  <value>
    $ACCUMULO_HOME/server/target/classes/,
    $ACCUMULO_HOME/lib/accumulo-server.jar,
    $ACCUMULO_HOME/core/target/classes/,
    $ACCUMULO_HOME/lib/accumulo-core.jar,
    $ACCUMULO_HOME/start/target/classes/,
    $ACCUMULO_HOME/lib/accumulo-start.jar,
    $ACCUMULO_HOME/fate/target/classes/,
    $ACCUMULO_HOME/lib/accumulo-fate.jar,
    $ACCUMULO_HOME/proxy/target/classes/,
    $ACCUMULO_HOME/lib/accumulo-proxy.jar,
    $ACCUMULO_HOME/lib/[^.].*.jar,
    $ZOOKEEPER_HOME/zookeeper[^.].*.jar,
    $HADOOP_CONF_DIR,
    $HADOOP_PREFIX/[^.].*.jar,
    $HADOOP_PREFIX/lib/[^.].*.jar,
    $HADOOP_HDFS_HOME/.*.jar,
    $HADOOP_HDFS_HOME/lib/.*.jar,
    $HADOOP_MAPREDUCE_HOME/.*.jar,
    $HADOOP_MAPREDUCE_HOME/lib/.*.jar
  </value>
  <description>Classpaths that accumulo checks for updates and class files.
    When using the Security Manager, please remove the ".../target/classes/" values.
  </description>
</property>

Then of course, always run your Accumulo binaries/scripts using the hdfs account. I'm sure I'm missing a few steps here and there...

$ACCUMULO_HOME/bin/accumulo init
...
$ACCUMULO_HOME/bin/start-all.sh

From: user-return-3597-CHARLES.H.OTT=leidos....@accumulo.apache.org On Behalf Of Sean Busbey
Sent: Thursday, January 16, 2014 2:20 PM
To: Accumulo User List
Subject: Re: accumulo startup issue: Accumulo not initialized, there is no instance id at /accumulo/instance_id

On Thu, Jan 16, 2014 at 1:14 PM, Kesten Broughton <kbrough...@21ct.com> wrote:

> "You should make sure to correct the maximum number of open files for the user that is running Accumulo."
>
> I have the following in /etc/security/limits.conf on all nodes in my accumulo cluster:
>
> hdfs soft nofile 65536
> hdfs hard nofile 65536
>
> However, I see this for all nodes:
>
> WARN : Max files open on 10.0.11.208 is 32768, recommend 65536
>
> Should it be a different user or something? 'the user that is running Accumulo'
>
> sudo hdfs
> hdfs$ bin/accumulo -u root
>
> so is hdfs or root the accumulo user?

The user in question here is the one who starts the Accumulo server processes. In production environments this should be a user dedicated to Accumulo. FWIW, I usually name this user "accumulo".

How do you start up Accumulo? A service script? Running $ACCUMULO_HOME/bin/start-all.sh? Something else?
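A quick way to check what limit is actually in effect for the account that launches the daemons (assuming that account is hdfs, as earlier in the thread) is to open a fresh login shell. limits.conf is applied through PAM at login, so processes started from an init script or a stale session can still be running with the old 32768:

su - hdfs -c 'ulimit -Sn; ulimit -Hn'
# both should print 65536 once limits.conf has taken effect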