I was able to narrow the problem down to my accumulo-site.xml file.
I had the following:
<property>
<name>instance.dfs.uri</name>
<value>hdfs://localhost:9000/</value>
</property>
<property>
<name>instance.dfs.dir</name>
<value>/accumulo</value>
</property>
I changed the instance.dfs.dir value to a full URI, and the "Mkdirs failed"
errors no longer happen, even on recovery bootup.
<property>
<name>instance.dfs.dir</name>
<value>hdfs://localhost:9000/accumulo</value>
</property>
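For anyone who hits the same thing, here is a minimal sketch of a check for a scheme-less instance.dfs.dir. It is plain Python, not part of Accumulo. The helper names (load_properties, has_scheme, check_dfs_dir) are made up for illustration, and it assumes the file uses the usual <configuration> root element with <property> children.

```python
import sys
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def load_properties(site_xml: str) -> dict:
    """Parse Hadoop-style site XML text into a {name: value} dict."""
    root = ET.fromstring(site_xml)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def has_scheme(value: str) -> bool:
    """True if the value carries a URI scheme such as hdfs://."""
    return bool(urlparse(value).scheme)


def check_dfs_dir(site_xml: str) -> str:
    """Report whether instance.dfs.dir is a full URI or a bare path."""
    props = load_properties(site_xml)
    dfs_dir = props.get("instance.dfs.dir")
    if dfs_dir is None:
        return "instance.dfs.dir is not set"
    if has_scheme(dfs_dir):
        return "OK: instance.dfs.dir is a full URI: " + dfs_dir
    return "WARNING: instance.dfs.dir has no scheme: " + dfs_dir


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_site.py /path/to/accumulo-site.xml
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(check_dfs_dir(handle.read()))
```

With my original config this reports the WARNING line for /accumulo, and with the changed config it reports OK. It only inspects the value; it does not tell you which filesystem a bare path ends up resolving against.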
Thanks for the suggestions everyone.
-Mike
On Tue, Jan 6, 2015 at 9:48 PM, Josh Elser wrote:
> Is HDFS actually healthy? Have you checked the namenode status page
> (http://$hostname:50070 by default) to make sure the NN is up and out of
> safemode, expected number of DNs have reported in, hdfs reports available
> space, etc?
>
> Any other Hadoop details (version, etc) would be helpful too!
>
> Mike Atlas wrote:
>
>> Well, I caught the same error again after terminating my machine with a
>> hard stop - which isn't a normal way to do things but I fat-finger saved
>> an AMI image of it thinking I could boot up just fine afterward.
>>
>> The only workaround I could find was to blow away the HDFS
>> /accumulo directory and re-init my accumulo instance again --- which is
>> fine for playing around, but I'm wondering what exactly is going on? I
>> don't want that to happen once I go to production and have real data.
>>
>> Thoughts on how to debug?
>>
>>
>> On Tue, Jan 6, 2015 at 10:40 AM, Keith Turner wrote:
>>
>>
>>
>> On Mon, Jan 5, 2015 at 6:50 PM, Mike Atlas wrote:
>>
>> Hello,
>>
>> I'm running Accumulo 1.5.2, trying to test out the GeoMesa
>> <http://www.geomesa.org/2014/05/28/geomesa-quickstart/> family of
>> spatio-temporal iterators using their quickstart demonstration tool.
>> I think I'm not making progress due to my Accumulo setup, though, so
>> can someone validate that all looks good from here?
>>
>> start-all.sh output:
>>
>> hduser@accumulo:~$ $ACCUMULO_HOME/bin/start-all.sh
>> Starting monitor on localhost
>> Starting tablet servers .... done
>> Starting tablet server on localhost
>> 2015-01-05 21:37:18,523 [server.Accumulo] INFO : Attempting to talk to zookeeper
>> 2015-01-05 21:37:18,772 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
>> 2015-01-05 21:37:19,028 [server.Accumulo] INFO : Connected to HDFS
>> Starting master on localhost
>> Starting garbage collector on localhost
>> Starting tracer on localhost
>>
>> hduser@accumulo:~$
>>
>>
>> I do believe my HDFS is set up correctly:
>>
>> hduser@accumulo:/home/ubuntu/geomesa-quickstart$ hadoop fs -ls /accumulo
>> Found 5 items
>> drwxrwxrwx - hduser supergroup 0 2014-12-10 01:04 /accumulo/instance_id
>> drwxrwxrwx - hduser supergroup 0 2015-01-05 21:22 /accumulo/recovery
>> drwxrwxrwx - hduser supergroup 0 2015-01-05 20:14 /accumulo/tables
>> drwxrwxrwx - hduser supergroup 0 2014-12-10 01:04 /accumulo/version
>> drwxrwxrwx - hduser supergroup 0 2014-12-10 01:05 /accumulo/wal
>>
>>
>> However, when I check the Accumulo monitor logs, I see these
>> errors post-startup:
>>
>> java.io.IOException: Mkdirs failed to create directory /accumulo/recovery/15664488-bd10-4d8d-9584-f88d8595a07c/part-r-00000
>> java.io.IOException: Mkdirs failed to create directory /accumulo/recovery/15664488-bd10-4d8d-9584-f88d8595a07c/part-r-00000
>>     at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:264)
>>     at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:103)
>>     at org.apache.accumulo.server.tabletserver.log.LogSorter$LogProcessor.writeBuffer(LogSorter.java:196)
>>     at org.apache.accumulo.server.tabletserver.log.LogSorter$LogProcessor.sort(LogSorter.java:166)
>>     at org.apache.accumulo.server.tabletserver.log.LogSorter$LogProcessor.process(LogSorter.java:89)
>>     at org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:101)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>>     at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>>
>> I don't really understand. I started Accumulo as hduser, the same
>> user that has access to the HDFS directory /accumulo/recovery, and
>> it looks like the directory was actually created, except for the
>> last path component (part-r-00000):
>>
>> hduser@accumulo:~$ hadoop fs -ls /accumulo/recovery/
>> Found 1 items
>> drwxr-xr-x - hduser supergroup 0 2015-01-05 22:11 /accumulo/recovery/87fb7aac-0274-4aea-8014-9d53dbbdfbbc
>>
>>
>> I'm not out of physical disk space:
>>
>> hduser@accumulo:~$ df -h
>> Filesystem Size Used Avail Use% Mounted on
>> /dev/xvda1 1008G 8.5G 959G 1% /
>>
>>
>> What could be going on here? Any ideas on something simple I
>> could have missed?
>>
>>
>> One possibility is that the tserver where the exception occurred had
>> a bad or missing HDFS config. In that case the Hadoop code may try to
>> create /accumulo/recovery/.../part-r-00000 on the local filesystem,
>> which would fail.
>>
>>
>> Thanks,
>> Mike