The cluster is a 5-node, VM-based Accumulo 1.5 / CDH 4.5 instance, with an HDFS replication factor of 2.
It's a dev instance, so nothing critical (though I would rather not lose the data there, since it represents a week or so of re-ingesting and processing). It recently ran out of space during an ingest, so I cleared out some tables which were no longer being used. I didn't recover much of the free space, and the total usage (~6 TB) seemed much higher than the number of entries (~50 billion) would suggest, knowing that none of the entries were especially large:

    -bash-4.1$ hadoop fs -du -h /accumulo/
    0         /accumulo/instance_id
    58.5K     /accumulo/lib
    3.5G      /accumulo/recovery
    118.2G    /accumulo/tables
    0         /accumulo/version
    2.5T      /accumulo/wal

    -bash-4.1$ hadoop fs -du -h /accumulo/wal/
    495.2G    /accumulo/wal/10.10.10.51:9997
    541.3G    /accumulo/wal/10.10.10.52:9997
    515.7G    /accumulo/wal/10.10.10.53:9997
    474.3G    /accumulo/wal/10.10.10.54:9997
    562.5G    /accumulo/wal/10.10.10.55:9997

So the WALs are the bulk of it. (Note that hadoop fs -du reports pre-replication sizes, so with a replication factor of 2 the ~2.6 TB under /accumulo works out to roughly 5.2 TB of raw disk, which lines up with the ~6 TB figure.)

As I mentioned, it's a dev cluster, so it's entirely possible some weird confluence of events happened previously to cause this. What I'm more concerned about is how I recover that space; I'm not worried at this point about any information that might be in the WAL files. Accumulo itself has been restarted a few times for various reasons.

The only notable log entries are in the tserver log files: [tabletserver.TabletServer] WARN : Running low on memory, occurring ~15 times a second. Tserver memory settings don't seem to impact this (8 GB is allocated to the tservers; bloom filters are on, as are the block cache (2 GB), index cache (1 GB), and native memory maps (1 GB)). Otherwise I don't see anything out of the norm in the master, monitor, gc, or tracer logs (on the master).
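For completeness, this is how I checked the live tserver settings quoted above (a sketch: the install path and the root/secret credentials are placeholders for my actual ones, and I'm relying on the shell's config filter behaving the way I think it does):

    # Dump the tserver memory and cache properties as the cluster actually sees them
    -bash-4.1$ /opt/accumulo/bin/accumulo shell -u root -p secret -e 'config -f tserver.memory'
    -bash-4.1$ /opt/accumulo/bin/accumulo shell -u root -p secret -e 'config -f tserver.cache'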
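Before deleting anything outright, my plan is to work out which of those WAL files are still referenced by live tablets (again a sketch, assuming the 1.5 layout where WAL references are kept under the log column family of the !METADATA table; the path and credentials are placeholders):

    # WAL files that tablets still reference
    -bash-4.1$ /opt/accumulo/bin/accumulo shell -u root -p secret -e 'scan -t !METADATA -c log'

    # Everything actually sitting under /accumulo/wal, for comparison
    -bash-4.1$ hadoop fs -ls -R /accumulo/wal/

Anything on disk that never shows up in the metadata scan should, as I understand it, be eligible for garbage collection.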
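Since the Accumulo GC is what's supposed to remove unreferenced WALs, I was also going to confirm that it's actually running and doing work, and only as a last resort delete unreferenced files by hand with the cluster fully stopped (a sketch as well; the process-listing pattern, log path, and file name below are assumptions/placeholders, and the manual delete discards whatever un-flushed data those WALs hold):

    # Is a gc process actually running somewhere on the cluster?
    -bash-4.1$ jps -lm | grep -i 'Main gc'

    # What has it been collecting lately?
    -bash-4.1$ tail -n 100 /opt/accumulo/logs/gc_*.log

    # Last resort: with ALL Accumulo processes stopped, remove only files
    # confirmed unreferenced above (this throws away any un-flushed mutations)
    -bash-4.1$ hadoop fs -rm /accumulo/wal/10.10.10.51:9997/<unreferenced-wal-file>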
