Hi, I'm using HBase 0.94.12 on top of Hadoop 1.2.1, with one node for ZooKeeper, one node for the NameNode/HMaster, and three DataNode/RegionServer nodes. All the machines are Amazon EC2 m2.xlarge instances.
I set the replication factor to two, so I expect that if I kill a RegionServer/DataNode (for example, by killing all Java processes on it), all the regions on that node will be recovered on one of the other two live RegionServers. But when I kill the node, I lose the regions that were on it and, worst of all, if that node hosts the .META. or -ROOT- table, the entire cluster stops working!

In case it helps, I loaded 500000 rows into the 'usertable' table with the YCSB tool. Below is the output of status 'simple' and of hadoop fsck /hbase before and after killing the node.

Before:

hbase(main):001:0> status 'simple'
3 live servers
    ip-10-235-11-139:60020 1385632293907
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=57, maxHeapMB=14983
    ip-10-253-29-220:60020 1385632293955
        requestsPerSecond=0, numberOfOnlineRegions=2, usedHeapMB=74, maxHeapMB=14983
    ip-10-253-29-249:60020 1385632294162
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=1935, maxHeapMB=14983
0 dead servers
Aggregate load: 0, regions: 4

FSCK started by ubuntu from /10.253.91.250 for path /hbase at Thu Nov 28 09:57:20 UTC 2013
..................................Status: HEALTHY
 Total size: 2122147158 B
 Total dirs: 31
 Total files: 34 (Files currently being written: 3)
 Total blocks (validated): 59 (avg. block size 35968595 B) (Total open file blocks (not validated): 2)
 Minimally replicated blocks: 59 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 2
 Average block replication: 2.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 3
 Number of racks: 1
FSCK ended at Thu Nov 28 09:57:20 UTC 2013 in 23 milliseconds

The filesystem under path '/hbase' is HEALTHY

-------------------------------------------------------------------------

After (about 15 minutes later):

hbase(main):001:0> status 'simple'
2 live servers
    ip-10-235-11-139:60020 1385632293907
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=63, maxHeapMB=14983
    ip-10-253-29-220:60020 1385632293955
        requestsPerSecond=0, numberOfOnlineRegions=2, usedHeapMB=117, maxHeapMB=14983
1 dead servers
    ip-10-253-29-249,60020,1385632294162
Aggregate load: 0, regions: 3

FSCK started by ubuntu from /10.253.91.250 for path /hbase at Thu Nov 28 10:13:29 UTC 2013
....................Status: HEALTHY
 Total size: 948168097 B
 Total dirs: 27
 Total files: 20 (Files currently being written: 3)
 Total blocks (validated): 29 (avg. block size 32695451 B) (Total open file blocks (not validated): 2)
 Minimally replicated blocks: 29 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 2
 Average block replication: 2.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 2
 Number of racks: 1
FSCK ended at Thu Nov 28 10:13:29 UTC 2013 in 7 milliseconds

The filesystem under path '/hbase' is HEALTHY

I hope I have been clear and have provided sufficient information; if needed, I can also post the hbase-site.xml and hdfs-site.xml configuration.

Thank you for your help!
Andrea