Hello Henning,
since you reduced the replication level to 1 in your one-node cluster,
you do not have any redundancy and thus you lose the self-healing
capabilities of HDFS.
Try to work with at least 3 worker nodes, which gives you 3-fold
replication.
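A minimal sketch of what that looks like in hdfs-site.xml (the value is
illustrative):
---
<!-- restore the default 3-fold replication for newly written files -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
---
Note that files written while dfs.replication was 1 keep their old
replication factor; once enough datanodes are available you can raise it
after the fact, e.g.:
---
# re-replicate everything under /hbase to 3 copies; -w waits until done
hdfs dfs -setrep -w 3 /hbase
---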
Cheers, Mirko
Sent from Samsung Mobile
-------- Original Message --------
From: Henning Blohm <henning.bl...@zfabrik.de>
Date: 17.05.2016 16:24 (GMT+01:00)
To: user@hadoop.apache.org
Cc:
Subject: Curious: Corrupted HDFS self-healing?
Hi all,
after some 20 hours of loading data into HBase (v1.0 on Hadoop 2.6.0) on a
single node, I noticed that Hadoop reported a corrupt file system. It says:
Status: CORRUPT
CORRUPT FILES: 1
CORRUPT BLOCKS: 1
The filesystem under path '/' is CORRUPT
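For reference, the summary above comes from a plain fsck of '/', and a
detailed per-block report like the one below can be produced with something
along these lines (a guess at the invocations, not necessarily the exact
commands used):
---
hdfs fsck /
hdfs fsck /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/d23252e7c0854b6093e6468acf2dad38 -files -blocks -locations
---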
and checking the details it says:
---
FSCK started by hb (auth:SIMPLE) from /127.0.0.1 for path
/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/d23252e7c0854b6093e6468acf2dad38
at Tue May 17 15:54:03 CEST 2016
/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/d23252e7c0854b6093e6468acf2dad38
2740218577 bytes, 11 block(s):
/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/d23252e7c0854b6093e6468acf2dad38:
CORRUPT blockpool BP-130837870-192.168.178.29-1462900512452 block
blk_1073746166
MISSING 1 blocks of total size 268435456 B
0. BP-130837870-192.168.178.29-1462900512452:blk_1073746164_5344
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
1. BP-130837870-192.168.178.29-1462900512452:blk_1073746165_5345
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
2. BP-130837870-192.168.178.29-1462900512452:blk_1073746166_5346
len=268435456 MISSING!
3. BP-130837870-192.168.178.29-1462900512452:blk_1073746167_5347
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
4. BP-130837870-192.168.178.29-1462900512452:blk_1073746168_5348
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
5. BP-130837870-192.168.178.29-1462900512452:blk_1073746169_5349
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
6. BP-130837870-192.168.178.29-1462900512452:blk_1073746170_5350
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
7. BP-130837870-192.168.178.29-1462900512452:blk_1073746171_5351
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
8. BP-130837870-192.168.178.29-1462900512452:blk_1073746172_5352
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
9. BP-130837870-192.168.178.29-1462900512452:blk_1073746173_5353
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
10. BP-130837870-192.168.178.29-1462900512452:blk_1073746174_5354
len=55864017 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
---
(Note block 2 above: it is the missing one.)
I did not try to repair using fsck. Instead I restarted the node, and the
problem went away.
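For completeness, "restarting the node" here means bouncing the HDFS
daemons, roughly as follows (a sketch assuming a stock Hadoop 2.6.0 layout
under $HADOOP_HOME; adapt to your setup):
---
# hypothetical restart sequence for a single-node setup
$HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode
$HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
---
After the restart, fsck reported: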
---
FSCK started by hb (auth:SIMPLE) from /127.0.0.1 for path
/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/d23252e7c0854b6093e6468acf2dad38
at Tue May 17 16:10:52 CEST 2016
/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/d23252e7c0854b6093e6468acf2dad38
2740218577 bytes, 11 block(s): OK
0. BP-130837870-192.168.178.29-1462900512452:blk_1073746164_5344
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
1. BP-130837870-192.168.178.29-1462900512452:blk_1073746165_5345
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
2. BP-130837870-192.168.178.29-1462900512452:blk_1073746166_5346
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
3. BP-130837870-192.168.178.29-1462900512452:blk_1073746167_5347
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
4. BP-130837870-192.168.178.29-1462900512452:blk_1073746168_5348
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
5. BP-130837870-192.168.178.29-1462900512452:blk_1073746169_5349
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
6. BP-130837870-192.168.178.29-1462900512452:blk_1073746170_5350
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
7. BP-130837870-192.168.178.29-1462900512452:blk_1073746171_5351
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
8. BP-130837870-192.168.178.29-1462900512452:blk_1073746172_5352
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
9. BP-130837870-192.168.178.29-1462900512452:blk_1073746173_5353
len=268435456 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
10. BP-130837870-192.168.178.29-1462900512452:blk_1073746174_5354
len=55864017 Live_repl=1
[DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
Status: HEALTHY
---
I guess that means that the datanode only reported the missing block after
the restart. How is that possible? Is that acceptable, expected behavior?
Is there anything I can do to prevent this sort of problem?
Here is my HDFS config (substitute ${nosql.home} with the installation
folder and ${nosql.master} with localhost):
Any clarification would be great!
Thanks!
Henning
---
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${nosql.home}/data/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${nosql.home}/data/data</value>
  </property>
  <property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.synconclose</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.sync.behind.writes</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.avoid.read.stale.datanode</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.avoid.write.stale.datanode</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.stale.datanode.interval</name>
    <value>3000</value>
  </property>
  <!--
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/seritrack/dn_socket</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit.buffer.size</name>
    <value>131072</value>
  </property>
  -->
  <property>
    <name>dfs.block.size</name>
    <value>268435456</value>
  </property>
  <property>
    <name>ipc.server.tcpnodelay</name>
    <value>true</value>
  </property>
  <property>
    <name>ipc.client.tcpnodelay</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>64</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>8</value>
  </property>
</configuration>
---