Hi
I tried to use short-circuit read to improve the MR scan performance of
my HBase cluster.
I have the following settings in hdfs-site.xml:
dfs.client.read.shortcircuit set to true
dfs.block.local-path-access.user set to the user that runs the MR jobs.
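As a sketch, the two settings above would be written in hdfs-site.xml
roughly as follows. The user name mapred is a placeholder assumption, not
a value from this cluster; substitute the account that actually runs the
MR jobs.

```xml
<configuration>
  <!-- Enable short-circuit local reads on the DFS client -->
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <!-- User allowed to read block files directly from local disk.
       "mapred" is a hypothetical placeholder for the MR job runner. -->
  <property>
    <name>dfs.block.local-path-access.user</name>
    <value>mapred</value>
  </property>
</configuration>
```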
The cluster has 1+4 nodes, and each data node has 16 CPUs and 4 HDDs. All
HBase tables have been major compacted, so all data is local.
I had hoped that short-circuit read would improve performance. However,
the test results show that with short-circuit read enabled, performance
actually dropped by 10-15%. For example, scanning a 50G table takes around
100s instead of 90s.
My Hadoop version is 1.1.1. Any ideas on this? Thanks!
Best Regards,
Raymond Liu