Jobs running on this cluster fail with exceptions such as:

java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to ds-hadoop-wk01p.tcsbank.ru/10.218.64.11:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.218.64.14:38621 remote= ds-hadoop-wk01p.tcsbank.ru/10.218.64.11:60020]
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1569)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1421)
	at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:739)
	at org.apache.hadoop.hbase.client.HTable.get(HTable.java:708)
	at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:367)
	at ru.tcsbank.hbase.HBasePersonDao.getUsersBatch(HBasePersonDao.java:306)
	at ru.tcsbank.matching.PersonMatcher.performSolrRequest(PersonMatcher.java:153)
	at ru.tcsbank.matching.PersonMatcher.search(PersonMatcher.java:135)
	at ru.tcsbank.personmatcher.mr.PersonMatcherJob$MapClass.map(PersonMatcherJob.java:80)
	at ru.tcsbank.personmatcher.mr.PersonMatcherJob$MapClass.map(PersonMatcherJob.java:65)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)

Best regards,
Mezentsev Pavel

2014-07-22 14:59 GMT+04:00 Pavel Mezentsev <pa...@mezentsev.org>:
> Hello all!
>
> We have trouble with HBase.
> Our Hadoop cluster has 4 nodes (plus 1 client node).
> CDH 4.6 and CM 4.7 are installed.
> Hadoop versions are:
> - hadoop-hdfs : 2.0.0+1475
> - hadoop-0.20-mapreduce : 2.0.0+1475
> - hbase : 0.94.6+132
> The Hadoop and HBase configs are attached.
>
> We have several tables in HBase with a total volume of 2 TB.
> We run MapReduce ETL jobs and analytics queries over them.
>
> We see a lot of warnings such as:
> - *The health test result for REGION_SERVER_READ_LATENCY has become bad:
>   The moving average of HDFS read latency is 162 millisecond(s) over the
>   previous 5 minute(s). Critical threshold: 100.*
> - *The health test result for REGION_SERVER_SYNC_LATENCY has become bad:
>   The moving average of HDFS sync latency is 8.2 second(s) over the previous
>   5 minute(s). Critical threshold: 5,000.*
> - *HBase region health: 442 unhealthy regions*
> - *HDFS_DATA_NODES_HEALTHY has become bad*
> - *HBase Region Health Canary is running slowly on the cluster*
>
> MapReduce jobs that issue random queries to HBase run very slowly: a job
> is only 20% complete after 18 hours, whereas the same job completes in
> 12 hours on an analogous cluster.
>
> Please help us find the causes of these alerts and speed up the cluster.
> Could you advise us on what we should do?
>
> Cheers,
> Mezentsev Pavel
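To put the quoted figures side by side, here is a small back-of-the-envelope sketch. The class and method names are hypothetical and not part of the original jobs. It assumes job progress grows roughly linearly with time, and that the sync-latency threshold of 5,000 is in milliseconds (consistent with 8.2 seconds tripping it).

```java
import java.util.Locale;

// Hypothetical helper: rough arithmetic on the figures from the report above.
// Assumption: job progress is roughly linear in elapsed time.
final class SlowdownEstimate {

    /** Projects total runtime, assuming progress grows linearly with elapsed time. */
    static double projectedTotalHours(double elapsedHours, double fractionDone) {
        if (fractionDone <= 0.0 || fractionDone > 1.0) {
            throw new IllegalArgumentException("fractionDone must be in (0, 1]");
        }
        return elapsedHours / fractionDone;
    }

    /** How many times an observed value exceeds its alert threshold (same units). */
    static double ratioToThreshold(double observed, double threshold) {
        return observed / threshold;
    }

    public static void main(String[] args) {
        // This cluster: 20% done after 18 hours.
        double projected = projectedTotalHours(18.0, 0.20);
        // Analogous cluster: the same job finishes in 12 hours.
        double slowdown = projected / 12.0;
        System.out.printf(Locale.ROOT, "Projected runtime: %.1f h (%.1fx slower)%n",
                projected, slowdown);

        // Read latency: 162 ms vs. critical threshold of 100 ms.
        System.out.printf(Locale.ROOT, "Read latency: %.2fx threshold%n",
                ratioToThreshold(162.0, 100.0));

        // Sync latency: 8.2 s = 8200 ms vs. critical threshold of 5,000 (assumed ms).
        System.out.printf(Locale.ROOT, "Sync latency: %.2fx threshold%n",
                ratioToThreshold(8200.0, 5000.0));
    }
}
```

Running it projects about 90 hours for the job, roughly 7.5 times the 12 hours seen on the other cluster, while read and sync latency are each about 1.6 times their critical thresholds. `Locale.ROOT` keeps the decimal point stable regardless of the JVM's default locale.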