Jobs running on this cluster print exceptions like the following:

java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
Call to ds-hadoop-wk01p.tcsbank.ru/10.218.64.11:60020 failed on socket
timeout exception: java.net.SocketTimeoutException: 60000 millis timeout
while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.218.64.14:38621 remote=
ds-hadoop-wk01p.tcsbank.ru/10.218.64.11:60020]

        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:188)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1569)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1421)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:739)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:708)
        at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:367)
        at ru.tcsbank.hbase.HBasePersonDao.getUsersBatch(HBasePersonDao.java:306)
        at ru.tcsbank.matching.PersonMatcher.performSolrRequest(PersonMatcher.java:153)
        at ru.tcsbank.matching.PersonMatcher.search(PersonMatcher.java:135)
        at ru.tcsbank.personmatcher.mr.PersonMatcherJob$MapClass.map(PersonMatcherJob.java:80)
        at ru.tcsbank.personmatcher.mr.PersonMatcherJob$MapClass.map(PersonMatcherJob.java:65)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)





Best regards,
Mezentsev Pavel


2014-07-22 14:59 GMT+04:00 Pavel Mezentsev <pa...@mezentsev.org>:

> Hello all!
>
> We are having trouble with HBase.
> Our Hadoop cluster has 4 nodes (plus 1 client node).
> CDH 4.6 and CM 4.7 are installed.
> Component versions are:
>  - hadoop-hdfs : 2.0.0+1475
>  - hadoop-0.20-mapreduce : 2.0.0+1475
>  - hbase : 0.94.6+132
> The Hadoop and HBase configs are attached.
>
> We have several tables in HBase with a total volume of 2 TB.
> We run MapReduce ETL jobs and analytics queries over them.
>
> We see a lot of warnings like:
> - *The health test result for REGION_SERVER_READ_LATENCY has become bad:
> The moving average of HDFS read latency is 162 millisecond(s) over the
> previous 5 minute(s). Critical threshold: 100*.
> - *The health test result for REGION_SERVER_SYNC_LATENCY has become bad:
> The moving average of HDFS sync latency is 8.2 second(s) over the previous
> 5 minute(s). Critical threshold: 5,000*.
> - *HBase region health: 442 unhealthy regions*
> - *HDFS_DATA_NODES_HEALTHY has become bad*
> - *HBase Region Health Canary is running slowly on the cluster*
>
> MapReduce jobs that make random queries to HBase run very slowly
> (a job is 20% complete after 18 hours, versus 100% after 12 hours on a
> comparable cluster).
>
> Please help us find the causes of these alerts and speed up the cluster.
> Could you give us some advice on what we should do?
>
> Cheers,
> Mezentsev Pavel
>
>
