Thanks Aaron! Here is the exception, is that the timeout between nodes? any parameter I can change to reduce timeout?
me.prettyprint.hector.api.exceptions.HectorTransportException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33) at me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:130) at me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:100) at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103) at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:246) at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97) at me.prettyprint.cassandra.model.CqlQuery.execute(CqlQuery.java:99) at com.netseer.cassandra.cache.dao.CacheReader.getRows(CacheReader.java:267) at com.netseer.cassandra.cache.dao.CacheReader.getCache0(CacheReader.java:55) at com.netseer.cassandra.cache.dao.CacheDao.getCaches(CacheDao.java:85) at com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:71) at com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:149) at com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:55) at com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:28) at com.netseer.dsat.cache.CassandraDSATCacheImpl.get(CassandraDSATCacheImpl.java:62) at com.netseer.dsat.cache.CassandraDSATCacheImpl.getTimedValue(CassandraDSATCacheImpl.java:144) at com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:427) at com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:423) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129) at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql_query(Cassandra.java:1698) at org.apache.cassandra.thrift.Cassandra$Client.execute_cql_query(Cassandra.java:1682) at me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:106) ... 21 more Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ... 31 more *and there is the tpstats* [cassy@s2.dsat4 ~]$ ~/bin/nodetool -h localhost tpstats Pool Name Active Pending Completed Blocked All time blocked ReadStage 3 3 414129625 0 0 RequestResponseStage 0 0 300591600 0 0 MutationStage 0 0 96585276 0 0 ReadRepairStage 0 0 94185465 0 0 ReplicateOnWriteStage 0 0 0 0 0 GossipStage 0 0 2684813 0 0 AntiEntropyStage 0 0 5436 0 0 MigrationStage 0 0 22 0 0 MemtablePostFlusher 0 0 3553 0 0 StreamStage 0 0 167 0 0 FlushWriter 0 0 3582 0 23 MiscStage 0 0 1163 0 0 AntiEntropySessions 0 0 399 0 0 InternalResponseStage 0 0 0 0 0 HintedHandoff 0 0 2746 0 0 Message type Dropped RANGE_SLICE 0 READ_REPAIR 17931 BINARY 0 READ 5185149 MUTATION 232317 REQUEST_RESPONSE 1317 On Sun, Apr 8, 2012 at 2:15 PM, aaron morton <aa...@thelastpickle.com>wrote: > You need to see if the timeout is from the client to the server, or > between the server nodes. > > If it's server side a TimedOutException will be thrown from thrift. Take a > look at the nodetool tpstats on the servers, you will probably see lots of > "Pending" tasks. Basically the cluster is overloaded. Consider: > > * check the IO, CPU, GC state on the servers. > * ensuring the data and requests are evenly spread around the cluster. > * reducing the number of columns read in a select. > > Cheers > > ----------------- > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 6/04/2012, at 5:30 AM, Daning Wang wrote: > > > Hi all, > > > > We are using Hector and ofter we see lots of timeout exception in the > log, I know that the hector can failover to other node, but I want to > reduce the number of timeouts. > > > > any hector parameter I should change to reduce this error? > > > > also, on the server side, any kind of tunning need to do for the timeout? > > > > > > Thanks in advance. > > > > > > 12/04/04 15:13:20 ERROR > com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms > > 12/04/04 15:13:25 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN > TRIGGERED for host 10.28.78.123(10.28.78.123):9160 > > 12/04/04 15:13:25 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: Pool state on > shutdown: > <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160}; > IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 > > 12/04/04 15:13:44 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN > TRIGGERED for host 10.240.113.171(10.240.113.171):9160 > > 12/04/04 15:13:44 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: Pool state on > shutdown: > <ConcurrentCassandraClientPoolByHost>:{10.240.113.171(10.240.113.171):9160}; > IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 > > 12/04/04 15:13:46 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN > TRIGGERED for host 10.28.78.123(10.28.78.123):9160 > > 12/04/04 15:13:46 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: Pool state on > shutdown: > <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160}; > IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 > > 12/04/04 15:13:46 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN > TRIGGERED for host 10.123.83.114(10.123.83.114):9160 > > 12/04/04 15:13:46 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: Pool state on > shutdown: > <ConcurrentCassandraClientPoolByHost>:{10.123.83.114(10.123.83.114):9160}; > IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 > > 12/04/04 15:13:46 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN > TRIGGERED for host 10.6.115.239(10.6.115.239):9160 > > 12/04/04 15:13:46 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: Pool state on > shutdown: > <ConcurrentCassandraClientPoolByHost>:{10.6.115.239(10.6.115.239):9160}; > IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 > > 12/04/04 15:13:49 ERROR > com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms > > 12/04/04 15:13:49 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN > TRIGGERED for host 10.120.205.48(10.120.205.48):9160 > > 12/04/04 15:13:49 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: Pool state on > shutdown: > <ConcurrentCassandraClientPoolByHost>:{10.120.205.48(10.120.205.48):9160}; > IsActive?: true; Active: 3; Blocked: 0; Idle: 3; NumBeforeExhausted: 17 > > 12/04/04 15:13:50 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN > TRIGGERED for host 10.28.20.200(10.28.20.200):9160 > > 12/04/04 15:13:50 ERROR > me.prettyprint.cassandra.connection.HConnectionManager: Pool state on > shutdown: > <ConcurrentCassandraClientPoolByHost>:{10.28.20.200(10.28.20.200):9160}; > IsActive?: true; Active: 2; Blocked: 0; Idle: 4; NumBeforeExhausted: 18 > > 12/04/04 15:13:51 ERROR > com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms > >