Thanks Aaron, will seek help from hector team. On Tue, Apr 10, 2012 at 3:41 AM, aaron morton <aa...@thelastpickle.com>wrote:
> Caused by: java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:129) > at > org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) > ... 31 more > > This looks like a client side timeout to me. > > AFAIK it will use this > > http://rantav.github.com/hector//source/content/API/core/1.0-1/me/prettyprint/cassandra/service/CassandraHost.html#getCassandraThriftSocketTimeout() > > if it's > 0 otherwise the value of the CASSANDRA_THRIFT_SOCKET_TIMEOUT JVM > param > > otherwise 0 i think. > > Hector is one of the many things I am not an expert on. Try the hector > user list if you are still having problems. > > > > [cassy@s2.dsat4 ~]$ ~/bin/nodetool -h localhost tpstats > Pool Name Active Pending Completed Blocked > All time blocked > ReadStage 3 3 414129625 > 0 0 > > Looks fine. > > Hope that helps. > > > ----------------- > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 10/04/2012, at 8:08 AM, Daning Wang wrote: > > Thanks Aaron! Here is the exception, is that the timeout between nodes? > any parameter I can change to reduce timeout? > > me.prettyprint.hector.api.exceptions.HectorTransportException: > org.apache.thrift.transport.TTransportException: > java.net.SocketTimeoutException: Read timed out > at > me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33) > at > me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:130) > at > me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:100) > at > me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103) > at > me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:246) > at > me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97) > at > me.prettyprint.cassandra.model.CqlQuery.execute(CqlQuery.java:99) > at > com.netseer.cassandra.cache.dao.CacheReader.getRows(CacheReader.java:267) > at > com.netseer.cassandra.cache.dao.CacheReader.getCache0(CacheReader.java:55) > at > com.netseer.cassandra.cache.dao.CacheDao.getCaches(CacheDao.java:85) > at > com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:71) > at > com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:149) > at > com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:55) > at > com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:28) > at > com.netseer.dsat.cache.CassandraDSATCacheImpl.get(CassandraDSATCacheImpl.java:62) > at > com.netseer.dsat.cache.CassandraDSATCacheImpl.getTimedValue(CassandraDSATCacheImpl.java:144) > at > com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:427) > at > com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:423) > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > Caused by: org.apache.thrift.transport.TTransportException: > java.net.SocketTimeoutException: Read timed out > at > org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) > at > org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) > at > org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129) > at > org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) > at > org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) > at > org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) > at > org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) > at > org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) > at > org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql_query(Cassandra.java:1698) > at > org.apache.cassandra.thrift.Cassandra$Client.execute_cql_query(Cassandra.java:1682) > at > me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:106) > ... 21 more > Caused by: java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:129) > at > org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) > ... 31 more > > > *and there is the tpstats* > > [cassy@s2.dsat4 ~]$ ~/bin/nodetool -h localhost tpstats > Pool Name Active Pending Completed Blocked > All time blocked > ReadStage 3 3 414129625 > 0 0 > RequestResponseStage 0 0 300591600 > 0 0 > MutationStage 0 0 96585276 > 0 0 > ReadRepairStage 0 0 94185465 > 0 0 > ReplicateOnWriteStage 0 0 0 > 0 0 > GossipStage 0 0 2684813 > 0 0 > AntiEntropyStage 0 0 5436 > 0 0 > MigrationStage 0 0 22 > 0 0 > MemtablePostFlusher 0 0 3553 > 0 0 > StreamStage 0 0 167 > 0 0 > FlushWriter 0 0 3582 > 0 23 > MiscStage 0 0 1163 > 0 0 > AntiEntropySessions 0 0 399 > 0 0 > InternalResponseStage 0 0 0 > 0 0 > HintedHandoff 0 0 2746 > 0 0 > > Message type Dropped > RANGE_SLICE 0 > READ_REPAIR 17931 > BINARY 0 > READ 5185149 > MUTATION 232317 > REQUEST_RESPONSE 1317 > > > > > > > On Sun, Apr 8, 2012 at 2:15 PM, aaron morton <aa...@thelastpickle.com>wrote: > >> You need to see if the timeout is from the client to the server, or >> between the server nodes. >> >> If it's server side a TimedOutException will be thrown from thrift. Take >> a look at the nodetool tpstats on the servers, you will probably see lots >> of "Pending" tasks. Basically the cluster is overloaded. Consider: >> >> * check the IO, CPU, GC state on the servers. >> * ensuring the data and requests are evenly spread around the cluster. >> * reducing the number of columns read in a select. >> >> Cheers >> >> ----------------- >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 6/04/2012, at 5:30 AM, Daning Wang wrote: >> >> > Hi all, >> > >> > We are using Hector and ofter we see lots of timeout exception in the >> log, I know that the hector can failover to other node, but I want to >> reduce the number of timeouts. >> > >> > any hector parameter I should change to reduce this error? >> > >> > also, on the server side, any kind of tunning need to do for the >> timeout? >> > >> > >> > Thanks in advance. >> > >> > >> > 12/04/04 15:13:20 ERROR >> com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms >> > 12/04/04 15:13:25 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN >> TRIGGERED for host 10.28.78.123(10.28.78.123):9160 >> > 12/04/04 15:13:25 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on >> shutdown: >> <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160}; >> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 >> > 12/04/04 15:13:44 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN >> TRIGGERED for host 10.240.113.171(10.240.113.171):9160 >> > 12/04/04 15:13:44 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on >> shutdown: >> <ConcurrentCassandraClientPoolByHost>:{10.240.113.171(10.240.113.171):9160}; >> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 >> > 12/04/04 15:13:46 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN >> TRIGGERED for host 10.28.78.123(10.28.78.123):9160 >> > 12/04/04 15:13:46 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on >> shutdown: >> <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160}; >> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 >> > 12/04/04 15:13:46 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN >> TRIGGERED for host 10.123.83.114(10.123.83.114):9160 >> > 12/04/04 15:13:46 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on >> shutdown: >> <ConcurrentCassandraClientPoolByHost>:{10.123.83.114(10.123.83.114):9160}; >> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 >> > 12/04/04 15:13:46 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN >> TRIGGERED for host 10.6.115.239(10.6.115.239):9160 >> > 12/04/04 15:13:46 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on >> shutdown: >> <ConcurrentCassandraClientPoolByHost>:{10.6.115.239(10.6.115.239):9160}; >> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 >> > 12/04/04 15:13:49 ERROR >> com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms >> > 12/04/04 15:13:49 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN >> TRIGGERED for host 10.120.205.48(10.120.205.48):9160 >> > 12/04/04 15:13:49 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on >> shutdown: >> <ConcurrentCassandraClientPoolByHost>:{10.120.205.48(10.120.205.48):9160}; >> IsActive?: true; Active: 3; Blocked: 0; Idle: 3; NumBeforeExhausted: 17 >> > 12/04/04 15:13:50 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN >> TRIGGERED for host 10.28.20.200(10.28.20.200):9160 >> > 12/04/04 15:13:50 ERROR >> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on >> shutdown: >> <ConcurrentCassandraClientPoolByHost>:{10.28.20.200(10.28.20.200):9160}; >> IsActive?: true; Active: 2; Blocked: 0; Idle: 4; NumBeforeExhausted: 18 >> > 12/04/04 15:13:51 ERROR >> com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms >> >> > >