Hello All,

We are running Cassandra 1.0.7 on AWS medium instances (3.8 GB RAM, 1 core)
with Ubuntu 12.04. The cluster has three nodes, and our application connects
to only one of them. Our Thrift version is 0.6.1; we downgraded from 0.8
because we suspected a compatibility problem between Thrift and Cassandra
(output.log complains about an 'old client'). We are still not sure which
Thrift version to use with Cassandra 1.0.7, since we keep getting the same
'old client' message. Any advice on that would be appreciated.
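
In case the connection setup matters: as far as we understand, Cassandra
1.0.x expects framed Thrift transport, and a client writing unframed data
can make the server misread payload bytes as a message length, which would
look a lot like the errors below. Is something like the following the
correct way to connect to 1.0.7 with Thrift 0.6.1? This is only a minimal
sketch; the host is a placeholder for one of our nodes and 9160 is the
default rpc_port.

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public final class CassandraConnect {
        public static void main(String[] args) throws Exception {
            // Placeholder host; 9160 is the default rpc_port in cassandra.yaml.
            TTransport transport =
                    new TFramedTransport(new TSocket("10.128.16.110", 9160));
            transport.open();
            Cassandra.Client client =
                    new Cassandra.Client(new TBinaryProtocol(transport));
            // Sanity check: ask the server for its Thrift API version.
            System.out.println(client.describe_version());
            transport.close();
        }
    }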

Below are the errors we are getting in the output.log file. The first three
errors are not responsible for the crash; only the OOM error is. Still,
something seems to be seriously wrong there...

Error #1

ERROR 14:00:12,057 Thrift error occurred during processing of message.
org.apache.thrift.TException: Message length exceeded: 1970238464
        at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
        at org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
        at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:102)
        at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:112)
        at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:112)
        at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:112)
        at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:121)
        at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:60)
        at org.apache.cassandra.thrift.Mutation.read(Mutation.java:355)
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18966)
        at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

Error #2

ERROR 14:03:48,004 Error occurred during processing of message.
java.lang.StringIndexOutOfBoundsException: String index out of range: -2147418111
        at java.lang.String.checkBounds(String.java:397)
        at java.lang.String.<init>(String.java:442)
        at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:210)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

Error #3

ERROR 14:07:24,415 Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:213)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

Error #4

ERROR 16:07:10,168 Thrift error occurred during processing of message.
org.apache.thrift.TException: Message length exceeded: 218104076
        at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
        at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:352)
        at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:347)
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
        at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /var/lib/cassandra/java_1341224307.hprof ...
INFO 16:07:18,882 GC for Copy: 886 ms for 1 collections, 2242700896 used; max is 2670985216
Java HotSpot(TM) 64-Bit Server VM warning: record is too large
Heap dump file created [4429997807 bytes in 95.755 secs]
INFO 16:08:54,749 GC for ConcurrentMarkSweep: 1157 ms for 4 collections, 2246857528 used; max is 2670985216
WARN 16:08:54,761 Heap is 0.8412092715978552 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
ERROR 16:08:54,761 Fatal exception in thread Thread[Thrift:446,5,main]
java.lang.OutOfMemoryError: Java heap space
        at java.util.HashMap.<init>(HashMap.java:187)
        at java.util.HashMap.<init>(HashMap.java:199)
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18953)
        at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
INFO 16:08:54,760 InetAddress /10.128.16.110 is now dead.
INFO 16:08:54,764 InetAddress /10.128.16.112 is now dead.
----------------------------------------------------------------------

The first three errors appear many times before error #4, which is the one
that actually causes the crash. 10.128.16.110 is the node our application
hits. Although the log says that 10.128.16.112 died, it did not: we ran
'nodetool ring' on 10.128.16.112, and only 10.128.16.110 was reported as
down.

Proper hardware might solve some of our problems, but we would like a fair
understanding of the failure before we move on. At the moment we cannot
keep the cluster stable for more than about 12 hours; after that,
10.128.16.110 dies and output.log shows the same errors.
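
One theory we would like to check: the fatal OOM happens inside
Cassandra$batch_mutate_args.read, error #4 reports a message length of
218104076 bytes (roughly 208 MB), and the GC lines put our max heap around
2.5 GB. Whether that length is a real batch or a misread frame, it suggests
our batch_mutate calls may simply be too large. If chunking is the right
fix, we would split them along these lines (a sketch only;
MAX_ROWS_PER_BATCH is a made-up knob we would still have to tune):

    import java.nio.ByteBuffer;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.InvalidRequestException;
    import org.apache.cassandra.thrift.Mutation;
    import org.apache.cassandra.thrift.TimedOutException;
    import org.apache.cassandra.thrift.UnavailableException;
    import org.apache.thrift.TException;

    public final class ChunkedMutator {
        // Made-up cap on row keys per batch_mutate call; needs tuning.
        private static final int MAX_ROWS_PER_BATCH = 100;

        public static void batchMutateChunked(
                Cassandra.Client client,
                Map<ByteBuffer, Map<String, List<Mutation>>> mutations,
                ConsistencyLevel cl)
                throws InvalidRequestException, UnavailableException,
                       TimedOutException, TException {
            Map<ByteBuffer, Map<String, List<Mutation>>> chunk =
                    new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
            for (Map.Entry<ByteBuffer, Map<String, List<Mutation>>> row
                    : mutations.entrySet()) {
                chunk.put(row.getKey(), row.getValue());
                if (chunk.size() >= MAX_ROWS_PER_BATCH) {
                    client.batch_mutate(chunk, cl); // send a slice, not everything
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                client.batch_mutate(chunk, cl); // flush the remainder
            }
        }
    }

This caps row keys rather than bytes, so one huge row could still hurt; if
the list has a better rule of thumb for sizing batches, we would love to
hear it.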

Any help would be much appreciated. Please let me know if you need more
information to figure out what is going on.

Thank you in advance.

-- 
Kind Regards,

Vasilis
