Wanted to see if I still would similar exceptions if I lowered the load by reducing the replication factor. I saw that in the log files a lot of 'GC parNew' messages and started a complete new run (empty database) with smaller JVM -Xmn1000M and with replication factor 1 (before it was 3). Currently running, 10 % user cpu, 3% iowait, 'iostat await' 60 on commitlog device, 'iostat await 50' on datafiles device, 'iostat svctm 3', 'vmstat si 0', 'vmstat so 0' I still get the DEAD/UP messages, less 'GC parNew' messages (some parNew take more than 1 sec), but the system recovers and continues. But the weird thing is that sometimes I still get java.util.concurrent.TimeoutException exceptions , as well on batch inserts (per 100 rows) and on single row inserts (with just one 64 char row key and one 4 char column name and 64 char column value). Rpc_timeout is 36000 !
I looked up what the other 3 machines were doing at that time. The 'received record...' and 'Document stored ...' are msgs of our application. Below a stack trace of exceptions occurring at the 'same time', just 3 seconds time difference, on machine 1. Machine 1 : 08:23:41 : the timeout exception stack trace as listed at end. Machine 2: Nothing special ========= .... 2011-09-30 08:23:20.888 Completed flushing /media/datadrive1/capd.cassandra.capd/dbdatafile/KsFullIdx/ForwardShortValues-g-149-Data.db (131582322 bytes) 2011-09-30 08:23:20.889 Writing Memtable-ForwardLabelValues@260608040(47867232/1983693805 serialized/live bytes, 1150959 ops) 2011-09-30 08:23:23.002 Document stored with count records : 55336 Part DB (ms) : 3666 ... 2011-09-30 08:23:33.467 Creating new commitlog segment ../../../../data/capd.cassandra.capd/dbcommitlog/CommitLog-1317363813467.log 2011-09-30 08:23:33.915 Document stored with count records : 9046 Part DB (ms) : 653 .... Machine 3: Is compacting. Also got a timeout exception about 36 seconds before the exception of machine 1 ========= 2011-09-30 08:22:34.458 Compacted to /media/datadrive1/capd.cassandra.capd/dbdatafile/KsFullIdx/ReverseStringValues-tmp-g-622-Data.db. 966,991,510 to 965,780,805 (~99% of original) bytes for 74,345 keys. Time: 208,416ms. 2011-09-30 08:22:37.474 Compacting large row KsFullIdx/ForwardStringValues:3237393737 (723544055 bytes) incrementally 2011-09-30 08:22:37.751 Document stored with count records : 13149 Part DB (ms) : 1613 2011-09-30 08:22:37.751 Document stored with count records : 13149 Time elapsed (ms) : 1748 2011-09-30 08:23:14.149 java.util.concurrent.TimeoutException 2011-09-30 08:23:14.149 at org.apache.cassandra.service.AbstractWriteResponseHandler.get(AbstractWriteResponseHandler.java:69) 2011-09-30 08:23:14.149 at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:169) 2011-09-30 08:23:14.150 at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore.storeAnnotationsProxy(CassandraStore.java:1290) 2011-09-30 08:23:33.548 Received records to store with total count : 23277 2011-09-30 08:23:34.705 Received records to store with total count : 9816 2011-09-30 08:23:35.152 Received records to store with total count : 8646 2011-09-30 08:23:35.323 Received records to store with total count : 49888 2011-09-30 08:23:35.475 Document stored with count records : 23277 Part DB (ms) : 1926 .... 2011-09-30 08:23:39.627 Received records to store with total count : 8443 2011-09-30 08:23:40.081 Document stored with count records : 26862 Part DB (ms) : 3192 2011-09-30 08:23:40.081 Document stored with count records : 26862 Time elapsed (ms) : 3363 ... Machine 4: Nothing special, but also got the timeout exception for a single row insert at 08:22:30 ========= 2011-09-30 08:22:12.569 Compacted to /media/datadrive1/capd.cassandra.capd/dbdatafile/KsFullIdx/ForwardIntegerValues-tmp-g-150-Data.db. 586,039,224 to 586,036,960 (~99% of original) bytes for 36 keys. Time: 152,823ms. 2011-09-30 08:22:30.713 TimedOutException() 2011-09-30 08:22:30.714 at org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:15214) 2011-09-30 08:22:30.714 at org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:858) 2011-09-30 08:22:30.715 at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:830) .... 2011-09-30 08:23:18.902 Document stored with count records : 9186 Time elapsed (ms) : 2705 2011-09-30 08:23:19.240 Received records to store with total count : 8931 2011-09-30 08:23:19.898 flushing high-traffic column family ColumnFamilyStore(table='KsFullIdx', columnFamily='ForwardStringValues') 2011-09-30 08:23:19.898 Enqueuing flush of Memtable-ForwardStringValues@1365818075(53315582/2559687119 serialized/live bytes, 1158851 ops) 2011-09-30 08:23:19.899 Writing Memtable-ForwardStringValues@1365818075(53315582/2559687119 serialized/live bytes, 1158851 ops) 2011-09-30 08:23:21.725 Document stored with count records : 8931 Part DB (ms) : 2484 .... 2011-09-30 08:23:27.158 Document stored with count records : 12923 Time elapsed (ms) : 1126 2011-09-30 08:23:28.319 Completed flushing /media/datadrive1/capd.cassandra.capd/dbdatafile/KsFullIdx/ForwardStringValues-g-517-Data.db (136428318 bytes) 2011-09-30 08:23:30.719 Received records to store with total count : 41543 ... 2011-09-30 08:23:37.734 Received records to store with total count : 13653 2011-09-30 08:23:40.508 Document stored with count records : 13653 Part DB (ms) : 2774 ... Machine 1 : 08:23:41 : the timeout exception as listed below ========= First one is a StorageProxy.mutate of 100 rows. Second one is a single row insert using thrift for a simple and small row/column/value. Third is again the same batch of 100 rows. Fourth is another single row insert (small row/column/value). 2011-09-30 08:22:59.967 Received records to store with total count : 10224 2011-09-30 08:23:00.231 Document stored with count records : 22179 Part DB (ms) : 1120 2011-09-30 08:23:00.232 Document stored with count records : 22179 Time elapsed (ms) : 1362 2011-09-30 08:23:00.521 Document stored with count records : 10224 Part DB (ms) : 553 2011-09-30 08:23:00.521 Document stored with count records : 10224 Time elapsed (ms) : 604 2011-09-30 08:23:02.291 Received records to store with total count : 8169 2011-09-30 08:23:02.825 Document stored with count records : 8169 Part DB (ms) : 533 2011-09-30 08:23:02.825 Document stored with count records : 8169 Time elapsed (ms) : 604 2011-09-30 08:23:03.762 Received records to store with total count : 15874 2011-09-30 08:23:04.861 Document stored with count records : 15874 Part DB (ms) : 1098 2011-09-30 08:23:04.861 Document stored with count records : 15874 Time elapsed (ms) : 1257 2011-09-30 08:23:04.888 Received records to store with total count : 20536 2011-09-30 08:23:06.082 Received records to store with total count : 18046 2011-09-30 08:23:06.244 Received records to store with total count : 67283 2011-09-30 08:23:06.408 Received records to store with total count : 41107 2011-09-30 08:23:38.592 Compacted to /media/datadrive1/capd.cassandra.capd/dbdatafile/KsFullIdx/ReverseStringValues-tmp-g-622-Data.db. 958,447,377 to 958,447,377 (~100% of original) bytes for 74,860 keys. Time: 148,318ms. 2011-09-30 08:23:41.553 java.util.concurrent.TimeoutException 2011-09-30 08:23:41.554 at org.apache.cassandra.service.AbstractWriteResponseHandler.get(AbstractWriteResponseHandler.java:69) 2011-09-30 08:23:41.554 at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:169) 2011-09-30 08:23:41.554 at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore.storeAnnotationsProxy(CassandraStore.java:1280) 2011-09-30 08:23:41.554 at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore.storeAnnotations(CassandraStore.java:1018) 2011-09-30 08:23:41.555 at be.landc.services.search.server.db.baseserver.AnnotationStoreDb.storeAnnotations(AnnotationStoreDb.java:181) 2011-09-30 08:23:41.555 at be.landc.services.search.server.db.baseserver.AnnotationStoreDb.processDocument(AnnotationStoreDb.java:333) 2011-09-30 08:23:41.555 at be.landc.services.search.server.db.baseserver.AnnotationStoreDb.processDocument(AnnotationStoreDb.java:290) 2011-09-30 08:23:41.555 at be.landc.services.search.server.base.SearchServerDelegatingBase.processDocument(SearchServerDelegatingBase.java:93) 2011-09-30 08:23:41.556 at sun.reflect.GeneratedMethodAccessor133.invoke(Unknown Source) 2011-09-30 08:23:41.556 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 2011-09-30 08:23:41.556 at java.lang.reflect.Method.invoke(Method.java:597) 2011-09-30 08:23:41.556 at be.landc.framework.service.server.shared.Server.execute(Server.java:134) 2011-09-30 08:23:41.557 at be.landc.framework.service.server.rmi.RMIServer.execute(RMIServer.java:80) 2011-09-30 08:23:41.557 at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source) 2011-09-30 08:23:41.557 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 2011-09-30 08:23:41.557 at java.lang.reflect.Method.invoke(Method.java:597) 2011-09-30 08:23:41.558 at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) 2011-09-30 08:23:41.558 at sun.rmi.transport.Transport$1.run(Transport.java:159) 2011-09-30 08:23:41.558 at java.security.AccessController.doPrivileged(Native Method) 2011-09-30 08:23:41.558 at sun.rmi.transport.Transport.serviceCall(Transport.java:155) 2011-09-30 08:23:41.559 at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) 2011-09-30 08:23:41.559 at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) 2011-09-30 08:23:41.559 at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) 2011-09-30 08:23:41.559 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 2011-09-30 08:23:41.560 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 2011-09-30 08:23:41.560 at java.lang.Thread.run(Thread.java:662) 2011-09-30 08:23:42.285 TimedOutException() 2011-09-30 08:23:42.285 at org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:15214) 2011-09-30 08:23:42.286 at org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:858) 2011-09-30 08:23:42.286 at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:830) 2011-09-30 08:23:42.286 at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore.storeAnnotationsProxy(CassandraStore.java:1236) 2011-09-30 08:23:42.286 at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore.storeAnnotations(CassandraStore.java:1018) 2011-09-30 08:23:42.286 at be.landc.services.search.server.db.baseserver.AnnotationStoreDb.storeAnnotations(AnnotationStoreDb.java:181) 2011-09-30 08:23:42.286 at be.landc.services.search.server.db.baseserver.AnnotationStoreDb.processDocument(AnnotationStoreDb.java:333) 2011-09-30 08:23:42.287 at be.landc.services.search.server.db.baseserver.AnnotationStoreDb.processDocument(AnnotationStoreDb.java:290) 2011-09-30 08:23:42.287 at be.landc.services.search.server.base.SearchServerDelegatingBase.processDocument(SearchServerDelegatingBase.java:93) 2011-09-30 08:23:42.287 at sun.reflect.GeneratedMethodAccessor133.invoke(Unknown Source) 2011-09-30 08:23:42.287 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 2011-09-30 08:23:42.287 at java.lang.reflect.Method.invoke(Method.java:597) 2011-09-30 08:23:42.287 at be.landc.framework.service.server.shared.Server.execute(Server.java:134) 2011-09-30 08:23:42.288 at be.landc.framework.service.server.rmi.RMIServer.execute(RMIServer.java:80) 2011-09-30 08:23:42.288 at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source) 2011-09-30 08:23:42.288 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 2011-09-30 08:23:42.288 at java.lang.reflect.Method.invoke(Method.java:597) 2011-09-30 08:23:42.288 at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) 2011-09-30 08:23:42.288 at sun.rmi.transport.Transport$1.run(Transport.java:159) 2011-09-30 08:23:42.289 at java.security.AccessController.doPrivileged(Native Method) 2011-09-30 08:23:42.289 at sun.rmi.transport.Transport.serviceCall(Transport.java:155) 2011-09-30 08:23:42.289 at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) 2011-09-30 08:23:42.289 at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) 2011-09-30 08:23:42.289 at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) 2011-09-30 08:23:42.290 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 2011-09-30 08:23:42.290 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 2011-09-30 08:23:42.290 at java.lang.Thread.run(Thread.java:662) 2011-09-30 08:23:43.305 java.util.concurrent.TimeoutException 2011-09-30 08:23:43.305 at org.apache.cassandra.service.AbstractWriteResponseHandler.get(AbstractWriteResponseHandler.java:69) 2011-09-30 08:23:43.306 at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:169) 2011-09-30 08:23:43.306 at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore.storeAnnotationsProxy(CassandraStore.java:1280) 2011-09-30 08:23:43.306 at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore.storeAnnotations(CassandraStore.java:1018) 2011-09-30 08:23:43.306 at be.landc.services.search.server.db.baseserver.AnnotationStoreDb.storeAnnotations(AnnotationStoreDb.java:181) 2011-09-30 08:23:43.306 at be.landc.services.search.server.db.baseserver.AnnotationStoreDb.processDocument(AnnotationStoreDb.java:333) 2011-09-30 08:23:43.307 at be.landc.services.search.server.db.baseserver.AnnotationStoreDb.processDocument(AnnotationStoreDb.java:290) 2011-09-30 08:23:43.307 at be.landc.services.search.server.base.SearchServerDelegatingBase.processDocument(SearchServerDelegatingBase.java:93) 2011-09-30 08:23:43.307 at sun.reflect.GeneratedMethodAccessor133.invoke(Unknown Source) 2011-09-30 08:23:43.307 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 2011-09-30 08:23:43.308 at java.lang.reflect.Method.invoke(Method.java:597) 2011-09-30 08:23:43.308 at be.landc.framework.service.server.shared.Server.execute(Server.java:134) 2011-09-30 08:23:43.308 at be.landc.framework.service.server.rmi.RMIServer.execute(RMIServer.java:80) 2011-09-30 08:23:43.308 at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source) 2011-09-30 08:23:43.309 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 2011-09-30 08:23:43.309 at java.lang.reflect.Method.invoke(Method.java:597) 2011-09-30 08:23:43.309 at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) 2011-09-30 08:23:43.309 at sun.rmi.transport.Transport$1.run(Transport.java:159) 2011-09-30 08:23:43.309 at java.security.AccessController.doPrivileged(Native Method) 2011-09-30 08:23:43.310 at sun.rmi.transport.Transport.serviceCall(Transport.java:155) 2011-09-30 08:23:43.310 at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) 2011-09-30 08:23:43.310 at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) 2011-09-30 08:23:43.310 at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) 2011-09-30 08:23:43.310 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 2011-09-30 08:23:43.311 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 2011-09-30 08:23:43.311 at java.lang.Thread.run(Thread.java:662) 2011-09-30 08:23:44.385 TimedOutException() 2011-09-30 08:23:44.385 at org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:15214) 2011-09-30 08:23:44.386 at org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:858) 2011-09-30 08:23:44.386 at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:830) 2011-09-30 08:23:44.386 at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore.storeAnnotationsProxy(CassandraStore.java:1242) 2011-09-30 08:23:44.386 at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore.storeAnnotations(CassandraStore.java:1018) 2011-09-30 08:23:44.387 at be.landc.services.search.server.db.baseserver.AnnotationStoreDb.storeAnnotations(AnnotationStoreDb.java:181) 2011-09-30 08:23:44.387 at be.landc.services.search.server.db.baseserver.AnnotationStoreDb.processDocument(AnnotationStoreDb.java:333) 2011-09-30 08:23:44.387 at be.landc.services.search.server.db.baseserver.AnnotationStoreDb.processDocument(AnnotationStoreDb.java:290) 2011-09-30 08:23:44.387 at be.landc.services.search.server.base.SearchServerDelegatingBase.processDocument(SearchServerDelegatingBase.java:93) 2011-09-30 08:23:44.388 at sun.reflect.GeneratedMethodAccessor133.invoke(Unknown Source) 2011-09-30 08:23:44.388 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 2011-09-30 08:23:44.388 at java.lang.reflect.Method.invoke(Method.java:597) 2011-09-30 08:23:44.388 at be.landc.framework.service.server.shared.Server.execute(Server.java:134) 2011-09-30 08:23:44.389 at be.landc.framework.service.server.rmi.RMIServer.execute(RMIServer.java:80) 2011-09-30 08:23:44.389 at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source) 2011-09-30 08:23:44.389 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 2011-09-30 08:23:44.389 at java.lang.reflect.Method.invoke(Method.java:597) 2011-09-30 08:23:44.390 at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) 2011-09-30 08:23:44.390 at sun.rmi.transport.Transport$1.run(Transport.java:159) 2011-09-30 08:23:44.390 at java.security.AccessController.doPrivileged(Native Method) 2011-09-30 08:23:44.390 at sun.rmi.transport.Transport.serviceCall(Transport.java:155) 2011-09-30 08:23:44.391 at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) 2011-09-30 08:23:44.391 at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) 2011-09-30 08:23:44.391 at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) 2011-09-30 08:23:44.391 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 2011-09-30 08:23:44.392 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 2011-09-30 08:23:44.392 at java.lang.Thread.run(Thread.java:662) 2011-09-30 08:24:00.970 Received records to store with total count : 27928 2011-09-30 08:24:02.191 Document stored with count records : 27928 Part DB (ms) : 1220 2011-09-30 08:24:02.191 Document stored with count records : 27928 Time elapsed (ms) : 1339 2011-09-30 08:24:07.851 Received records to store with total count : 23548 2011-09-30 08:24:08.241 Received records to store with total count : 27686 2011-09-30 08:24:08.910 Document stored with count records : 23548 Part DB (ms) : 1058 2011-09-30 08:24:08.910 Document stored with count records : 23548 Time elapsed (ms) : 1328 2011-09-30 08:24:09.193 Received records to store with total count : 26117 2011-09-30 08:24:09.507 Document stored with count records : 27686 Part DB (ms) : 1265 2011-09-30 08:24:09.507 Document stored with count records : 27686 Time elapsed (ms) : 1495 2011-09-30 08:24:10.499 Received records to store with total count : 73102 2011-09-30 08:24:10.818 Received records to store with total count : 29409 2011-09-30 08:24:11.972 Document stored with count records : 29409 Part DB (ms) : 1153 2011-09-30 08:24:11.972 Document stored with count records : 29409 Time elapsed (ms) : 1383 2011-09-30 08:24:12.374 Received records to store with total count : 53034 2011-09-30 08:24:13.376 Document stored with count records : 73102 Part DB (ms) : 2876 2011-09-30 08:24:13.376 Document stored with count records : 73102 Time elapsed (ms) : 3303 2011-09-30 08:24:13.515 Received records to store with total count : 70652 2011-09-30 08:24:14.220 Received records to store with total count : 47999 2011-09-30 08:24:14.984 Document stored with count records : 53034 Part DB (ms) : 2609 2011-09-30 08:24:14.984 Document stored with count records : 53034 Time elapsed (ms) : 3133 2011-09-30 08:24:16.806 Creating new commitlog segment ../../../../data/capd.cassandra.capd/dbcommitlog/CommitLog-1317363856806.log 2011-09-30 08:24:17.153 Received records to store with total count : 9021 -----Original Message----- From: Desimpel, Ignace [mailto:ignace.desim...@nuance.com] Sent: dinsdag 27 september 2011 17:51 To: user@cassandra.apache.org Subject: RE: invalid column name length 0 No, brand new, started from scratch, no data at all. My setup get into trouble after a couple of hours since it then starts giving the Dead/up messages. I also get messages like '11 MUTATION messages dropped in server lifetime' Later on I got this type of exception also. -----Original Message----- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: dinsdag 27 september 2011 17:41 To: user@cassandra.apache.org Subject: Re: invalid column name length 0 Is this upgraded from an earlier version? On Tue, Sep 27, 2011 at 8:51 AM, Desimpel, Ignace <ignace.desim...@nuance.com> wrote: > Version 0.8.6. > > After an extreme load to 4 (embedded) cassandra servers with > replication factor 3 ( Ubuntu 10.4, dual six core, 64 bit, no swap, 1 > 15000 rpm commitlog disk, 1 15000 rpm datafile disk, ) I get a fatal > exception as listed below. > > No more messages were found after that. > > > > Probably other things are going wrong like the message "410 Could not > complete hinted handoff to /xxx.yyy.zzz.60", or the Dead/Up messages > (occurring long before this exception). > > But maybe this exception can point me in the right direction or even > point out some bug in Cassandra. > > > > Thanks, > > Ignace > > > > > > 2011-09-27 06:59:46,383 Compacting large row > KsFullIdx/ForwardStringValues:3237343034 (178032211 bytes) > incrementally > > 2011-09-27 07:00:09,738 GC for ParNew: 311 ms for 1 collections, > 8139389704 used; max is 33344716800 > > 2011-09-27 07:00:12,818 Compacting large row > KsFullIdx/ForwardStringValues:31363437 (1281862723 bytes) > incrementally > > 2011-09-27 07:02:16,025 Compacting large row > KsFullIdx/ForwardStringValues:31363438 (1623095072 bytes) > incrementally > > 2011-09-27 07:04:38,332 GC for ParNew: 534 ms for 1 collections, > 7811259472 used; max is 33344716800 > > 2011-09-27 07:04:52,803 Compacting large row > KsFullIdx/ForwardStringValues:3238313433 (1435774436 bytes) > incrementally > > 2011-09-27 07:06:57,160 Compacted to > /media/datadrive1/capd.cassandra.capd/dbdatafile/KsFullIdx/ForwardStringValues-tmp-g-542-Data.db. > 43,244,902,670 to 42,780,624,408 (~98% of original) bytes for 1,260 keys. > Time: 4,321,960ms. > > 2011-09-27 08:01:42,090 Saved KsFullIdx-ForwardStringValues-KeyCache > (572 > items) in 16 ms > > 2011-09-27 08:01:42,182 Saved KsFullIdx-ReverseStringValues-KeyCache > (25688 > items) in 63 ms > > 2011-09-27 08:18:13,078 InetAddress /xxx.yyy.zzz.62 is now dead. > > 2011-09-27 08:18:16,467 InetAddress /xxx.yyy.zzz.62 is now UP > > 2011-09-27 08:48:56,410 Could not complete hinted handoff to > /xxx.yyy.zzz.60 > > 2011-09-27 08:48:56,410 Enqueuing flush of > Memtable-HintsColumnFamily@2083796703(12097/196566 serialized/live > bytes, > 254 ops) > > 2011-09-27 08:48:56,411 Writing > Memtable-HintsColumnFamily@2083796703(12097/196566 serialized/live > bytes, > 254 ops) > > 2011-09-27 08:48:56,411 Nothing to compact in HintsColumnFamily; use > forceUserDefinedCompaction if you wish to force compaction of single > sstables (e.g. for tombstone collection) > > 2011-09-27 08:48:56,411 Finished hinted handoff of 254 rows to > endpoint > /xxx.yyy.zzz.60 > > 2011-09-27 08:48:56,490 Completed flushing > /media/datadrive1/capd.cassandra.capd/dbdatafile/system/HintsColumnFam > ily-g-10-Data.db > (25079 bytes) > > 2011-09-27 08:49:42,858 Started hinted handoff for endpoint > /xxx.yyy.zzz.62 > > 2011-09-27 12:01:42,100 Saved KsFullIdx-ForwardStringValues-KeyCache > (712 > items) in 27 ms > > 2011-09-27 12:01:42,182 Saved KsFullIdx-ReverseStringValues-KeyCache > (30742 > items) in 55 ms > > 2011-09-27 12:10:01,016 InetAddress /xxx.yyy.zzz.59 is now dead. > > 2011-09-27 12:10:02,272 InetAddress /xxx.yyy.zzz.59 is now UP > > 2011-09-27 12:17:34,596 Fatal exception in thread > Thread[HintedHandoff:1,5,RMI Runtime] > > java.io.IOError: > org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: > invalid column name length 0 > > at > org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSort > edMap.java:265) > > at > org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java: > 281) > > at > org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java: > 236) > > at > java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentS > kipListMap.java:1493) > > at > java.util.concurrent.ConcurrentSkipListMap.<init>(ConcurrentSkipListMa > p.java:1443) > > at > org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn. > java:445) > > at > org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn. > java:428) > > at > org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn. > java:418) > > at > org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn. > java:380) > > at > org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlock > Fetcher.getNextBlock(IndexedSliceReader.java:179) > > at > org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext( > IndexedSliceReader.java:121) > > at > org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext( > IndexedSliceReader.java:49) > > at > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIt > erator.java:140) > > at > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.ja > va:135) > > at > org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SS > TableSliceIterator.java:108) > > at > org.apache.commons.collections.iterators.CollatingIterator.set(Collati > ngIterator.java:283) > > at > org.apache.commons.collections.iterators.CollatingIterator.least(Colla > tingIterator.java:326) > > at > org.apache.commons.collections.iterators.CollatingIterator.next(Collat > ingIterator.java:230) > > at > org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterat > or.java:69) > > at > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIt > erator.java:140) > > at > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.ja > va:135) > > at > org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns( > SliceQueryFilter.java:116) > > at > org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(Quer > yFilter.java:142) > > at > org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFam > ilyStore.java:1427) > > at > org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamily > Store.java:1304) > > at > org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamily > Store.java:1261) > > at > org.apache.cassandra.db.HintedHandOffManager.sendRow(HintedHandOffMana > ger.java:155) > > at > org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(Hi > ntedHandOffManager.java:350) > > at > org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffM > anager.java:89) > > at > org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandO > ffManager.java:397) > > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30 > ) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecu > tor.java:886) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. > java:908) > > at java.lang.Thread.run(Thread.java:662) > > Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: > invalid column name length 0 > > at > org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer. > java:89) > > at > org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSort > edMap.java:261) > > ... 33 more > > 2011-09-27 12:17:53,291 Started hinted handoff for endpoint > /xxx.yyy.zzz.59 -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com