The NullPointerException does not crash the node. It only makes it flap/go down for a short period, and then it comes back up. I do not see anything abnormal in the system log, only that single error in the cassandra.log.
On Thu, Aug 19, 2010 at 11:42 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:

> > What is my "live set"?
>
> Sorry; that meant the "set of data actually live (i.e., not garbage) in
> the heap". In other words, the amount of memory truly "used".
>
> > Is the system CPU bound given the few statements
> > below? This is from running 4 concurrent processes against the node...do I
> > need to throttle back the concurrent read/writers?
> >
> > I do all reads/writes as Quorum. (Replication factor of 3).
>
> With quorum and 0.6.4 I don't think unthrottled writes are expected to
> cause a problem.
>
> > The memtable threshold is the default of 256.
> >
> > All caching is turned off.
> >
> > The database is pretty small, maybe a few million keys (2-3) in 4 CFs. The
> > key size is pretty small. Some of the rows are pretty fat though (fatter
> > than I thought). I am saving secondary indexes in separate CFs and those are
> > the large rows that I think might be part of the problem. I will restart
> > testing turning these off and see if I see any difference.
> >
> > Would an extra fat row explain repeated OOM crashes in a row? I have finally
> > got the system to stabilize relatively and I even ran compaction on the bad
> > node without a problem (still no row size stats).
>
> Based on what you've said so far, the large rows are the only thing I
> would suspect may be the cause. With the amount of data and keys you
> say you have, you should definitely not be having memory issues with
> an 8 gig heap as a direct result of the data size/key count. A few
> million keys is not a lot at all; I still claim you should be able to
> handle hundreds of millions at least, from the perspective of bloom
> filters and such.
>
> So your plan to try it without these large rows is probably a good
> idea unless someone else has a better idea.
> You may want to consider trying 0.7 betas too since it has removed the
> limitation with respect to large rows, assuming you do in fact want
> these large rows (see the CassandraLimitations wiki page that was
> posted earlier in this thread).
>
> > I now have several other nodes flapping with the following single error in
> > the cassandra.log
> > Error: Exception thrown by the agent : java.lang.NullPointerException
> >
> > I assume this is an unrelated problem?
>
> Do you have a full stack trace?
>
> --
> / Peter Schuller
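For anyone following along: the QUORUM arithmetic behind "reads/writes as Quorum, replication factor of 3" isn't spelled out in the thread. A minimal sketch (my own illustration, not from the thread) of why RF=3 with QUORUM on both sides tolerates one dead replica while staying consistent:

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a simple majority of the replicas."""
    return replication_factor // 2 + 1

rf = 3
q = quorum(rf)
# With RF=3, QUORUM means 2 replicas must acknowledge each read or write.
# Any QUORUM read overlaps any QUORUM write (2 + 2 > 3), so the read
# sees the latest write, and one node can be down without failing requests.
print(f"RF={rf}: quorum={q}, overlap guaranteed: {q + q > rf}")
```

With RF=3 this prints `RF=3: quorum=2, overlap guaranteed: True`, which is why the nodes "flapping" for a short period doesn't fail the client's QUORUM operations.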