On the to-do list for today. Is there a tool to aggregate all the JMX stats from all nodes? I mean, something a little more complete than Nagios.

Ian
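In the meantime, the same numbers nodetool reads can be polled programmatically over JMX from each node and rolled up yourself. A minimal sketch follows; it demonstrates the read against the local JVM's platform MBean server so it is self-contained. For a remote Cassandra node you would instead connect via `JMXConnectorFactory` with a service URL for the node's JMX port (8080 was the 0.6-era default, but that and any Cassandra-specific MBean names are assumptions to verify against your install):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

public class JmxPoll {
    // Read current heap usage from an MBean server connection.
    // For a remote node, obtain the connection via
    //   JMXConnectorFactory.connect(new JMXServiceURL(
    //       "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi"))
    //       .getMBeanServerConnection()
    // (port 8080 being the 0.6-era default JMX port -- an assumption here).
    public static long heapUsed(MBeanServerConnection mbs) throws Exception {
        ObjectName mem = new ObjectName("java.lang:type=Memory");
        CompositeData usage = (CompositeData) mbs.getAttribute(mem, "HeapMemoryUsage");
        return (Long) usage.get("used");
    }

    public static void main(String[] args) throws Exception {
        // Demonstrate against this JVM's own MBean server; aggregating the
        // cluster is then just a loop over one connection per node.
        long used = heapUsed(ManagementFactory.getPlatformMBeanServer());
        System.out.println("heap used: " + used + " bytes");
    }
}
```

Looping this over all 14 nodes and graphing the result gets you most of the way to "more complete than Nagios" without extra tooling.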
On Fri, May 21, 2010 at 10:23 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> you should check the jmx stages I posted about
>
> On Fri, May 21, 2010 at 7:05 AM, Ian Soboroff <isobor...@gmail.com> wrote:
>> Just an update. I rolled the memtable size back to 128MB. I am still
>> seeing that the daemon runs for a while with reasonable heap usage, but
>> then the heap climbs up to the max (6GB in this case, should be plenty)
>> and it starts GCing, without much getting cleared. The client catches
>> lots of exceptions, where I wait 30 seconds and try again, with a new
>> client if necessary, but it doesn't clear up.
>>
>> Could this be related to memory leak problems I've skimmed past on the
>> list here?
>>
>> It can't be that I'm creating rows a bit at a time... once I stick a web
>> page into two CFs, it's over and done with for this application. I'm
>> just trying to get stuff loaded.
>>
>> Is there a limit to how much on-disk data a Cassandra daemon can manage?
>> Is there runtime overhead associated with stuff on disk?
>>
>> Ian
>>
>> On Thu, May 20, 2010 at 9:31 PM, Ian Soboroff <isobor...@gmail.com> wrote:
>>> Excellent leads, thanks. cassandra.in.sh has a heap of 6GB, but I
>>> didn't realize that I was trying to float so many memtables. I'll poke
>>> tomorrow and report if it gets fixed.
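The "so many memtables" arithmetic is easy to sanity-check. A back-of-the-envelope sketch, using the 12-memtable ceiling and 256MB threshold from Jonathan's mail in the quoted thread; the 2x in-memory overhead factor is a rough rule-of-thumb assumption, not a measured number (live memtables cost considerably more than their serialized size):

```java
public class MemtableMath {
    // Worst-case memtable footprint: memtables in flight times the
    // configured serialized flush threshold, times an in-memory overhead
    // factor (the factor here is an assumption, not a measurement).
    public static long worstCaseBytes(int memtablesInFlight, long thresholdMB,
                                      double overhead) {
        return (long) (memtablesInFlight * thresholdMB * 1024L * 1024L * overhead);
    }

    public static void main(String[] args) {
        long heap = 6L * 1024 * 1024 * 1024; // 6GB heap from cassandra.in.sh
        // Jonathan's numbers: up to 12 memtables in flight at 256MB each.
        long worst = worstCaseBytes(12, 256, 2.0);
        System.out.printf("worst case ~%d MB vs %d MB heap%n",
                worst >> 20, heap >> 20); // prints: worst case ~6144 MB vs 6144 MB heap
    }
}
```

Even at only 2x overhead the worst case eats the entire 6GB heap, which would explain a daemon that runs fine for a while and then GCs without reclaiming much. Dropping the threshold to 128MB halves the worst case.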
>>> Ian
>>>
>>> On Thu, May 20, 2010 at 10:40 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>> Some possibilities:
>>>>
>>>> You didn't adjust Cassandra heap size in cassandra.in.sh (1GB is too
>>>> small)
>>>>
>>>> You're inserting at CL.ZERO (ROW-MUTATION-STAGE in tpstats will show
>>>> large pending ops -- large = 100s)
>>>>
>>>> You're creating large rows a bit at a time and Cassandra OOMs when it
>>>> tries to compact (the OOM should usually be in the compaction thread)
>>>>
>>>> You have your 5 disks each with a separate data directory, which will
>>>> allow up to 12 total memtables in-flight internally, and 12*256 is too
>>>> much for the heap size you have (FLUSH-WRITER-STAGE in tpstats will
>>>> show large pending ops -- large = more than 2 or 3)
>>>>
>>>> On Tue, May 18, 2010 at 6:24 AM, Ian Soboroff <isobor...@gmail.com> wrote:
>>>>> I hope this isn't too much of a newbie question. I am using Cassandra
>>>>> 0.6.1 on a small cluster of Linux boxes - 14 nodes, each with 8GB RAM
>>>>> and 5 data drives. The nodes are running HDFS to serve files within
>>>>> the cluster, but at the moment the rest of Hadoop is shut down. I'm
>>>>> trying to load a large set of web pages (the ClueWeb collection, but
>>>>> more is coming) and my Cassandra daemons keep dying.
>>>>>
>>>>> I'm loading the pages into a simple column family that lets me fetch
>>>>> out pages by an internal ID or by URL. The biggest thing in the row
>>>>> is the page content, maybe 15-20k per page of raw HTML. There aren't
>>>>> a lot of columns. I tried Thrift, Hector, and the BMT interface, and
>>>>> at the moment I'm doing batch mutations over Thrift, about 2500 pages
>>>>> per batch, because that was fastest for me in testing.
>>>>>
>>>>> At this point, each Cassandra node has between 500GB and 1.5TB
>>>>> according to nodetool ring.
>>>>> Let's say I start the daemons up, and they all go live after a
>>>>> couple minutes of scanning the tables. I then start my importer,
>>>>> which is a single Java process reading ClueWeb bundles over HDFS,
>>>>> cutting them up, and sending the mutations to Cassandra. I only talk
>>>>> to one node at a time, switching to a new node when I get an
>>>>> exception. As the job runs over a few hours, the Cassandra daemons
>>>>> eventually fall over, either with no error in the log or reporting
>>>>> that they are out of heap.
>>>>>
>>>>> Each daemon is getting 6GB of RAM and has scads of disk space to play
>>>>> with. I've set storage-conf.xml to take 256MB in a memtable before
>>>>> flushing (like the BMT case), to do batch commit log flushes, and to
>>>>> not have any caching in the CFs. I'm sure I must be tuning something
>>>>> wrong. I would eventually like this Cassandra setup to serve a light
>>>>> request load, but over say 50-100 TB of data. I'd appreciate any help
>>>>> or advice you can offer.
>>>>>
>>>>> Thanks,
>>>>> Ian
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of Riptano, the source for professional Cassandra support
>>>> http://riptano.com
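For anyone following along, the settings Ian describes live in storage-conf.xml in the 0.6 line. A fragment from memory, heavily hedged: the element names and `KeysCached`/`RowsCached` attributes are as I recall them for 0.6.x and should be checked against the stock storage-conf.xml shipped with the release, and the keyspace/CF names here are made up for illustration:

```xml
<!-- Fragment of a 0.6-era storage-conf.xml; element names from memory. -->
<Storage>
  <!-- Flush a memtable once ~256MB of serialized data has accumulated. -->
  <MemtableThroughputInMB>256</MemtableThroughputInMB>

  <!-- Batch commit log syncs instead of periodic ones. -->
  <CommitLogSync>batch</CommitLogSync>
  <CommitLogSyncBatchWindowInMS>1</CommitLogSyncBatchWindowInMS>

  <Keyspaces>
    <!-- Keyspace and CF names are hypothetical. -->
    <Keyspace Name="ClueWeb">
      <!-- KeysCached="0" / RowsCached="0" disables CF caching. -->
      <ColumnFamily Name="PagesById" CompareWith="BytesType"
                    KeysCached="0" RowsCached="0"/>
      <ColumnFamily Name="PagesByURL" CompareWith="BytesType"
                    KeysCached="0" RowsCached="0"/>
    </Keyspace>
  </Keyspaces>
</Storage>
```

With 5 data directories multiplying the in-flight memtable count, it is this throughput number times the in-flight ceiling that has to fit comfortably inside the heap set in cassandra.in.sh.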