Re: OOM only on one datacenter nodes

2020-04-06 Thread Reid Pinchback
We are using JRE and not JDK, hence not able to take a heap dump. On Sun, 5 Apr 2020 at 19:21, Jeff Jirsa wrote: Set the jvm flags to...
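
The flags Jeff is most likely referring to are stock HotSpot options that work on a plain JRE (the dump path below is an assumption):

  # in cassandra-env.sh (or jvm.options on 3.x+)
  JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
  JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/var/lib/cassandra/heapdump.hprof"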

Re: OOM only on one datacenter nodes

2020-04-05 Thread Jeff Jirsa
...memory. As the problem is only happening in DC2, then there has to be a thing that is true in DC2 that isn't true in DC1. A difference in hardware, a difference in O/S version, a difference in networking config...

Re: OOM only on one datacenter nodes

2020-04-05 Thread Surbhi Gupta
...activity, or a difference in how repairs are handled. Somewhere, there is a difference. I'd start with focusing on that.

Re: OOM only on one datacenter nodes

2020-04-05 Thread Jeff Jirsa
...a difference in O/S version, a difference in networking config or physical infrastructure, a difference in client-triggered activity, or a difference in how repairs are handled. Somewhere, there is a difference. I'd start with focusing on that.

Re: OOM only on one datacenter nodes

2020-04-05 Thread Surbhi Gupta
...a difference in how repairs are handled. Somewhere, there is a difference. I'd start with focusing on that.

Re: OOM only on one datacenter nodes

2020-04-05 Thread Alex Ott
...a difference in client-triggered activity, or a difference in how repairs are handled. Somewhere, there is a difference. I'd start with focusing on that.

Re: OOM only on one datacenter nodes

2020-04-04 Thread Reid Pinchback
...a difference in how repairs are handled. Somewhere, there is a difference. I'd start with focusing on that.

Re: OOM only on one datacenter nodes

2020-04-04 Thread Erick Ramirez
With a lack of a heap dump for you to analyse, my hypothesis is that your DC2 nodes are taking on traffic (from some client somewhere) but you're just not aware of it. The hints replay is just a side-effect of the nodes getting overloaded. To rule out my hypothesis in the first instance, my recommendation...
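
A quick way to test that hypothesis on a DC2 node, assuming the default native transport port 9042:

  netstat -an | grep 9042 | grep ESTABLISHED | wc -l   # count live client connections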

Re: OOM after a while during compacting

2018-04-05 Thread Nate McCall
- Heap size is set to 8GB - Using G1GC - I tried moving the memtable out of the heap. It helped but I still got an OOM last night - Concurrent compactors is set to 1 but it still happens, and I also tried setting throughput between 16 and 128, no changes. That heap size is way too small...
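
A sketch of the direction this points in; G1 generally wants a considerably larger heap than 8GB, and the value below is an assumption to tune, not a recommendation from the thread:

  # cassandra-env.sh (or jvm.options on 3.x+)
  MAX_HEAP_SIZE="16G"   # leave HEAP_NEWSIZE unset when using G1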

Re: OOM after a while during compacting

2018-04-05 Thread Zsolt Pálmai
Yeah, they are pretty much unique but it's only a few requests per day so hitting all the nodes would be fine for now. On 2018-04-05 15:43, Evelyn Smith wrote: Not sure if it differs for SASI Secondary Indexes but my understanding is it's a bad idea to use high cardinality columns for Second...

Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
Not sure if it differs for SASI Secondary Indexes but my understanding is it’s a bad idea to use high cardinality columns for Secondary Indexes. Not sure what your data model looks like but I’d assume UUID would have very high cardinality. If that’s the case it pretty much guarantees any query

Re: OOM after a while during compacting

2018-04-05 Thread Zsolt Pálmai
Tried both (although with the biggest table) and the result is the same. I stumbled upon this jira issue: https://issues.apache.org/jira/browse/CASSANDRA-12662 Since the SASI indexes I use are only helping in debugging (for now) I dropped them and it seems the tables get compacted now (at least i...
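
For reference, dropping a SASI index is a one-liner (keyspace/index names here are hypothetical):

  cqlsh -e "DROP INDEX my_ks.my_sasi_idx;"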

Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
Oh and second, are you attempting a major compaction while you have all those pending compactions? Try letting the cluster catch up on compactions; having that many pending is bad. If you have a replication factor of 3 and quorum you could go node by node and disable binary, raise concurrent compactors...
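
A sketch of that node-by-node procedure, assuming RF=3 with quorum clients that can tolerate one node out at a time:

  nodetool disablebinary               # stop serving client traffic on this node
  nodetool setcompactionthroughput 0   # 0 = unthrottled
  nodetool compactionstats             # poll until pending tasks drain
  nodetool enablebinary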

Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
Probably a dumb question but it's good to clarify. Are you compacting the whole keyspace or are you compacting tables one at a time? On 5 Apr 2018, at 9:47 pm, Zsolt Pálmai wrote: Hi! I have a setup with 4 AWS nodes (m4.xlarge - 4 cpu, 16gb ram, 1TB ssd each) and when running the...

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-07 Thread Shravan C
...OOM. Thanks, everyone. On 2017-03-03 09:18 (-0800), Shravan Ch wrote: nodetool compactio...

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-07 Thread Jeff Jirsa
On 2017-03-03 09:18 (-0800), Shravan Ch wrote:
> nodetool compactionstats -H
> pending tasks: 3
>   compaction type   keyspace   table   completed   total   unit   progress
>   Compaction        system     hints   28....

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-07 Thread Jeff Jirsa
On 2017-03-04 07:23 (-0800), Thakrar, Jayesh wrote: LCS does not rule out frequent updates - it just says that there will be more frequent compaction, which can potentially increase compaction activity (which again can be throttled as needed). But STCS will guarantee OOM when you have large datasets...

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-06 Thread Eric Evans
On Fri, Mar 3, 2017 at 11:18 AM, Shravan Ch wrote: More than 30 Cassandra servers in the primary DC went down with the OOM exception below. What puzzles me is the scale at which it happened (at the same minute). I will share some more details below. You'd be surprised; when it's the result of a...

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-04 Thread Thakrar, Jayesh
...I was looking at nodetool info across all nodes. Consistently, JVM heap used is ~12GB and off-heap is ~4-5GB.

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-04 Thread Priyanka
On Mar 3, 2017, at 12:18 PM, Shravan Ch wrote: Hello, More than 30 Cassandra servers in the primary DC went down with the OOM exception below. What puzzles me is the scale at which it happened (at the same minute). I will share some more details below. Sy...

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-04 Thread Shravan C
I was looking at nodetool info across all nodes. Consistently, JVM heap used is ~12GB and off-heap is ~4-5GB. On Saturday, March 4, 2017 at 9:23 AM, Thakrar, Jayesh wrote:...
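
Those figures come straight from nodetool on each node:

  nodetool info | grep -i heap   # shows the "Heap Memory (MB)" and "Off Heap Memory (MB)" lines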

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-04 Thread Edward Capriolo
...OOM when you have large datasets. Did you have a look at the off-heap + on-heap size of your JVM using "nodetool info"? On Friday, March 3, 2017 at 11:11 PM, Shravan C wrote:...

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-04 Thread Thakrar, Jayesh
...size of your JVM using "nodetool info"? On Friday, March 3, 2017 at 11:11 PM, Shravan C wrote: We run C* at 32 GB and all servers have 96GB RAM. We use...

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-03 Thread Shravan C
We run C* at 32 GB and all servers have 96GB RAM. We use STCS. LCS is not an option for us as we have frequent updates. Thanks, Shravan

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-03 Thread Thakrar, Jayesh
...better and should be the default (my opinion) unless you want DTCS. A good description of all three compaction strategies is here - http://docs.scylladb.com/kb/compaction/

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-03 Thread Shravan C
On Friday, March 3, 2017 at 11:34 AM, Joaquin Casares wrote: Hello Shravan, Typically asynchronous requests are recommended over batch statements, since batch statements will cause more work on...

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-03 Thread Joaquin Casares
Hello Shravan, Typically asynchronous requests are recommended over batch statements, since batch statements will cause more work on the coordinator node, while individual requests, when using a TokenAwarePolicy, will hit a specific coordinator, perform a local disk seek, and return the requested information...

Re: OOM under high write throughputs on 2.2.5

2016-05-24 Thread Bryan Cheng
Hi Zhiyan, Silly question, but are you sure your heap settings are actually being applied? "697,236,904 (51.91%)" would represent a sub-2GB heap. What's the real memory usage for Java when this crash happens? Another thing to look into might be memtable_heap_space_in_mb, as it looks like you're using...
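
697,236,904 bytes at 51.91% implies a total heap of only ~1.3GB, so checking what -Xmx the process really got is a fair first step (the pgrep pattern is an assumption about how the daemon was launched):

  ps -o args= -p $(pgrep -f CassandraDaemon) | tr ' ' '\n' | grep -iE '^-Xm[sx]'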

Re: OOM when Adding host

2015-08-10 Thread rock zhang
I logged the open files every 10 mins; the last record is: lsof -p $cassandraPID | wc -l gives 74728, while lsof | wc -l gives 5887913 (a very large number, don't know why). After the OOM the open file count went back to a few hundred (lsof | wc -l). On Aug 10, 2015, at 9:59 AM, rock zhang wrote: My C...

Re: OOM when Adding host

2015-08-10 Thread rock zhang
My Cassandra version is 2.1.4. Thanks, Rock. On Aug 10, 2015, at 9:52 AM, rock zhang wrote: Hi All, Currently I have three hosts. The data is not balanced: one is 79G, the other two have 300GB. When I added a new host, I first got a "too many open files" error, then I changed the file open limit...
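
The usual fix for "too many open files" on Linux, assuming the daemon runs as user cassandra (takes effect on the next login/restart):

  # /etc/security/limits.conf
  cassandra - nofile 100000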

Re: OOM and high SSTables count

2015-03-04 Thread J. Ryan Earl
We think it is this bug: https://issues.apache.org/jira/browse/CASSANDRA-8860 We're rolling a patch to beta before rolling it into production. On Wed, Mar 4, 2015 at 4:12 PM, graham sanderson wrote: > We can confirm a problem on 2.1.3 (sadly our beta sstable state obviously > did not match our

Re: OOM and high SSTables count

2015-03-04 Thread graham sanderson
We can confirm a problem on 2.1.3 (sadly our beta sstable state obviously did not match our production ones in some critical way) We have about 20k sstables on each of 6 nodes right now; actually a quick glance shows 15k of those are from OpsCenter, which may have something to do with beta/prod

Re: OOM and high SSTables count

2015-03-04 Thread daemeon reiydelle
Are you finding a correlation between the shards on the OOM DC1 nodes and the OOM DC2 nodes? Does your monitoring tool indicate that the DC1 nodes are using significantly more CPU (and memory) than the nodes that are NOT failing? I am leading you down the path to suspect that your sharding is giving...

Re: OOM and high SSTables count

2015-03-04 Thread Patrick McFadin
What kind of disks are you running here? Are you getting a lot of GC before the OOM? Patrick. On Wed, Mar 4, 2015 at 9:26 AM, Jan wrote: HI Roni; You mentioned: DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have 16GB of RAM and 5GB HEAP. Best practices would be to:...

Re: OOM and high SSTables count

2015-03-04 Thread Jan
HI Roni; You mentioned: DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have 16GB of RAM and 5GB HEAP. Best practices would be to: a) have a consistent type of node across both DCs (CPUs, memory, heap & disk); b) increase heap on DC2 servers to 8GB for the C* heap. The leveled...
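
Jan's suggestion (b), sketched as cassandra-env.sh overrides (the HEAP_NEWSIZE value is a common rule of thumb, not from the thread):

  MAX_HEAP_SIZE="8G"
  HEAP_NEWSIZE="800M"   # often sized at roughly 100MB per physical core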

Re: OOM at Bootstrap Time

2014-10-30 Thread Maxime
Well, the answer was Secondary indexes. I am guessing they were corrupted somehow. I dropped all of them, cleanup, and now nodes are bootstrapping fine. On Thu, Oct 30, 2014 at 3:50 PM, Maxime wrote: > I've been trying to go through the logs but I can't say I understand very > well the details:

Re: OOM at Bootstrap Time

2014-10-30 Thread Maxime
I've been trying to go through the logs but I can't say I understand the details very well:
INFO  [SlabPoolCleaner] 2014-10-30 19:20:18,446 ColumnFamilyStore.java:856 - Enqueuing flush of loc: 7977119 (1%) on-heap, 0 (0%) off-heap
DEBUG [SharedPool-Worker-22] 2014-10-30 19:20:18,446 AbstractSimple...

Re: OOM at Bootstrap Time

2014-10-30 Thread Maxime
I will give a shot adding the logging. I've tried some experiments and I have no clue what could be happening anymore: I tried setting all nodes to a streamthroughput of 1 except 1, to see if somehow it was getting overloaded by too many streams coming in at once, nope. I went through the source

Re: OOM at Bootstrap Time

2014-10-29 Thread DuyHai Doan
Some ideas: 1) Turn on DEBUG logging on the joining node to see in detail what is going on with the stream of 1500 files. 2) Check the stream ID to see whether it's a new stream or an old one still pending. On Wed, Oct 29, 2014 at 2:21 AM, Maxime wrote: Doan, thanks for the tip, I just read about it...
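
On 2.1+ idea 1) can be flipped on at runtime without a restart (the logger name is the obvious candidate, not confirmed in the thread):

  nodetool setlogginglevel org.apache.cassandra.streaming DEBUG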

Re: OOM at Bootstrap Time

2014-10-28 Thread Maxime
Doan, thanks for the tip, I just read about it this morning, just waiting for the new version to pop up on the debian datastax repo. Michael, I do believe you are correct in the general running of the cluster and I've reset everything. So it took me a while to reply, I finally got the SSTables do

Re: OOM at Bootstrap Time

2014-10-27 Thread DuyHai Doan
"Tombstones will be a very important issue for me since the dataset is very much a rolling dataset using TTLs heavily." --> You can try the new DateTiered compaction strategy ( https://issues.apache.org/jira/browse/CASSANDRA-6602) released on 2.1.1 if you have a time series data model to eliminate

Re: OOM at Bootstrap Time

2014-10-27 Thread Laing, Michael
Again, from our experience with 2.0.x: Revert to the defaults - you are manually setting heap way too high IMHO. On our small nodes we tried LCS - way too much compaction - so we switched all CFs to STCS. We do a major rolling compaction on our small nodes weekly during less busy hours - works great. Be su...

Re: OOM at Bootstrap Time

2014-10-26 Thread Maxime
Hmm, thanks for the reading. I initially followed some (perhaps too old) maintenance scripts, which included weekly 'nodetool compact'. Is there a way for me to undo the damage? Tombstones will be a very important issue for me since the dataset is very much a rolling dataset using TTLs heavily. O

Re: OOM at Bootstrap Time

2014-10-26 Thread DuyHai Doan
"Should doing a major compaction on those nodes lead to a restructuration of the SSTables?" --> Beware of the major compaction on SizeTiered, it will create 2 giant SSTables and the expired/outdated/tombstone columns in this big file will be never cleaned since the SSTable will never get a chance t

Re: OOM at Bootstrap Time

2014-10-26 Thread Jonathan Haddad
If the issue is related to I/O, you're going to want to determine if you're saturated. Take a look at `iostat -dmx 1`; you'll see avgqu-sz (queue size) and svctm (service time). The higher those numbers are, the more overwhelmed your disk is. On Sun, Oct 26, 2014 at 12:01 PM, DuyHai Doan wro...

Re: OOM at Bootstrap Time

2014-10-26 Thread Maxime
Thank you very much for your reply. This is a deeper interpretation of the logs than I can do at the moment. Regarding 2), it's a good assumption on your part, but in this case, non-obviously, the loc table's primary key is actually not id; the schema changed historically, which has led to this odd naming...

Re: OOM at Bootstrap Time

2014-10-26 Thread DuyHai Doan
Hello Maxime Increasing the flush writers won't help if your disk I/O is not keeping up. I've had a look into the log file, below are some remarks: 1) There are a lot of SSTables on disk for some tables (events for example, but not only). I've seen that some compactions are taking up to 32 SSTab

Re: OOM at Bootstrap Time

2014-10-26 Thread Maxime
I've emailed you a raw log file of an instance of this happening. I've been monitoring more closely the timing of events in tpstats and the logs and I believe this is what is happening: - For some reason, C* decides to provoke a flush storm (I say some reason, I'm sure there is one but I have had

Re: OOM at Bootstrap Time

2014-10-25 Thread DuyHai Doan
Hello Maxime. Can you put the complete logs and config somewhere? It would be interesting to know the cause of the OOM. On Sun, Oct 26, 2014 at 3:15 AM, Maxime wrote: Thanks a lot, that is comforting. We are also small at the moment so I definitely can relate with the idea of keepin...

Re: OOM at Bootstrap Time

2014-10-25 Thread Maxime
Thanks a lot that is comforting. We are also small at the moment so I definitely can relate with the idea of keeping small and simple at a level where it just works. I see the new Apache version has a lot of fixes so I will try to upgrade before I look into downgrading. On Saturday, October 25, 2

Re: OOM at Bootstrap Time

2014-10-25 Thread Laing, Michael
Since no one else has stepped in... We have run clusters with ridiculously small nodes - I have a production cluster in AWS with 4GB nodes each with 1 CPU and disk-based instance storage. It works fine but you can see those little puppies struggle... And I ran into problems such as you observe...

Re: OOM(Java heap space) on start-up during commit log replaying

2014-08-13 Thread jivko donev
Graham, thanks for the reply. As I stated in my first mail, increasing the heap size fixes the problem, but I'm more interested in figuring out the right properties for commitlog and memtable sizes when we need to keep the heap smaller. Also, I think we are not seeing CASSANDRA-7546, as I applied y...

Re: OOM(Java heap space) on start-up during commit log replaying

2014-08-12 Thread graham sanderson
Agreed, need more details; and just start by increasing the heap, because that may well solve the problem. I have just observed (which makes sense when you think about it) while testing the fix for https://issues.apache.org/jira/browse/CASSANDRA-7546 that if you are replaying a commit log which has a h...

Re: OOM(Java heap space) on start-up during commit log replaying

2014-08-12 Thread jivko donev
Hi Robert, thanks for your reply. The Cassandra version is 2.0.7. Is there some commonly used rule for determining the commitlog and memtable sizes depending on the heap size? What would be the main disadvantage of having a smaller commitlog? On Tuesday, August 12, 2014 at 8:32 PM, Robert Coli wr...

Re: OOM(Java heap space) on start-up during commit log replaying

2014-08-12 Thread Robert Coli
On Tue, Aug 12, 2014 at 9:34 AM, jivko donev wrote: We have a node with a commit log directory of ~4G. During start-up of the node, on commit log replay, the used heap space grows constantly, ending with an OOM error. The heap size and new heap size properties are 1G and 256M. We are usin...
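
One lever on a 1G heap is capping how much commit log can accumulate, and therefore how much has to be replayed at startup; a cassandra.yaml sketch for 2.0.x (values are illustrative starting points, not recommendations):

  commitlog_total_space_in_mb: 1024
  memtable_total_space_in_mb: 256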

Re: OOM while performing major compaction

2014-02-27 Thread Tupshin Harper
You are right that modifying your code to access two CFs is a hack, and not an ideal solution, but I think it should be pretty easy to implement, and would help you get out of this jam pretty quickly. Not saying you should go down that path, but if you lack better options, that would probably be my

Re: OOM while performing major compaction

2014-02-27 Thread Nish garg
Hello Tupshin, Yes, all the data needs to be kept for just the last 6 hours. Yes, changing to a new CF every 6 hours solves the compaction issue, but between the changes we will have less than 6 hours of data. We can use CF1 and CF2 and truncate them one at a time every 6 hours in a loop, but we need some kind...

Re: OOM while performing major compaction

2014-02-27 Thread Tupshin Harper
If you can programmatically roll over onto a new column family every 6 hours (or every day or other reasonable increment), and then just drop your existing column family after all the columns would have been expired, you could skip your compaction entirely. It was not clear to me from your descript

Re: OOM while performing major compaction

2014-02-27 Thread Nish garg
Thanks for replying. We are on Cassandra 1.2.9. We have a time-series-like data structure where we need to keep only the last 6 hours of data. So we expire data using an expireddatetime column on the column family, and then we run an expire script via cron to create tombstones. We don't use TTL yet and are planning...

Re: OOM while performing major compaction

2014-02-27 Thread Robert Coli
On Thu, Feb 27, 2014 at 11:09 AM, Nish garg wrote: I am having an OOM during major compaction on one of the column families where there are a lot of SSTables (33000) to be compacted. Is there any other way for them to be compacted? Any help will be really appreciated. You can use user-defined compaction...
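
For reference, user-defined compaction is driven over JMX rather than nodetool in this era; a sketch with the jmxterm CLI (the jar name and sstable filename are hypothetical, and the operation's exact argument list varies by version, so inspect the CompactionManager MBean first):

  java -jar jmxterm.jar -l localhost:7199
  bean org.apache.cassandra.db:type=CompactionManager
  run forceUserDefinedCompaction myks myks-mycf-ic-1234-Data.db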

Re: OOM while performing major compaction

2014-02-27 Thread Edward Capriolo
One big downside about major compaction is that (depending on your cassandra version) the bloom filters size is pre-calculated. Thus cassandra needs enough heap for your existing 33 k+ sstables and the new large compacted one. In the past this happened to us when the compaction thread got hung up,

Re: OOM after some days related to RunnableScheduledFuture and meter persistance

2014-01-08 Thread Tyler Hobbs
I believe this is https://issues.apache.org/jira/browse/CASSANDRA-6358, which was fixed in 2.0.3. On Wed, Jan 8, 2014 at 7:15 AM, Desimpel, Ignace wrote: Hi, On Linux and Cassandra version 2.0.2 I had an OOM after a heavy load and then some (15) days of idle running (not exactly i...

Re: OOM while reading key cache

2013-11-14 Thread Fabien Rousseau
A few months ago, we had a similar issue on 1.2.6: https://issues.apache.org/jira/browse/CASSANDRA-5706. It has been fixed and we have not encountered this issue since (we're also on 1.2.10). On 2013/11/14, olek.stas...@gmail.com wrote: Yes, as I wrote in my first e-mail, when I removed the key cache file...

Re: OOM while reading key cache

2013-11-14 Thread olek.stas...@gmail.com
Yes, as I wrote in my first e-mail: when I removed the key cache file, Cassandra started without further problems. Regards, Olek. On 2013/11/13, Robert Coli wrote: On Wed, Nov 13, 2013 at 12:35 AM, Tom van den Berge wrote: I'm having the same problem, after upgrading from 1.2.3 to 1.2.10. I can r...

Re: OOM while reading key cache

2013-11-13 Thread Robert Coli
On Wed, Nov 13, 2013 at 12:35 AM, Tom van den Berge wrote: I'm having the same problem, after upgrading from 1.2.3 to 1.2.10. I can remember this was a bug that was solved in the 1.0 or 1.1 version some time ago, but apparently it got back. A workaround is to delete the contents of the saved_caches...

Re: OOM while reading key cache

2013-11-13 Thread Tom van den Berge
I'm having the same problem, after upgrading from 1.2.3 to 1.2.10. I can remember this was a bug that was solved in the 1.0 or 1.1 version some time ago, but apparently it got back. A workaround is to delete the contents of the saved_caches directory before starting up. Tom On Tue, Nov 12, 201
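
Concretely (the path below is the stock default; check saved_caches_directory in cassandra.yaml for yours):

  rm -f /var/lib/cassandra/saved_caches/*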

Re: OOM while reading key cache

2013-11-11 Thread Aaron Morton
> -6 machines with 8GB RAM each and three 150GB disks each > -default heap configuration. With 8GB the default heap is 2GB; try kicking that up to 4GB and a 600 to 800 MB new heap. I would guess for the data load you have, 2GB is not enough. Hope that helps. - Aaron Morton

Re: OOM on replaying CommitLog with Cassandra 2.0.0

2013-11-07 Thread Fabian Seifert
Thanks, I missed that issue, but it solved our problems. Regards, Fabian. On Tuesday, 5 November 2013 19:12, Robert Coli wrote: On Tue, Nov 5, 2013 at 12:06 AM, Fabian Seifert wrote: It keeps crashing with OOM on CommitLog replay: https:...

Re: OOM on replaying CommitLog with Cassandra 2.0.0

2013-11-05 Thread Robert Coli
On Tue, Nov 5, 2013 at 12:06 AM, Fabian Seifert < fabian.seif...@frischmann.biz> wrote: > It keeps crashing with OOM on CommitLog replay: > https://issues.apache.org/jira/browse/CASSANDRA-6087 Probably this issue, fixed in 2.0.2. =Rob

Re: OOM when applying migrations

2012-09-20 Thread Tyler Hobbs
This should explain the schema issue in 1.0 that has been fixed in 1.1: http://www.datastax.com/dev/blog/the-schema-management-renaissance On Thu, Sep 20, 2012 at 10:17 AM, Jason Wee wrote: > Hi, when the heap is going more than 70% usage, you should be able to see > in the log, many flushing, o

Re: OOM when applying migrations

2012-09-20 Thread Jason Wee
Hi, when heap usage goes above 70%, you should be able to see in the log many flushes, or the row cache size being reduced. Did you restart the Cassandra daemon on the node that threw the OOM? On Thu, Sep 20, 2012 at 9:11 PM, Vanger wrote: Hello, We are trying to add new nodes to...

Re: OOM opening bloom filter

2012-03-13 Thread Mick Semb Wever
> How much smaller did the BF get to? After pending compactions completed today (I'm presuming fp_ratio is now applied to all sstables in the keyspace), it has gone from 20G+ down to 1G. This node is now running comfortably on Xmx4G (used heap ~1.5G). ~mck

Re: OOM opening bloom filter

2012-03-13 Thread aaron morton
Thanks for the update. How much smaller did the BF get to? - Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com. On 13/03/2012, at 8:24 AM, Mick Semb Wever wrote: It's my understanding then for this use case that bloom filters are of l...

Re: OOM opening bloom filter

2012-03-12 Thread Mick Semb Wever
> It's my understanding then for this use case that bloom filters are of little importance and that I can... Ok. To summarise our actions to get us out of this situation, in the hope that it may help others one day, we did the following: 1) upgrade to 1.0.7 2) set fp_ratio=0.99...
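
fp_ratio here is the bloom filter false-positive chance from CASSANDRA-3497; on CQL-era versions the equivalent per-table knob looks like this (keyspace/table names hypothetical):

  cqlsh -e "ALTER TABLE my_ks.my_cf WITH bloom_filter_fp_chance = 0.99;"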

Re: OOM opening bloom filter

2012-03-12 Thread aaron morton
>>> It's my understanding then for this use case that bloom filters are of little importance and that I can... Yes. AFAIK there is only one position seek (that will use the bloom filter) at the start of a get_range_slice request. After that the iterators step over the rows in the -Data file...

Re: OOM opening bloom filter

2012-03-11 Thread Mick Semb Wever
On Sun, 2012-03-11 at 15:36 -0700, Peter Schuller wrote: Are you doing RF=1? That is correct. So are your calculations then :-) > very small, <1k. Data from this cf is only read via hadoop jobs in batch reads of 16k rows at a time. [snip] > It's my understanding then for this use cas...

Re: OOM opening bloom filter

2012-03-11 Thread Peter Schuller
> This particular cf has up to ~10 billion rows over 3 nodes. Each row is... With default settings, 143 million keys roughly gives you 2^31 bits of bloom filter. Or put another way, you get about 1 GB of bloom filters per 570 million keys, if I'm not mistaken. If you have 10 billion rows, that should...
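
Sanity-checking those numbers: they imply roughly 15 bits of bloom filter per key, which also gives a ballpark for the 10-billion-row case (RF=1 across 3 nodes, per the thread):

  python -c "print(2**31 / 143e6)"            # ~15 bits per key
  python -c "print(570e6 * 15 / 8 / 2**30)"   # ~1.0 GiB per 570M keys
  python -c "print(10e9/3 * 15 / 8 / 2**30)"  # ~5.8 GiB of bloom filter per node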

Re: OOM opening bloom filter

2012-03-11 Thread Mick Semb Wever
On Sun, 2012-03-11 at 15:06 -0700, Peter Schuller wrote: If it is legitimate use of memory, you *may*, depending on your workload, want to adjust target bloom filter false positive rates: https://issues.apache.org/jira/browse/CASSANDRA-3497 This particular cf has up to ~10 billion row...

Re: OOM opening bloom filter

2012-03-11 Thread Peter Schuller
> How did this this bloom filter get too big? Bloom filters grow with the amount of row keys you have. It is natural that they grow bigger over time. The question is whether there is something "wrong" with this node (for example, lots of sstables and disk space used due to compaction not running,

Re: OOM

2011-11-02 Thread Ben Coverston
Smells like ulimit. Have you been able to reproduce this with the C* process running as root? On Wed, Nov 2, 2011 at 8:12 AM, A J wrote: > java.lang.OutOfMemoryError: unable to create new native thread
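
"unable to create new native thread" usually points at the max-processes limit rather than the heap; the limits the live process actually has can be read from /proc (the PID lookup is an assumption):

  grep -Ei 'processes|open files' /proc/$(pgrep -f CassandraDaemon)/limits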

Re: OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Sylvain Lebresne
On Mon, Oct 31, 2011 at 2:58 PM, Sylvain Lebresne wrote: On Mon, Oct 31, 2011 at 1:10 PM, Mick Semb Wever wrote: On Mon, 2011-10-31 at 13:05 +0100, Mick Semb Wever wrote: Given a 60G sstable, even with 64kb chunk_length, to read just that one sstable requires close to 8G free heap me...

Re: OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Sylvain Lebresne
On Mon, Oct 31, 2011 at 1:10 PM, Mick Semb Wever wrote: On Mon, 2011-10-31 at 13:05 +0100, Mick Semb Wever wrote: Given a 60G sstable, even with 64kb chunk_length, to read just that one sstable requires close to 8G free heap memory... Arg, that calculation was a little off... (a long...

Re: OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Mick Semb Wever
On Mon, 2011-10-31 at 13:05 +0100, Mick Semb Wever wrote: Given a 60G sstable, even with 64kb chunk_length, to read just that one sstable requires close to 8G free heap memory... Arg, that calculation was a little off... (a long isn't exactly 8K...) But you get my concern... ~mck
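
The corrected arithmetic: each chunk offset is an 8-byte long, so a 60G sstable at 64kb chunk_length needs megabytes of heap for offsets, not gigabytes:

  python -c "print(60 * 2**30 / (64 * 2**10) * 8 / 2**20)"   # ~7.5 MiB of chunk offsets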

Re: OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Mick Semb Wever
On Mon, 2011-10-31 at 09:07 +0100, Mick Semb Wever wrote: > The read pattern of these rows is always in bulk so the chunk_length > could have been much higher so to reduce memory usage (my largest > sstable is 61G). Isn't CompressionMetadata.readChunkOffsets(..) rather dangerous here? Given a 60

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Sasha Dolgy
Yes, each one corresponds with taking a node down for various reasons. I think more people should show their graphs, it's great. Hoping Oberman has some, so we can see what his look like. On Thu, Jun 23, 2011 at 12:40 AM, Chris Burroughs wrote: Do all of the reductions in Used on that g...

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Chris Burroughs
Do all of the reductions in Used on that graph correspond to node restarts? My Zabbix for reference: http://img194.imageshack.us/img194/383/2weekmem.png On 06/22/2011 06:35 PM, Sasha Dolgy wrote: > http://www.twitpic.com/5fdabn > http://www.twitpic.com/5fdbdg > > i do love a good graph. two of

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Sasha Dolgy
http://www.twitpic.com/5fdabn http://www.twitpic.com/5fdbdg i do love a good graph. two of the weekly memory utilization graphs for 2 of the 4 servers from this ring... week 21 was a nice week ... the week before 0.8.0 went out proper. since then, bumped up to 0.8 and have seen a steady increase

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Chris Burroughs
On 06/22/2011 08:53 AM, Sasha Dolgy wrote: > Yes ... this is because it was the OS that killed the process, and > wasn't related to Cassandra "crashing". Reviewing our monitoring, we > saw that memory utilization was pegged at 100% for days and days > before it was finally killed because 'apt' was

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread William Oberman
The CLI is posted, I assume that's the defaults (I didn't touch anything). The machines basically just run cassandra (and standard Centos5 background stuff). will On Wed, Jun 22, 2011 at 9:49 AM, Jake Luciani wrote: > Are you running with the default heap settings? what else is running on the >

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Jake Luciani
Are you running with the default heap settings? what else is running on the boxes? On Wed, Jun 22, 2011 at 9:06 AM, William Oberman wrote: > I was wondering/I figured that /var/log/kern indicated the OS was killing > java (versus an internal OOM). > > The nodetool repair is interesting. My app

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread William Oberman
I was wondering/I figured that /var/log/kern indicated the OS was killing java (versus an internal OOM). The nodetool repair is interesting. My application never deletes, so I didn't bother running it. But if that helps prevent OOMs as well, I'll add it to the crontab (plan A is still upgr...

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Sasha Dolgy
Yes ... this is because it was the OS that killed the process, and wasn't related to Cassandra "crashing". Reviewing our monitoring, we saw that memory utilization was pegged at 100% for days and days before it was finally killed because 'apt' was fighting for resource. At least, that's as far as

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread William Oberman
Well, I managed to run 50 days before an OOM, so any changes I make will take a while to test ;-) I've seen the GCInspector log lines appear periodically in my logs, but I didn't see a correlation with the crash. I'll read the instructions on how to properly do a rolling upgrade today, practice o

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Sasha Dolgy
We had a similar problem last month and found that the OS eventually killed the Cassandra process on each of our nodes. I've upgraded to 0.8.0 from 0.7.6-2 and have not had the problem since, but I do see consumption levels rising consistently from one day to the next on each node...

Re: OOM during restart

2011-06-21 Thread Jonathan Ellis
If you're OOMing on restart you WILL OOM during normal usage given heavy enough write load. Definitely adjust memtable thresholds down or, as Dominic suggests, upgrade to 0.8. On Tue, Jun 21, 2011 at 12:02 PM, Dominic Williams wrote: > Hi gabe, > What you need to do is the following: > 1. Adjust

Re: OOM during restart

2011-06-21 Thread Dominic Williams
Hi Gabe, What you need to do is the following: 1. Adjust cassandra.yaml so that when this node is starting up it is not contacted by other nodes, e.g. set thrift to 9061 and storage to 7001. 2. Copy your commit logs to a tmp sub-folder, e.g. commitLog/tmp. 3. Copy a small number of commit logs back into m...
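
A sketch of steps 2-3 (paths and batch size are assumptions; point them at your commitlog_directory):

  mkdir -p /var/lib/cassandra/commitlog/tmp
  mv /var/lib/cassandra/commitlog/CommitLog-* /var/lib/cassandra/commitlog/tmp/
  # move a handful back, start the node, let replay and flush finish, stop, repeat
  mv $(ls /var/lib/cassandra/commitlog/tmp/CommitLog-* | head -5) /var/lib/cassandra/commitlog/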

Re: OOM during restart

2011-06-21 Thread aaron morton
AFAIK the node will not announce itself in the ring until the log replay is complete, so it will not get the schema update until after log replay. If possible I'd avoid making the schema change until you have solved this problem. My theory on OOM during log replay is that the high-speed inserts...

Re: OOM recovering failed node with many CFs

2011-05-26 Thread Jonathan Ellis
We've applied a fix to the 0.7 branch in https://issues.apache.org/jira/browse/CASSANDRA-2714. The patch probably applies to 0.7.6 as well. On Thu, May 26, 2011 at 11:36 AM, Flavio Baronti wrote: > I tried the manual copy you suggest, but the SystemTable.checkHealth() > function > complains it c

Re: OOM recovering failed node with many CFs

2011-05-26 Thread Flavio Baronti
I tried the manual copy you suggest, but the SystemTable.checkHealth() function complains it can't load the system files. Log follows; I will gather some more info and create a ticket as soon as possible.
INFO [main] 2011-05-26 18:25:36,147 AbstractCassandraDaemon.java Logging initialized
INFO...

Re: OOM recovering failed node with many CFs

2011-05-26 Thread Jonathan Ellis
Sounds like a legitimate bug, although looking through the code I'm not sure what would cause a tight retry loop on migration announce/rectify. Can you create a ticket at https://issues.apache.org/jira/browse/CASSANDRA? As a workaround, I would try manually copying the Migrations and Schema sstab...
