To: "user@cassandra.apache.org"
Cc: Reid Pinchback
Subject: Re: OOM only on one datacenter nodes
We are using the JRE and not the JDK, hence we are not able to take a heap dump.
On Sun, 5 Apr 2020 at 19:21, Jeff Jirsa <jji...@gmail.com> wrote:
Set the jvm flags to
e memory.
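Jeff's suggestion is truncated above, but the standard flags for capturing a
heap dump at OOM work on a plain JRE (no jmap/JDK needed); roughly, in
conf/jvm.options (3.x+) or cassandra-env.sh:

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/cassandra/heapdump   # path is an example; pick a disk with space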
As the problem is only happening in DC2, then there has to be a thing that is
true in DC2 that isn't true in DC1. A difference in hardware, a difference in
O/S version, a difference in networking config or physical infrastructure, a
difference in client-triggered activity, or a difference in how repairs are
handled. Somewhere, there is a difference. I’d start with focusing on that.
From: Erick Ramirez
Reply-To: "user@cassandra.apache.org"
Date: Saturday, April 4, 2020 at 8:28 PM
To: "user@cassandra.apache.org"
Subject: Re: OOM only on one datacenter
With a lack of heapdump for you to analyse, my hypothesis is that your DC2
nodes are taking on traffic (from some client somewhere) but you're just
not aware of it. The hints replay is just a side-effect of the nodes
getting overloaded.
To rule out my hypothesis in the first instance, my recommend
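Erick's recommendation is cut off above; one simple check along those lines
(not necessarily his exact suggestion) is to look for native-protocol client
connections on the DC2 nodes directly, assuming the default CQL port 9042:

ss -tnp state established '( sport = :9042 )'   # established client connections
# on 4.0+, "nodetool clientstats" lists connected clients per node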
>
>
> - Heap size is set to 8GB
> - Using G1GC
> - I tried moving the memtable out of the heap. It helped but I still got
> an OOM last night
> - Concurrent compactors is set to 1 but it still happens and also tried
> setting throughput between 16 and 128, no changes.
>
That heap size is way too small
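For reference, the knobs listed above map roughly to the following names in
Cassandra 3.x (verify against the version actually in use before copying
anything):

# conf/jvm.options (or cassandra-env.sh on older versions):
#   -Xms8G  -Xmx8G  -XX:+UseG1GC
# conf/cassandra.yaml:
#   memtable_allocation_type: offheap_objects    # "memtable out of the heap"
#   concurrent_compactors: 1
#   compaction_throughput_mb_per_sec: 64         # the 16-128 range being tried
# compaction throughput can also be changed at runtime, without a restart:
nodetool setcompactionthroughput 64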
Yeah, they are pretty much unique but it's only a few requests per day so
hitting all the nodes would be fine for now.
2018-04-05 15:43 GMT+02:00 Evelyn Smith :
> Not sure if it differs for SASI Secondary Indexes but my understanding is
> it’s a bad idea to use high cardinality columns for Second
Not sure if it differs for SASI Secondary Indexes but my understanding is it’s
a bad idea to use high cardinality columns for Secondary Indexes.
Not sure what your data model looks like but I’d assume UUID would have very
high cardinality.
If that’s the case it pretty much guarantees any query
Tried both (although with the biggest table) and the result is the same.
I stumbled upon this jira issue:
https://issues.apache.org/jira/browse/CASSANDRA-12662
Since the sasi indexes I use are only helping in debugging (for now) I
dropped them and it seems the tables get compacted now (at least i
Oh and second, are you attempting a major compact while you have all those
pending compactions?
Try letting the cluster catch up on compactions. Having that many pending is
bad.
If you have replication factor of 3 and quorum you could go node by node and
disable binary, raise concurrent compac
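A sketch of what that node-by-node sequence might look like with nodetool
(command names as in recent versions; check them against the version in use):

nodetool disablebinary                # stop taking CQL client traffic on this node
nodetool setcompactionthroughput 0    # 0 = unthrottled
# watch "nodetool compactionstats" until the pending count drains
nodetool setcompactionthroughput 16   # restore throttling
nodetool enablebinary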
Probably a dumb question but it’s good to clarify.
Are you compacting the whole keyspace or are you compacting tables one at a
time?
> On 5 Apr 2018, at 9:47 pm, Zsolt Pálmai wrote:
>
> Hi!
>
> I have a setup with 4 AWS nodes (m4.xlarge - 4 CPU, 16GB RAM, 1TB SSD each)
> and when running the
OOM.
Thanks to every one of you.
From: Jeff Jirsa
Sent: Tuesday, March 7, 2017 1:19 PM
To: user@cassandra.apache.org
Subject: Re: OOM on Apache Cassandra on 30 Plus node at the same time
On 2017-03-03 09:18 (-0800), Shravan Ch wrote:
>
> nodetool compactio
On 2017-03-03 09:18 (-0800), Shravan Ch wrote:
>
> nodetool compactionstats -H
> pending tasks: 3
> compaction type   keyspace   table   completed   total   unit   progress
> Compaction        system     hints   28.
On 2017-03-04 07:23 (-0800), "Thakrar, Jayesh" wrote:
> LCS does not rule out frequent updates - it just says that there will be more
> frequent compaction, which can potentially increase compaction activity
> (which again can be throttled as needed).
> But STCS will guarantee OOM when you h
On Fri, Mar 3, 2017 at 11:18 AM, Shravan Ch wrote:
> More than 30 plus Cassandra servers in the primary DC went down OOM
> exception below. What puzzles me is the scale at which it happened (at the
> same minute). I will share some more details below.
You'd be surprised; When it's the result of a
s
, "user@cassandra.apache.org"
Subject: Re: OOM on Apache Cassandra on 30 Plus node at the same time
I was looking at nodetool info across all nodes. Consistently JVM heap used is
~ 12GB and off heap is ~ 4-5GB.
From: Thakrar, Jayesh
Sent: Saturday, M
Sent from my iPhone
> On Mar 3, 2017, at 12:18 PM, Shravan Ch wrote:
>
> Hello,
>
> More than 30 plus Cassandra servers in the primary DC went down OOM exception
> below. What puzzles me is the scale at which it happened (at the same
> minute). I will share some more details below.
>
> Sy
I was looking at nodetool info across all nodes. Consistently JVM heap used is
~ 12GB and off heap is ~ 4-5GB.
From: Thakrar, Jayesh
Sent: Saturday, March 4, 2017 9:23:01 AM
To: Shravan C; Joaquin Casares; user@cassandra.apache.org
Subject: Re: OOM on Apache
M when you have large datasets.
>
> Did you have a look at the offheap + onheap size of your JVM using
> "nodetool info"?
>
> From: Shravan C
> Date: Friday, March 3, 2017 at 11:11 PM
> To: Joaquin Casares, "user@cassand
size of your JVM using "nodetool info"?
From: Shravan C
Date: Friday, March 3, 2017 at 11:11 PM
To: Joaquin Casares , "user@cassandra.apache.org"
Subject: Re: OOM on Apache Cassandra on 30 Plus node at the same time
We run C* at 32 GB and all servers have 96GB RAM. We use
We run C* at 32 GB and all servers have 96GB RAM. We use STCS . LCS is not an
option for us as we have frequent updates.
Thanks,
Shravan
From: Thakrar, Jayesh
Sent: Friday, March 3, 2017 3:47:27 PM
To: Joaquin Casares; user@cassandra.apache.org
Subject: Re: OOM
better and should be the default (my opinion) unless you want DTCS
A good description of all three compactions is here -
http://docs.scylladb.com/kb/compaction/
From: Joaquin Casares
Date: Friday, March 3, 2017 at 11:34 AM
To:
Subject: Re: OOM on Apache Cassandra on 30 Plus node at the same
From: Joaquin Casares
Sent: Friday, March 3, 2017 11:34:58 AM
To: user@cassandra.apache.org
Subject: Re: OOM on Apache Cassandra on 30 Plus node at the same time
Hello Shravan,
Typically asynchronous requests are recommended over batch statements since
batch statements will cause more work on
Hello Shravan,
Typically asynchronous requests are recommended over batch statements since
batch statements will cause more work on the coordinator node while
individual requests, when using a TokenAwarePolicy, will hit a specific
coordinator, perform a local disk seek, and return the requested
in
Hi Zhiyan,
Silly question but are you sure your heap settings are actually being
applied? "697,236,904 (51.91%)" would represent a sub-2GB heap. What's the
real memory usage for Java when this crash happens?
Other thing to look into might be memtable_heap_space_in_mb, as it looks
like you're usi
I logged the open files every 10 mins, last record is :
lsof -p $cassandraPID | wc -l
74728
lsof | wc -l
5887913 # this is a very large number, don't know why.
After OOM the open file numbers back to few hundreds (lsof | wc -l ).
On Aug 10, 2015, at 9:59 AM, rock zhang wrote:
> My C
My Cassandra version is 2.1.4.
Thanks
Rock
On Aug 10, 2015, at 9:52 AM, rock zhang wrote:
> Hi All,
>
> Currently i have three hosts. The data is not balanced, one is 79G, another
> two have 300GB. When adding a new host, firstly I got "too many open files"
> error, then i changed file op
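The usual fix for that error is raising the open-file limit for the user
Cassandra runs as; a rough check and example (file paths vary by distro):

ulimit -n                                                         # limit in the current shell
grep -i 'open files' /proc/$(pgrep -f CassandraDaemon | head -1)/limits
# e.g. in /etc/security/limits.d/cassandra.conf:
#   cassandra  -  nofile  100000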
We think it is this bug:
https://issues.apache.org/jira/browse/CASSANDRA-8860
We're rolling a patch to beta before rolling it into production.
On Wed, Mar 4, 2015 at 4:12 PM, graham sanderson wrote:
> We can confirm a problem on 2.1.3 (sadly our beta sstable state obviously
> did not match our
We can confirm a problem on 2.1.3 (sadly our beta sstable state obviously did
not match our production ones in some critical way)
We have about 20k sstables on each of 6 nodes right now; actually a quick
glance shows 15k of those are from OpsCenter, which may have something to do
with beta/prod
Are you finding a correlation between the shards on the OOM DC1 nodes and
the OOM DC2 nodes? Does your monitoring tool indicate that the DC1 nodes
are using significantly more CPU (and memory) than the nodes that are NOT
failing? I am leading you down the path to suspect that your sharding is
givin
What kind of disks are you running here? Are you getting a lot of GC before
the OOM?
Patrick
On Wed, Mar 4, 2015 at 9:26 AM, Jan wrote:
> HI Roni;
>
> You mentioned:
> DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have 16GB of
> RAM and 5GB HEAP.
>
> Best practices would be to:
HI Roni;
You mentioned: DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have
16GB of RAM and 5GB HEAP.
Best practices would be to:
a) have a consistent type of node across both DC's (CPUs, Memory, Heap & Disk)
b) increase heap on DC2 servers to be 8GB for C* Heap
The leveled
Well, the answer was Secondary indexes. I am guessing they were corrupted
somehow. I dropped all of them, cleanup, and now nodes are bootstrapping
fine.
On Thu, Oct 30, 2014 at 3:50 PM, Maxime wrote:
> I've been trying to go through the logs but I can't say I understand very
> well the details:
I've been trying to go through the logs but I can't say I understand very
well the details:
INFO [SlabPoolCleaner] 2014-10-30 19:20:18,446 ColumnFamilyStore.java:856
- Enqueuing flush of loc: 7977119 (1%) on-heap, 0 (0%) off-heap
DEBUG [SharedPool-Worker-22] 2014-10-30 19:20:18,446
AbstractSimple
I will give a shot adding the logging.
I've tried some experiments and I have no clue what could be happening
anymore:
I tried setting all nodes to a streamthroughput of 1 except 1, to see if
somehow it was getting overloaded by too many streams coming in at once,
nope.
I went through the source
Some ideas:
1) Put on DEBUG log on the joining node to see what is going on in details
with the stream with 1500 files
2) Check the stream ID to see whether it's a new stream or an old one
pending
On Wed, Oct 29, 2014 at 2:21 AM, Maxime wrote:
> Doan, thanks for the tip, I just read about it
Doan, thanks for the tip, I just read about it this morning, just waiting
for the new version to pop up on the debian datastax repo.
Michael, I do believe you are correct in the general running of the cluster
and I've reset everything.
So it took me a while to reply, I finally got the SSTables do
"Tombstones will be a very important issue for me since the dataset is very
much a rolling dataset using TTLs heavily."
--> You can try the new DateTiered compaction strategy (
https://issues.apache.org/jira/browse/CASSANDRA-6602) released on 2.1.1 if
you have a time series data model to eliminate
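For what it's worth, switching a table to DTCS once on 2.1.1+ is a one-line
schema change; keyspace and table names below are placeholders:

cqlsh -e "ALTER TABLE my_keyspace.my_timeseries
          WITH compaction = {'class': 'DateTieredCompactionStrategy'};"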
Again, from our experience with 2.0.x:
Revert to the defaults - you are manually setting heap way too high IMHO.
On our small nodes we tried LCS - way too much compaction - so we switched all
CFs to STCS.
We do a major rolling compaction on our small nodes weekly during less busy
hours - works great. Be su
Hmm, thanks for the reading.
I initially followed some (perhaps too old) maintenance scripts, which
included weekly 'nodetool compact'. Is there a way for me to undo the
damage? Tombstones will be a very important issue for me since the dataset
is very much a rolling dataset using TTLs heavily.
O
"Should doing a major compaction on those nodes lead to a restructuration
of the SSTables?" --> Beware of the major compaction on SizeTiered, it will
create 2 giant SSTables and the expired/outdated/tombstone columns in this
big file will be never cleaned since the SSTable will never get a chance t
If the issue is related to I/O, you're going to want to determine if
you're saturated. Take a look at `iostat -dmx 1`, you'll see avgqu-sz
(queue size) and svctm (service time). The higher those numbers
are, the more overwhelmed your disk is.
On Sun, Oct 26, 2014 at 12:01 PM, DuyHai Doan wro
Thank you very much for your reply. This is a deeper interpretation of the
logs than I can do at the moment.
Regarding 2) it's a good assumption on your part but in this case,
non-obviously the loc table's primary key is actually not id; the schema
changed historically, which has led to this odd na
Hello Maxime
Increasing the flush writers won't help if your disk I/O is not keeping up.
I've had a look into the log file, below are some remarks:
1) There are a lot of SSTables on disk for some tables (events for example,
but not only). I've seen that some compactions are taking up to 32 SSTab
I've emailed you a raw log file of an instance of this happening.
I've been monitoring more closely the timing of events in tpstats and the
logs and I believe this is what is happening:
- For some reason, C* decides to provoke a flush storm (I say some reason,
I'm sure there is one but I have had
Hello Maxime
Can you put the complete logs and config somewhere? It would be
interesting to know what is the cause of the OOM.
On Sun, Oct 26, 2014 at 3:15 AM, Maxime wrote:
> Thanks a lot that is comforting. We are also small at the moment so I
> definitely can relate with the idea of keepin
Thanks a lot that is comforting. We are also small at the moment so I
definitely can relate with the idea of keeping small and simple at a level
where it just works.
I see the new Apache version has a lot of fixes so I will try to upgrade
before I look into downgrading.
On Saturday, October 25, 2
Since no one else has stepped in...
We have run clusters with ridiculously small nodes - I have a production
cluster in AWS with 4GB nodes each with 1 CPU and disk-based instance
storage. It works fine but you can see those little puppies struggle...
And I ran into problems such as you observe...
Graham,
Thanks for the reply. As I stated in my first mail, increasing the heap size
fixes the problem, but I'm more interested in figuring out the right properties
for commitlog and memtable sizes when we need to keep the heap smaller.
Also I think we are not seeing CASSANDRA-7546 as I apply y
Agreed, need more details; and just start by increasing the heap because that
may well solve the problem.
I have just observed (which makes sense when you think about it) while testing
fix for https://issues.apache.org/jira/browse/CASSANDRA-7546, that if you are
replaying a commit log which has a h
Hi Robert,
Thanks for your reply. The Cassandra version is 2.0.7. Is there some commonly
used rule for determining the commitlog and memtable sizes depending on the
heap size? What would be the main disadvantage of having a smaller commitlog?
On Tuesday, August 12, 2014 8:32 PM, Robert Coli wr
On Tue, Aug 12, 2014 at 9:34 AM, jivko donev wrote:
> We have a node with a commit log directory of ~4G. During start-up of the node
> on commit log replaying the used heap space is constantly growing ending
> with OOM error.
>
> The heap size and new heap size properties are - 1G and 256M. We are usin
You are right that modifying your code to access two CFs is a hack, and not
an ideal solution, but I think it should be pretty easy to implement, and
would help you get out of this jam pretty quickly. Not saying you should go
down that path, but if you lack better options, that would probably be my
Hello Tupshin,
Yes, all the data needs to be kept for just the last 6 hours. Yes, changing to a
new CF every 6 hours solves the compaction issue, but between the changes we
will have less than 6 hours of data. We can use CF1 and CF2 and truncate
them one at a time every 6 hours in a loop, but we need some kin
If you can programmatically roll over onto a new column family every 6
hours (or every day or other reasonable increment), and then just drop your
existing column family after all the columns would have been expired, you
could skip your compaction entirely. It was not clear to me from your
descript
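A rough illustration of the rotation idea with placeholder names: one table per
6-hour window, dropped once everything in it would have expired, so the data
disappears without any compaction or tombstones:

cqlsh -e "CREATE TABLE metrics.events_2014022712 (key text, ts timestamp, value text,
          PRIMARY KEY (key, ts));"
# write to the current window's table; 6+ hours after rolling to the next one:
cqlsh -e "DROP TABLE metrics.events_2014022700;"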
Thanks for replying.
We are on Cassandra 1.2.9.
We have a time-series-like data structure where we need to keep only the last 6
hours of data. So we expire data using an expireddatetime column on the column
family and then run an expire script via cron to create tombstones. We
don't use ttl yet and planning
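If the cron-driven tombstoning ever becomes painful, the same 6-hour expiry can
ride along with the writes instead (USING TTL works on 1.2; a table-level
default_time_to_live needs 2.0+). Table and column names here are placeholders:

cqlsh -e "INSERT INTO metrics.events (key, ts, value)
          VALUES ('sensor-1', '2014-02-27 12:00:00', '42') USING TTL 21600;"  # 21600 s = 6 h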
On Thu, Feb 27, 2014 at 11:09 AM, Nish garg wrote:
> I am having OOM during major compaction on one of the column family where
> there are lot of SStables (33000) to be compacted. Is there any other way
> for them to be compacted? Any help will be really appreciated.
>
You can use user defined c
One big downside of major compaction is that (depending on your
cassandra version) the bloom filter size is pre-calculated. Thus cassandra
needs enough heap for your existing 33k+ sstables and the new large
compacted one. In the past this happened to us when the compaction thread
got hung up,
I believe this is https://issues.apache.org/jira/browse/CASSANDRA-6358,
which was fixed in 2.0.3.
On Wed, Jan 8, 2014 at 7:15 AM, Desimpel, Ignace wrote:
> Hi,
>
>
>
> On linux and cassandra version 2.0.2 I had an OOM after a heavy load and
> then some (15 ) days of idle running (not exactly i
A few months ago, we had a similar issue on 1.2.6:
https://issues.apache.org/jira/browse/CASSANDRA-5706
But it has been fixed and we have not encountered this issue since (we're
also on 1.2.10).
2013/11/14 olek.stas...@gmail.com
> Yes, as I wrote in first e-mail. When I removed key cache file
Yes, as I wrote in first e-mail. When I removed key cache file
cassandra started without further problems.
regards
Olek
2013/11/13 Robert Coli :
>
> On Wed, Nov 13, 2013 at 12:35 AM, Tom van den Berge
> wrote:
>>
>> I'm having the same problem, after upgrading from 1.2.3 to 1.2.10.
>>
>> I can r
On Wed, Nov 13, 2013 at 12:35 AM, Tom van den Berge wrote:
> I'm having the same problem, after upgrading from 1.2.3 to 1.2.10.
>
> I can remember this was a bug that was solved in the 1.0 or 1.1 version
> some time ago, but apparently it got back.
> A workaround is to delete the contents of the s
I'm having the same problem, after upgrading from 1.2.3 to 1.2.10.
I can remember this was a bug that was solved in the 1.0 or 1.1 version
some time ago, but apparently it got back.
A workaround is to delete the contents of the saved_caches directory before
starting up.
Tom
On Tue, Nov 12, 201
> - 6 machines with 8GB RAM each and three 150GB disks each
> - default heap configuration
With 8GB the default heap is 2GB, try kicking that up to 4GB and a 600 to 800
MB new heap.
I would guess for the data load you have 2GB is not enough.
hope that helps.
-
Aaron Morton
Ne
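Aaron's numbers translate directly into the cassandra-env.sh overrides of that
era (the values are the ones from this thread, not general recommendations):

MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"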
Thanks, I missed that issue but it solved our problems.
Regards
Fabian
From: Robert Coli
Sent: Tuesday, 5 November 2013 19:12
To: user@cassandra.apache.org
On Tue, Nov 5, 2013 at 12:06 AM, Fabian Seifert
wrote:
It keeps crashing with OOM on CommitLog replay:
https:
On Tue, Nov 5, 2013 at 12:06 AM, Fabian Seifert <
fabian.seif...@frischmann.biz> wrote:
> It keeps crashing with OOM on CommitLog replay:
>
https://issues.apache.org/jira/browse/CASSANDRA-6087
Probably this issue, fixed in 2.0.2.
=Rob
This should explain the schema issue in 1.0 that has been fixed in 1.1:
http://www.datastax.com/dev/blog/the-schema-management-renaissance
On Thu, Sep 20, 2012 at 10:17 AM, Jason Wee wrote:
> Hi, when the heap is going more than 70% usage, you should be able to see
> in the log, many flushing, o
Hi, when heap usage goes above 70%, you should be able to see in the log
many flushes, or the row cache size being reduced. Did you
restart the cassandra daemon on the node that threw the OOM?
On Thu, Sep 20, 2012 at 9:11 PM, Vanger wrote:
> Hello,
> We are trying to add new nodes to
> How much smaller did the BF get to ?
After pending compactions completed today, i'm presuming fp_ratio is
applied now to all sstables in the keyspace, it has gone from 20G+ down
to 1G. This node is now running comfortably on Xmx4G (used heap ~1.5G).
~mck
--
"A Microsoft Certified System
Thanks for the update.
How much smaller did the BF get to ?
A
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 13/03/2012, at 8:24 AM, Mick Semb Wever wrote:
>
> It's my understanding then for this use case that bloom filters are of
> l
> > > > It's my understanding then for this use case that bloom filters are of
> > > > little importance and that i can
Ok. To summarise our actions to get us out of this situation, in the hope
that it may help others one day, we did the following:
1) upgrade to 1.0.7
2) set fp_ratio=0.99
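Step 2 refers to the per-CF bloom filter setting added by CASSANDRA-3497; on
1.0.x it is set through cassandra-cli, roughly like this (keyspace and column
family names are placeholders, and the exact attribute name should be checked
against your cli's help):

echo "use my_keyspace;
update column family my_cf with bloom_filter_fp_chance = 0.99;" | cassandra-cli -h localhost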
>>> It's my understanding then for this use case that bloom filters are of
>>> little importance and that i can
>>
Yes.
AFAIK there is only one position seek (that will use the bloom filter) at the
start of a get_range_slice request. After that the iterators step over the rows
in the -Data file
On Sun, 2012-03-11 at 15:36 -0700, Peter Schuller wrote:
> Are you doing RF=1?
That is correct. So are your calculations then :-)
> > very small, <1k. Data from this cf is only read via hadoop jobs in batch
> > reads of 16k rows at a time.
> [snip]
> > It's my understanding then for this use cas
> This particular cf has up to ~10 billion rows over 3 nodes. Each row is
With default settings, 143 million keys roughly gives you 2^31 bits of
bloom filter. Or put another way, you get about 1 GB of bloom filters
per 570 million keys, if I'm not mistaken. If you have 10 billion
rows, that should
On Sun, 2012-03-11 at 15:06 -0700, Peter Schuller wrote:
> If it is legitimate use of memory, you *may*, depending on your
> workload, want to adjust target bloom filter false positive rates:
>
>https://issues.apache.org/jira/browse/CASSANDRA-3497
This particular cf has up to ~10 billion row
> How did this this bloom filter get too big?
Bloom filters grow with the amount of row keys you have. It is natural
that they grow bigger over time. The question is whether there is
something "wrong" with this node (for example, lots of sstables and
disk space used due to compaction not running,
Smells like ulimit. Have you been able to reproduce this with the C*
process running as root?
On Wed, Nov 2, 2011 at 8:12 AM, A J wrote:
> java.lang.OutOfMemoryError: unable to create new native thread
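That particular OOM ("unable to create new native thread") usually points at
the per-user process/thread limit rather than heap; a quick check on Linux for
the user Cassandra runs as:

ulimit -u                                                      # max user processes in this shell
grep -i processes /proc/$(pgrep -f CassandraDaemon | head -1)/limits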
On Mon, Oct 31, 2011 at 2:58 PM, Sylvain Lebresne wrote:
> On Mon, Oct 31, 2011 at 1:10 PM, Mick Semb Wever wrote:
>> On Mon, 2011-10-31 at 13:05 +0100, Mick Semb Wever wrote:
>>> Given a 60G sstable, even with 64kb chunk_length, to read just that one
>>> sstable requires close to 8G free heap me
On Mon, Oct 31, 2011 at 1:10 PM, Mick Semb Wever wrote:
> On Mon, 2011-10-31 at 13:05 +0100, Mick Semb Wever wrote:
>> Given a 60G sstable, even with 64kb chunk_length, to read just that one
>> sstable requires close to 8G free heap memory...
>
> Arg, that calculation was a little off...
> (a lon
On Mon, 2011-10-31 at 13:05 +0100, Mick Semb Wever wrote:
> Given a 60G sstable, even with 64kb chunk_length, to read just that one
> sstable requires close to 8G free heap memory...
Arg, that calculation was a little off...
(a long isn't exactly 8K...)
But you get my concern...
~mck
--
"Whe
On Mon, 2011-10-31 at 09:07 +0100, Mick Semb Wever wrote:
> The read pattern of these rows is always in bulk so the chunk_length
> could have been much higher so to reduce memory usage (my largest
> sstable is 61G).
Isn't CompressionMetadata.readChunkOffsets(..) rather dangerous here?
Given a 60
Yes, each one corresponds with taking a node down for various
reasons. I think more people should show their graphs; it's great.
Hoping Oberman has some, so we can see what his look like.
On Thu, Jun 23, 2011 at 12:40 AM, Chris Burroughs
wrote:
> Do all of the reductions in Used on that g
Do all of the reductions in Used on that graph correspond to node restarts?
My Zabbix for reference: http://img194.imageshack.us/img194/383/2weekmem.png
On 06/22/2011 06:35 PM, Sasha Dolgy wrote:
> http://www.twitpic.com/5fdabn
> http://www.twitpic.com/5fdbdg
>
> i do love a good graph. two of
http://www.twitpic.com/5fdabn
http://www.twitpic.com/5fdbdg
I do love a good graph. Two of the weekly memory utilization graphs
for 2 of the 4 servers from this ring... week 21 was a nice week,
the week before 0.8.0 went out proper. Since then, we bumped up to 0.8
and have seen a steady increase
On 06/22/2011 08:53 AM, Sasha Dolgy wrote:
> Yes ... this is because it was the OS that killed the process, and
> wasn't related to Cassandra "crashing". Reviewing our monitoring, we
> saw that memory utilization was pegged at 100% for days and days
> before it was finally killed because 'apt' was
The CLI is posted, I assume that's the defaults (I didn't touch anything).
The machines basically just run cassandra (and standard Centos5 background
stuff).
will
On Wed, Jun 22, 2011 at 9:49 AM, Jake Luciani wrote:
> Are you running with the default heap settings? what else is running on the
>
Are you running with the default heap settings? what else is running on the
boxes?
On Wed, Jun 22, 2011 at 9:06 AM, William Oberman
wrote:
> I was wondering/I figured that /var/log/kern indicated the OS was killing
> java (versus an internal OOM).
>
> The nodetool repair is interesting. My app
I was wondering/I figured that /var/log/kern indicated the OS was killing
java (versus an internal OOM).
The nodetool repair is interesting. My application never deletes, so I
didn't bother running it. But, if that helps prevent OOMs as well, I'll add
it to the crontab
(plan A is still upgr
Yes ... this is because it was the OS that killed the process, and
wasn't related to Cassandra "crashing". Reviewing our monitoring, we
saw that memory utilization was pegged at 100% for days and days
before it was finally killed because 'apt' was fighting for resource.
At least, that's as far as
Well, I managed to run 50 days before an OOM, so any changes I make will
take a while to test ;-) I've seen the GCInspector log lines appear
periodically in my logs, but I didn't see a correlation with the crash.
I'll read the instructions on how to properly do a rolling upgrade today,
practice o
We had a similar problem last month and found that the OS eventually
killed the Cassandra process on each of our nodes ... I've
upgraded to 0.8.0 from 0.7.6-2 and have not had the problem since, but
i do see consumption levels rising consistently from one day to the
next on each node .
If you're OOMing on restart you WILL OOM during normal usage given
heavy enough write load. Definitely adjust memtable thresholds down
or, as Dominic suggests, upgrade to 0.8.
On Tue, Jun 21, 2011 at 12:02 PM, Dominic Williams
wrote:
> Hi gabe,
> What you need to do is the following:
> 1. Adjust
Hi gabe,
What you need to do is the following:
1. Adjust cassandra.yaml so when this node is starting up it is not
contacted by other nodes e.g. set thrift to 9061 and storage to 7001
2. Copy your commit logs to tmp sub-folder e.g. commitLog/tmp
3. Copy a small number of commit logs back into m
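In shell terms, steps 2-3 amount to something like the following (the path
assumes the default commit log directory and the segment name is a
placeholder):

mkdir /var/lib/cassandra/commitlog/tmp
mv /var/lib/cassandra/commitlog/CommitLog-*.log /var/lib/cassandra/commitlog/tmp/
# move a few segments back, start Cassandra, let it replay and flush, stop, repeat:
mv /var/lib/cassandra/commitlog/tmp/CommitLog-1305901051554.log /var/lib/cassandra/commitlog/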
AFAIK the node will not announce itself in the ring until the log replay is
complete, so it will not get the schema update until after log replay. If
possible i'd avoid making the schema change until you have solved this problem.
My theory on OOM during log replay is that the high speed inserts
We've applied a fix to the 0.7 branch in
https://issues.apache.org/jira/browse/CASSANDRA-2714. The patch
probably applies to 0.7.6 as well.
On Thu, May 26, 2011 at 11:36 AM, Flavio Baronti
wrote:
> I tried the manual copy you suggest, but the SystemTable.checkHealth()
> function
> complains it c
I tried the manual copy you suggest, but the SystemTable.checkHealth() function
complains it can't load the system files. Log follows, I will gather some more
info and create a ticket as soon as possible.
INFO [main] 2011-05-26 18:25:36,147 AbstractCassandraDaemon.java Logging
initialized
INFO
Sounds like a legitimate bug, although looking through the code I'm
not sure what would cause a tight retry loop on migration
announce/rectify. Can you create a ticket at
https://issues.apache.org/jira/browse/CASSANDRA ?
As a workaround, I would try manually copying the Migrations and
Schema sstab