Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Peter Schuller
> [4] Is GC ConcurrentMarkSweep a Stop-The-World situation? Where the > JVM cannot do anything else? Hence then node is technically Down? > Correct? No; the concurrent mark/sweep phase runs concurrently with your application. CMS will cause a stop-the-world full pause if it fails to c

Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Gustavo Scudeler
Hi guys, We have a 6 node Cassandra Cluster under heavy utilization. We have been dealing a lot with garbage collector stop the world event, which can take up to 50 seconds in our nodes, in the meantime Cassandra Node is unresponsive, not even accepting new logins. Extra details: - Cassandra

Get information about GC pause (Stop the world) via JMX, it's possible ?

2019-06-27 Thread Ahmed Eljami
Hi, I want to know if it's possible to get information about GC pause duration (Stop the world) via JMX. Today, we get this information from gc.log with the JVM option -XX:+PrintGCApplicationStoppedTime Total time for which application threads were stopped: 0.0001273 seconds, Sto
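A minimal sketch of how the gc.log approach in this snippet could be automated, assuming the log contains lines in exactly the form quoted above. The function names `stopped_seconds` and `summarize` are hypothetical, and anything that follows the duration on the line is ignored because the snippet is truncated there.

```python
import re

# Matches the line quoted in the snippet; text after the duration is ignored.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"(?P<seconds>\d+(?:\.\d+)?) seconds"
)


def stopped_seconds(lines):
    """Yield every reported application-stopped duration, in seconds."""
    for line in lines:
        match = STOPPED_RE.search(line)
        if match:
            yield float(match.group("seconds"))


def summarize(lines):
    """Return the count, total and longest pause found in the given lines."""
    pauses = list(stopped_seconds(lines))
    return {
        "count": len(pauses),
        "total": sum(pauses),
        "longest": max(pauses, default=0.0),
    }
```

Passing an open gc.log file object to `summarize` should work, since the function only iterates over lines.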

Re: Nodes frozen in GC

2011-03-08 Thread Peter Schuller
15:21:23,524 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 18052 ms, -997761672 reclaimed leaving 5796586088 This indicates that a concurrent mark/sweep GC took 18 seconds. That may or may not be a bit high for the heap size, but regardless, the CMS is not a stop-the-world pause. It involv
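The GCInspector line quoted in this snippet can be pulled apart mechanically. The sketch below assumes the line layout shown above; the names `GcInspectorEntry` and `parse_gc_inspector` are hypothetical, and treating the last two numbers as byte counts is an assumption, not something the snippet states.

```python
import re
from typing import NamedTuple, Optional


class GcInspectorEntry(NamedTuple):
    collector: str
    duration_ms: int
    reclaimed: int  # assumed to be bytes; may be negative, as in the sample
    remaining: int  # assumed to be bytes


GC_INSPECTOR_RE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, "
    r"(?P<reclaimed>-?\d+) reclaimed leaving (?P<remaining>\d+)"
)


def parse_gc_inspector(line: str) -> Optional[GcInspectorEntry]:
    """Parse one GCInspector log line, or return None if it does not match."""
    match = GC_INSPECTOR_RE.search(line)
    if match is None:
        return None
    return GcInspectorEntry(
        collector=match.group("collector"),
        duration_ms=int(match.group("ms")),
        reclaimed=int(match.group("reclaimed")),
        remaining=int(match.group("remaining")),
    )
```

As the reply points out, a long ConcurrentMarkSweep duration reported this way is not by itself a stop-the-world pause, so a parsed `duration_ms` should not be read as application downtime.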

Re: Get information about GC pause (Stop the world) via JMX, it's possible ?

2019-06-27 Thread Dimo Velev
w if it's possible to get information about GC pause duration > (Stop the world) via JMX. > > Today, we get this information from gc.log with the JVM option > XX:+PrintGCApplicationStoppedTime > > Total time for which application threads were stopped: 0.0001273 seconds,

RE: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Steinmaurer, Thomas
scudel...@gmail.com] Sent: Montag, 09. Oktober 2017 13:12 To: user@cassandra.apache.org Subject: Cassandra and G1 Garbage collector stop the world event (STW) Hi guys, We have a 6 node Cassandra Cluster under heavy utilization. We have been dealing a lot with garbage collector stop the world event,

Re: Reduce Cassandra GC

2013-06-15 Thread Takenori Sato
the world. But I think it is not stop the world, but only stop the new world. For example in case of Cassandra, a large number of in_memory_compaction_limit_in_mb can cause this. This is a limit when a compaction compacts(merges) rows of a key into the latest in memory. So this creates a large b

Re: Reduce Cassandra GC

2013-06-15 Thread Mohit Anchlia
t is created, > and which can not be promoted to Old Generation because it requires such a > large *contiguous* memory space that is unavailable at the point in time. > This is called promotion failure. So it has to wait until concurrent > collector collects a large enough space. Th

Re: Reduce Cassandra GC

2013-06-17 Thread Joel Samuelsson
can you take a heap dump at 2 diff points so that we can compare it? I can't access the machine at all during the stop-the-world freezes. Was that what you wanted me to try? > Uncomment the followings in "cassandra-env.sh". Done. Will post results as soon as I get a new stop-th

RE: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Steinmaurer, Thomas
Hi, my previously mentioned G1 bug does not seem to be related to your case Thomas From: Gustavo Scudeler [mailto:scudel...@gmail.com] Sent: Montag, 09. Oktober 2017 15:13 To: user@cassandra.apache.org Subject: Re: Cassandra and G1 Garbage collector stop the world event (STW) Hello, @kurt

Re: Reduce Cassandra GC

2013-06-15 Thread Takenori Sato
rent > collector collects a large enough space. Thus you experience stop the > world. But I think it is not stop the world, but only stop the new world. > > For example in case of Cassandra, a large number of > in_memory_compaction_limit_in_mb can cause this. This is a limit when a

Re: Reduce Cassandra GC

2013-06-17 Thread Joel Samuelsson
S -XX:CMSInitiatingOccupancyFraction=75" > JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly" > > I haven't changed anything in the environment config up until now. > > > Also can you take a heap dump at 2 diff points so that we can compare > it?

Re: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Gustavo Scudeler
tigation. > > > > Regards, > > Thomas > > > > *From:* Gustavo Scudeler [mailto:scudel...@gmail.com] > *Sent:* Montag, 09. Oktober 2017 13:12 > *To:* user@cassandra.apache.org > *Subject:* Cassandra and G1 Garbage collector stop the world event (STW) > >

Re: Get information about GC pause (Stop the world) via JMX, it's possible ?

2019-06-27 Thread Avinash Mandava
13:47 Ahmed Eljami, wrote: > >> Hi, >> >> I want to know if it's possible to get information about GC pause >> duration (Stop the world) via JMX. >> >> Today, we get this information from gc.log with the JVM option >> XX:+PrintGCApplicationStoppedTime

Re: Reduce Cassandra GC

2013-06-15 Thread Takenori Sato
ion because it requires such a > large *contiguous* memory space that is unavailable at the point in time. > This is called promotion failure. So it has to wait until concurrent > collector collects a large enough space. Thus you experience stop the > world. But I

Re: Nodes frozen in GC

2011-03-08 Thread Peter Schuller
so hard? And to address this btw, although it has nothing to do with the problem being investigated in this thread: It's not about how *much* time is spent on memory management. That is of course relevant, but the issue here is to avoid long stop-the-world pauses. Even if you're avoiding d

Concurrent Mark Sweep taking 12 seconds

2011-05-16 Thread Héctor Izquierdo Seliva
Hi everyone. I see in the logs that Concurrent Mark Sweep is taking 12 seconds to do its stuff. Is this normal? There is no stop-the-world GC, it just takes 12 seconds. Configuration: 0.7.5 , 8GB Heap, 16GB machines. 7 * 64 MB memtables.

Re: Nodes frozen in GC

2011-03-08 Thread Peter Schuller
ncurrentTime while you're at it. But the key is to see what leads up to the long stop-the-world pause. -- / Peter Schuller

Re: Reduce Cassandra GC

2013-06-17 Thread Takenori Sato
VM_OPTS -XX:CMSInitiatingOccupancyFraction=75" >> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly" >> >> I haven't changed anything in the environment config up until now. >> >> > Also can you take a heap dump at 2 diff points so that we can

Re: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Chris Lohfink
n something >> called “humongous” allocations, spanning several G1 regions. If this >> happens in a very short very frequently and depending on your allocation >> rate in MB/s, a combination of the G1 bug and a small heap, might result >> going towards OOM. >> >>

Re: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread kurt greaves
Have you tried CMS with that sized heap? G1 is only really worthwhile with 24gb+ heap size, which wouldn't really make sense on machines with 28gb of RAM. In general CMS is found to work better for C*, leaving excess memory to be utilised by the OS page cache

Tombstones and memtable_operations

2011-04-19 Thread Héctor Izquierdo Seliva
Hi everyone. I've configured in one of my column families memtable_operations = 0.02 and started deleting keys. I have already deleted 54k, but there hasn't been any flush of the memtable. Memory keeps piling up and eventually nodes start to do stop-the-world GCs. Is this the way this i

Re: Concurrent Mark Sweep taking 12 seconds

2011-05-16 Thread Jonathan Ellis
Yes. 2011/5/16 Héctor Izquierdo Seliva : > Hi everyone. I see in the logs that Concurrent Mark Sweep is taking 12 > seconds to do its stuff. Is this normal? There is no stop-the-world GC, > it just takes 12 seconds. > > Configuration: 0.7.5 , 8GB Heap, 16GB machines. 7 *

CMS GC initial-mark taking 6 seconds , bad?

2011-09-24 Thread Yang
I see the following in my GC log 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)] 13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00, real=6.07 secs] so there is a stop-the-world period of 6 seconds. does this sound bad ? or 6 seconds is OK and we should expect the built-in
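To spot pauses like this one across a whole GC log, the wall-clock figure in the trailing `[Times: ... real=... secs]` block can be extracted. This is a rough sketch that assumes lines shaped like the one quoted above; `real_pause_seconds`, `long_pauses` and the one-second default threshold are hypothetical choices.

```python
import re

# Wall-clock time as reported in the "[Times: ... real=6.07 secs]" suffix.
REAL_TIME_RE = re.compile(r"real=(?P<real>\d+(?:\.\d+)?) secs")


def real_pause_seconds(line):
    """Return the reported wall-clock seconds, or None if the line has none."""
    match = REAL_TIME_RE.search(line)
    return float(match.group("real")) if match else None


def long_pauses(lines, threshold_seconds=1.0):
    """Return (seconds, line) pairs whose wall-clock time meets the threshold."""
    hits = []
    for line in lines:
        seconds = real_pause_seconds(line)
        if seconds is not None and seconds >= threshold_seconds:
            hits.append((seconds, line.rstrip("\n")))
    return hits
```

Note that not every line carrying a `real=` figure is a stop-the-world phase; concurrent CMS phases report times too, so the phase name on the line still has to be checked by hand or with an extra filter.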

Re: Issue with removing a node and adding it back

2015-03-30 Thread Robert Coli
laky network (AWS) or stop-the-world GC) and fix that OR 2) try tuning streaming_socket_timeout_in_ms =Rob

Re: Nodes frozen in GC

2011-03-07 Thread ruslan usifov
2011/3/8 Jonathan Ellis > It sounds like you're complaining that the JVM sometimes does > stop-the-world GC. > > You can mitigate this but not (for most workloads) eliminate it with > GC option tuning. That's simply the state of the art for Java garbage > collection

Reduce Cassandra GC

2013-04-16 Thread Joel Samuelsson
" messages. Every once in a while at one of these peaks, I get these stop-the-world GC for 6-7 minutes. Why does GC take up so much time even though the heap isn't full? I am aware that my access patterns make key caching very unlikely to be high. And indeed, my average key cache hit ratio du

Re: live data migration from mysql to cassandra

2011-01-14 Thread Edward Capriolo
On Fri, Jan 14, 2011 at 10:40 AM, ruslan usifov wrote: > Hello > > Dear community please share your experience, how you make live(without > stop) migration from mysql or other RDBM to cassandra > There is no built in way to do this. I remember hearing at hadoop world this year

Re: Cassandra GC Settings

2011-01-17 Thread Peter Schuller
> Now, a full stop of the application was what I was seeing extensively before > (100-200 times over the course of a major compaction as reported by > gossipers on other nodes). I have also just noticed that the previous > instability (ie application stops) correlated with the compact

One node misbehaving (lot's of GC), ideas?

2015-04-15 Thread Erik Forsberg
Hi! We're having problems with one node (out of 56 in total) misbehaving. Symptoms are: * High number of full CMS old space collections during early morning when we're doing bulkloads. Yes, bulkloads, not CQL, and only a few thrift insertions. * Really long stop-the-world GC events (I've

Re: LCS and counters

2013-02-25 Thread Janne Jalkanen
At least for our use case (reading slices from varyingly sized rows from 10-100k composite columns with counters and hundreds of writes/second) LCS has a nice ~75% lower read latency than Size Tiered. And compactions don't stop the world anymore. Repairs do easily trigger a few hu

RE: Nodes frozen in GC

2011-03-10 Thread Gregory Szorc
andra is allocating upwards of 4GB/s. I once gave the JVM 30GB of heap and saw it run through the entire heap in a few seconds while doing a compaction! It would continuously blow through the heap, incur a stop-the-world collection, and repeat. Meanwhile, the listed compacted bytes from the JMX in

Re: Nodes frozen in GC

2011-03-07 Thread Chris Goffinet
Can you tell me how many SSTables on disk when you see GC pauses? In your 3 node cluster, what's the RF factor? On Mon, Mar 7, 2011 at 1:50 PM, ruslan usifov wrote: > > > 2011/3/8 Jonathan Ellis > > It sounds like you're complaining that the JVM sometimes does >

Re: RE: batch_mutate failed: out of sequence response

2011-04-07 Thread Héctor Izquierdo Seliva
't do > that. I'm not using thrift directly, and my application is single thread, so I guess this is Pelops fault somehow. Since I managed to tame memory consumption the problem has not appeared again, but it always happened during a stop-the-world GC. Could it be that the message was sent instead of being dropped by the server?

Re: Get information about GC pause (Stop the world) via JMX, it's possible ?

2019-07-04 Thread Alain RODRIGUEZ
> garbage collector you use, they will be different but are there >> >> On Thu, 27 Jun 2019, 13:47 Ahmed Eljami, wrote: >> >>> Hi, >>> >>> I want to know if it's possible to get information about GC pause >>> duration (Stop the world)

Re: Cassandra out of Heap memory

2012-06-17 Thread rohit bhatia
I am using 1.0.5 . The logs suggest that it was one single instance of failure and I'm unable to reproduce it. From the logs, In a span of 30 seconds, heap usage went from 4.8 gb to 8.8 gb With stop-the-world gc running 20 times. I believe that parNew was unable to clean up memory due

Re: How to add a node with zero downtime

2017-03-21 Thread daemeon reiydelle
Possible areas to check: - too few nodes (node overload) - you did not indicate either replication factor, number of nodes. Assume nodes are *rather* full. - network overload (check your TORS's errors, also the tcp stats on the relevant nodes) - look for stop the world garbage collecti

Re: CMS GC initial-mark taking 6 seconds , bad?

2011-09-25 Thread aaron morton
wrote: > I see the following in my GC log > > 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)] > 13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00, > real=6.07 secs] > > so there is a stop-the-world period of 6 seconds. does this sound bad > ? or

Re: Cassandra out of Heap memory

2012-06-14 Thread rohit bhatia
young generation runs out of memory to migrate objects to the old generation (a so-called concurrent mode failure), leading to stop-the-world full garbage collection. However, with a slightly lower setting of the CMS threshold, we get a bit more headroom, and more stable overall performance."

Re: RE: batch_mutate failed: out of sequence response

2011-04-07 Thread Héctor Izquierdo Seliva
't do > that. > I'm not using thrift directly, and my application is single thread, so I guess this is Pelops fault somehow. Since I managed to tame memory consumption the problem has not appeared again, but it always happened during a stop-the-world GC. Could it be that the message was

Re: Flush / Snapshot Triggering Full GCs, Leaving Ring

2011-04-06 Thread Jonathan Ellis
es w/o snapshot. Either way: "concurrent mode failure" is the easy GC problem. Hopefully you really are seeing mostly that -- this means the JVM didn't start CMS early enough, so it ran out of space before it could finish the concurrent collection, so it falls back to stop-the-world. The fi

Re: Get information about GC pause (Stop the world) via JMX, it's possible ?

2019-07-04 Thread Ahmed Eljami
>> (java.lang.type=GarbageCollector.name=G1 Old Generation.CollectionTime / >> java.lang.type=GarbageCollector.name=G1 Old Generation.CollectionCount) >> >> On Thu, Jun 27, 2019 at 11:56 AM Dimo Velev wrote: >> >>> That is s standard jvm metric. Connect to your cassa
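The ratio suggested in this snippet (CollectionTime divided by CollectionCount) gives a lifetime average. A small sketch of the same arithmetic over two samples, which yields the average for just that interval, follows; it assumes the two attributes have already been read from JMX by some other tool, and `average_collection_ms` is a hypothetical helper name. CollectionTime is reported in milliseconds.

```python
def average_collection_ms(time_ms_before, count_before, time_ms_after, count_after):
    """Average milliseconds per collection between two samples.

    Returns None when no collection happened in the interval (or when the
    counters went backwards, for example after a JVM restart).
    """
    collections = count_after - count_before
    if collections <= 0:
        return None
    return (time_ms_after - time_ms_before) / collections
```

For example, samples of (1000 ms, 10 collections) and (1600 ms, 13 collections) give 200.0 ms per collection. For collectors that do part of their work concurrently, this figure is elapsed collection time and may overstate the actual stop-the-world pause.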

Re: Upgrade to a different version?

2011-03-17 Thread Paul Pak
g like a scientific/data mining app? I ask because I'm wondering how you have managed to deal with the stop-the-world garbage collection issues that seems to hit most clusters that have significant load and cause application timeouts. Have you found that cassandra scales in read/write capaci

RE: Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread Moshe Kranc
mail.com] Sent: Wednesday, May 13, 2015 10:40 PM To: user@cassandra.apache.org Subject: Re: Viewing Cassandra's Internal table Structure in a CQL world I think that you can still use cassandra-cli from 2.0.x to look into internal table structure. Of course you will see bytes instead of "r

Re: scylladb

2017-03-11 Thread Kant Kodali
@Dor 1) You guys have a CPU scheduler? you mean user level thread Scheduler that maps user level threads to kernel level threads? I thought C++ by default creates native kernel threads but sure nothing will stop someone to create a user level scheduling library if that's what you are ta

Re: Should one expect to see hints being stored/delivered occasionally?

2015-01-20 Thread Robert Coli
ng stored and delivered outside of this context is a warning sign that something may be wrong with your cluster. Probably what is happening is that you have stop the world GCs long enough to trigger queueing of hints via timeouts during these GCs. =Rob

Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread Moshe Kranc
e can you tune your queries for best performance. To date, I have been using cassandra-cli to view the table's internal structure. But, I get bombarded with all kinds of warnings about how I should switch to CQL and stop using a deprecated product. My question: After the revolution (once

Re: Predictable low RW latency, SLABS and STW GC

2011-07-24 Thread aaron morton
Restarting the service will drop all the memmapped caches, cassandra caches are saved / persistent and you can also use memcachd if you want. Are you experiencing stop the world pauses? There are some things that can be done to reduce the chance of them happening. Cheers

RE: batch_mutate failed: out of sequence response

2011-04-05 Thread Héctor Izquierdo Seliva
I'm still running into problems. Now I don't write more than 100 columns at a time, and I'm having lots of Stop-the-world gc pauses. I'm writing into three column families, with memtable_operations = 0.3 and memtable_throughput = 64. Is any of this wrong? >

Re: Tombstones and memtable_operations

2011-04-19 Thread Héctor Izquierdo Seliva
rted deleting keys. I have already > deleted 54k, but there hasn't been any flush of the memtable. Memory > keeps piling up and eventually nodes start to do stop-the-world GCs. Is > this the way this is supposed to work or have I done something wrong? > > Thanks! >

Re: One node misbehaving (lot's of GC), ideas?

2015-04-15 Thread Michal Michalski
erg wrote: > Hi! > > We're having problems with one node (out of 56 in total) misbehaving. > Symptoms are: > > * High number of full CMS old space collections during early morning > when we're doing bulkloads. Yes, bulkloads, not CQL, and only a few > thrift insertions.

TimedOutException caused by "Stop the world" activity

2012-05-27 Thread Jason Tang
when I have 1G memory 32 bit cassandra on standalone model, I didn't find so frequently "Stop the world" behavior. So I wonder what kind of operation will hang the cassandra system. How to collect information for tuning. From the system log and document, I guess there are three

Flush / Snapshot Triggering Full GCs, Leaving Ring

2011-04-06 Thread C. Scott Andreas
cally, our logs suggest that calling "nodetool snapshot" on a node is triggering 12 to 16 second CMS GCs and a promotion failure resulting in a full stop-the-world collection, during which the node is marked dead by the ring until re-joining shortly after. Here's a log from one

Re: Follow-up post on cassandra configuration with some experiments on GC tuning

2010-08-29 Thread Carsten Krebs
the young generation, the more efficient the GC (if the > application behaves according to the weak generational hypothesis - > google it if you want a ref) because less data is promoted to old gen > and because the overhead of stop-the-world is lessened. > (3) The larger the young generati

Re: LCS and counters

2013-03-05 Thread Alain RODRIGUEZ
appreciated. 2013/2/25 Janne Jalkanen > > At least for our use case (reading slices from varyingly sized rows from > 10-100k composite columns with counters and hundreds of writes/second) LCS > has a nice ~75% lower read latency than Size Tiered. And compactions don't > stop the wor

Re: Nodes frozen in GC

2011-03-07 Thread Jonathan Ellis
It sounds like you're complaining that the JVM sometimes does stop-the-world GC. You can mitigate this but not (for most workloads) eliminate it with GC option tuning. That's simply the state of the art for Java garbage collection right now. On Sun, Mar 6, 2011 at 2:18 AM, ruslan usi

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-08 Thread Jonathan Ellis
Xms=Xmx is so mlockall can tag the entire heap as "don't swap this out" on startup. Secondarily whenever the heap resizes upwards the JVM does a stop-the-world gc, but no, not really a big deal when your uptime is in days or weeks. -- Jonathan Ellis Project Chair, Apache Cassa

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

2023-04-12 Thread Ralph Boehme
sync clocks or long stop-the-world GC pauses. hm, I'll check the logs, but I can reproduce this 100% on an idle test cluster just by running a simple test client that generates a smallish workload where just 2 processes on a single host hammer the Cassandra cluster with LWTs. nothing i

Re: CMS GC initial-mark taking 6 seconds , bad?

2011-09-25 Thread Peter Schuller
> I see the following in my GC log > > 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)] > 13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00, > real=6.07 secs] > > so there is a stop-the-world period of 6 seconds. does this sound bad > ? or 6 seco

RE: Reduce Cassandra GC

2013-04-16 Thread Viktor Jevdokimov
ack down to 60% the next 5 minutes and so on. I get no "Heap is X full..." messages. Every once in a while at one of these peaks, I get these stop-the-world GC for 6-7 minutes. Why does GC take up so much time even though the heap isn't full? I am aware that my access patterns make

Re: Nodes frozen in GC

2011-03-07 Thread Paul Pak
Ellis wrote: > It sounds like you're complaining that the JVM sometimes does stop-the-world > GC. > > You can mitigate this but not (for most workloads) eliminate it with > GC option tuning. That's simply the state of the art for Java garbage > collection right now. &

Re: better anti OOM

2011-12-27 Thread Edward Capriolo
stop the world garbage collection. Also less free space usually means more memory fragmentation and causes your system to work harder CPU. it is counter intuitive to leave "free memory" because you want to get the large caches etc, but the overhead gives more stability which in the end gi

Re: Running multiple instances on a single server --micrandra ??

2010-12-14 Thread Gary Dusbabek
> 4) OPP would be "easier" to balance out hot spots (maybe not on this > one in not an OPP) > Sorry for chiming in so late, but another benefit is that it amortizes stop-the-world garbage collection across 6 jvms. > What does everyone think? Does it ever make sense to run this w

Re: Nodes frozen in GC

2011-03-07 Thread Paul Pak
l...@gmail.com>> > > It sounds like you're complaining that the JVM sometimes does > stop-the-world GC. > > You can mitigate this but not (for most workloads) eliminate it with > GC option tuning. That's simply the state of the art for Java garbage >

RE: batch_mutate failed: out of sequence response

2011-04-05 Thread Héctor Izquierdo Seliva
Update with more info: I'm still running into problems. Now I don't write more than 100 columns at a time, and I'm having lots of Stop-the-world gc pauses. I'm writing into three column families, with memtable_operations = 0.3 and memtable_throughput = 64. There is now sw

Re: RE: batch_mutate failed: out of sequence response

2011-04-07 Thread Dan Washusen
> guess this is Pelops fault somehow. Since I managed to tame memory > consumption the problem has not appeared again, but it always happened > during a stop-the-world GC. Could it be that the message was sent > instead of being dropped by the server when the client assumed it had > timed out? >

Re: Nodes dropping out of cluster due to GC

2010-06-02 Thread Oleg Anastasjev
uring this failure concurrent GC completely stops java program (i.e. cassandra) and does a GC cycle. Other cassandra nodes discover, that node is not responding and considering it dead. If concurrent GC is properly tuned, it should never do stop-the-world and GC ( thats why it is called concurrent

Re: Reduce Cassandra GC

2013-04-16 Thread Joel Samuelsson
6 used; max is > 1046937600 > > ** ** > > However, the heap is not full. The heap usage has a jagged pattern going > from 60% up to 70% during 5 minutes and then back down to 60% the next 5 > minutes and so on. I get no "Heap is X full..." messages. Every once i

Re: scylladb

2017-03-11 Thread Avi Kivity
a CPU scheduler? you mean user level thread Scheduler that maps user level threads to kernel level threads? I thought C++ by default creates native kernel threads but sure nothing will stop someone to create a user level scheduling library if that's what you are talking about? 2) How can

Re: CMS GC initial-mark taking 6 seconds , bad?

2011-10-20 Thread Maxim Potekhin
] so there is a stop-the-world period of 6 seconds. does this sound bad ? or 6 seconds is OK and we should expect the built-in fault-tolerance of Cassandra handle this? Thanks Yang

Re: live data migration from mysql to cassandra

2011-01-14 Thread Victor Kabdebon
Cassandra and their hold a lot of data. Best Regards, Victor K. http://www.voxnucleus.fr 2011/1/14 Edward Capriolo > On Fri, Jan 14, 2011 at 10:40 AM, ruslan usifov > wrote: > > Hello > > > > Dear community please share your experience, home you make live(without >

Re: batch_mutate failed: out of sequence response

2011-04-05 Thread Jonathan Ellis
Step 1: disable swap. 2011/4/5 Héctor Izquierdo Seliva : > Update with more info: > > I'm still running into problems. Now I don't write more than 100 columns > at a time, and I'm having lots of Stop-the-world gc pauses. > > I'm writing into three column fa

Re: CMS GC initial-mark taking 6 seconds , bad?

2011-09-25 Thread Yang
>> I see the following in my GC log >> >> 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)] >> 13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00, >> real=6.07 secs] >> >> so there is a stop-the-world period of 6 seconds. does this sound ba

Re: TimedOutException caused by "Stop the world" activity

2012-05-30 Thread aaron morton
xception, and some > operation failed, and all traffic hang for a while. > > And when I have 1G memory 32 bit cassandra on standalone model, I didn't find > so frequently "Stop the world" behavior. > > So I wonder what kind of operation will hang the cassandra

Re: Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread Jonathan Haddad
doanduy...@gmail.com] > Sent: Wednesday, May 13, 2015 10:40 PM > To: user@cassandra.apache.org > Subject: Re: Viewing Cassandra's Internal table Structure in a CQL world > > > > I think that you can still use cassandra-cli from 2.0.x to look into > internal table str

Re: Jmx_exporter CPU spike

2018-07-09 Thread Ben Bromhead
Hi Rajpal I'd invite you to have a look at https://github.com/zegelin/cassandra-exporter Significantly faster (bypasses JMX rpc stuff, 10ms to collect metrics for 300 tables vs 2-3 seconds via JMX), plus the naming/tagging fits far better into the Prometheus world. Still missing a few stats

Re: Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread DuyHai Doan
mn names > actually look like. Only by keeping an eye on the physical structure can > you tune your queries for best performance. > > > > To date, I have been using cassandra-cli to view the table's internal > structure. But, I get bombarded with all kinds of warnings about

Weird GC

2014-01-29 Thread Joel Samuelsson
Hi, We've been trying to figure out why we have so long and frequent stop-the-world GC even though we have basically no load. Today we got a log of a weird GC that I wonder if you have any theories of why it might have happened. A plot of our heap at the time, paired with the GC time fro

Re: row_cache_provider = 'SerializingCacheProvider'

2012-06-04 Thread ruslan usifov
nd row_cache_provider='SerializingCacheProvider'; When i setup row cache i got promotion failure in GC (with stop the world pause about 30secs) with almost HEAP filled. I very confused with this behavior. PS: i use cassandra 1.0.10, with JNA 3.4.0 on ubuntu lucid (kernel 2.6.32-41) 2

Re: live data migration from mysql to cassandra

2011-01-14 Thread Victor Kabdebon
n 14, 2011 at 10:40 AM, ruslan usifov >> wrote: >> > Hello >> > >> > Dear community please share your experience, home you make live(without >> > stop) migration from mysql or other RDBM to cassandra >> > >> >> There is no built in way

Re: Tombstones and memtable_operations

2011-04-19 Thread aaron morton
one of my column families >> memtable_operations = 0.02 and started deleting keys. I have already >> deleted 54k, but there hasn't been any flush of the memtable. Memory >> keeps piling up and eventually nodes start to do stop-the-world GCs. Is >> this the way this is supposed to work or have I done something wrong? >> >> Thanks! >> > >

Cassandra crashes....

2017-08-22 Thread Thakrar, Jayesh
TWCS compaction properties to have min/max compaction sstables = 4 and by drastically reducing the size of the New/Eden space (to 5% of heap space = 800 MB). Its been about 12 hours and our stop-the-world gc pauses are under 90 ms. Since the servers have more than sufficient resources, we are not

Re: row_cache_provider = 'SerializingCacheProvider'

2012-06-04 Thread ruslan usifov
> 3G > > > Based on "nodetool -h localhost cfhistograms" i calc avg row size > > 70KB > > I setup row cache only for one CF with follow settings: > > update column family building with rows_cached=1 and > row_cache_provider='SerializingCacheProvider&

Re: Flush / Snapshot Triggering Full GCs, Leaving Ring

2011-04-07 Thread ruslan usifov
. > > Confirmation possibility #2: Force some flushes w/o snapshot. > > Either way: "concurrent mode failure" is the easy GC problem. > Hopefully you really are seeing mostly that -- this means the JVM > didn't start CMS early enough, so it ran out of space before it

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

2023-04-11 Thread Ralph Boehme
On 4/11/23 19:53, Bowen Song via user wrote: That error message sounds like one of the nodes timed out in the paxos propose stage.  You can check the system.log and gc.log and see if you can find anything unusual in them, such as network errors, out of sync clocks or long stop-the-world GC

Re: Follow-up post on cassandra configuration with some experiments on GC tuning

2010-08-29 Thread Edward Capriolo
the more significant the saw-tooth. > (2) The larger the young generation, the more efficient the GC (if the > application behaves according to the weak generational hypothesis - > google it if you want a ref) because less data is promoted to old gen > and because the overhead of stop

Re: Jmx_exporter CPU spike

2018-07-09 Thread rajpal reddy
aster (bypasses JMX rpc stuff, 10ms to collect metrics for 300 > tables vs 2-3 seconds via JMX), plus the naming/tagging fits far better into > the Prometheus world. Still missing a few stats like GC etc, but feel free to > submit a PR! > > Ben > > > > On Mon, Jul 9, 2

Re: Jmx_exporter CPU spike

2018-07-10 Thread Rahul Singh
al > > > > I'd invite you to have a look at  > > https://github.com/zegelin/cassandra-exporter > > > > Significantly faster (bypasses JMX rpc stuff, 10ms to collect metrics for > > 300 tables vs 2-3 seconds via JMX), plus the naming/tagging fits far better &g

Linux containers, docker, SSD, and RAID.

2014-06-04 Thread Kevin Burton
Hey guys. Question about using container with Cassandra. I think we will eventually deploy on containers… lxc with docker probably. Our first config will have one cassandra daemon per box. Of course there are issues here. Larger per VM heap means more GC time and potential stop the world and

Re: Garbage collection freezes cassandra node

2011-12-19 Thread Peter Schuller
xpected and is not in and of itself indicative of a stop-the-world 10 second pause. It is fully expected using the CMS collector that you'll have a sawtooth pattern as young gen is being collected, and then a sudden drop as CMS does its job concurrently without pausing the application for a lo

Re: RE: batch_mutate failed: out of sequence response

2011-04-18 Thread Dan Washusen
> > > > my money is on using a single connection from multiple threads. don't do > > > that. > > > > I'm not using thrift directly, and my application is single thread, so I > > guess this is Pelops fault somehow. Since I managed to tame memory >

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-09 Thread Peter Schuller
> The main reason to set Xms=Xmx is so mlockall can tag the entire heap > as "don't swap this out" on startup.  Secondarily whenever the heap > resizes upwards the JVM does a stop-the-world gc, but no, not really a > big deal when your uptime is in days or weeks. I'

Re: Live upgrade 2.0 to 2.1 temporarily increases GC time causing timeouts and unavailability

2016-02-19 Thread daemeon reiydelle
timeouts. You may not be seeing a 2.0 vs. 2.1 issue, rather a 2.1 issue proper. While others did not find this associated with stop-the-world GC, I saw some evidence of same (using Cassandra stress, but I recently reproduce the issue with YCSB!) *...* *Daemeon C.M. ReiydelleUSA (+1

Re: scylladb

2017-03-11 Thread Kant Kodali
> transparent hugepages). > > > On 03/11/2017 10:26 PM, Kant Kodali wrote: > > @Dor > > 1) You guys have a CPU scheduler? you mean user level thread Scheduler > that maps user level threads to kernel level threads? I thought C++ by > default creates native kernel threads b

Re: Cassandra crashes....

2017-08-22 Thread Jeff Jirsa
the TWCS > compaction properties > to have min/max compaction sstables = 4 and by drastically reducing the size > of the New/Eden space (to 5% of heap space = 800 MB). > Its been about 12 hours and our stop-the-world gc pauses are under 90 ms. > Since the servers have more than suff

Re: Cassandra crashes....

2017-08-22 Thread Fay Hou [Storage Service]
space (to 5% of heap space = 800 MB). Its been about 12 hours and our stop-the-world gc pauses are under 90 ms. Since the servers have more than sufficient resources, we are not seeing any noticeable performance impact. Is this kind of tuning normal/expected? Thanks, Jayesh

Re: What % of cassandra developers are employed by Datastax?

2014-05-23 Thread Redmumba
Another thing to keep in mind--even core pieces like the Linux kernel are dominated by corporations. Less than 20% of contributions last year were made by non-corporate sponsored contributors. Obviously, this is a bit different, but many parts of the open source world depend on upstream

Re: Upgrade to a different version?

2011-03-17 Thread Thibaut Britz
> Thanks Thibaut, believe it or not, it does. :) > > Is your use case a typical web app or something like a scientific/data > mining app? I ask because I'm wondering how you have managed to deal > with the stop-the-world garbage collection issues that seems to hit most > clust

Disable Swap? batch_mutate failed: out of sequence response

2011-04-05 Thread Jonathan Colby
t write more than 100 columns >> at a time, and I'm having lots of Stop-the-world gc pauses. >> >> I'm writing into three column families, with memtable_operations = 0.3 >> and memtable_throughput = 64. There is now swapping, and full GCs are taking >> a

Re: Tombstones and memtable_operations

2011-04-19 Thread Héctor Izquierdo Seliva
las 17:36 +0200, Héctor Izquierdo Seliva wrote: > >> Hi everyone. I've configured in one of my column families > >> memtable_operations = 0.02 and started deleting keys. I have already > >> deleted 54k, but there hasn't been any flush of the memtable. Memory
