Re: scylladb

2017-03-11 Thread Avi Kivity
scheduler? you mean user level thread Scheduler that maps user level threads to kernel level threads? I thought C++ by default creates native kernel threads but sure nothing will stop someone to create a user level scheduling library if that's what you are talking about? 2) How can one create

Cassandra crashes....

2017-08-22 Thread Thakrar, Jayesh
the TWCS compaction properties to have min/max compaction sstables = 4 and by drastically reducing the size of the New/Eden space (to 5% of heap space = 800 MB). It's been about 12 hours and our stop-the-world GC pauses are under 90 ms. Since the servers have more than sufficient resources, we

Re: Cassandra crashes....

2017-08-22 Thread Surbhi Gupta
me back-to-back long GCs (1.5 to 3.5 seconds). Note that we had set the target for GC pauses to be 200 ms. We have been somewhat able to tame the crashes by updating the TWCS compaction properties to have min/max c

Re: Cassandra crashes....

2017-08-22 Thread Fay Hou [Storage Service]
tically reducing the size of the New/Eden space (to 5% of heap space = 800 MB). It's been about 12 hours and our stop-the-world GC pauses are under 90 ms. Since the servers have more than sufficient resources, we are not seeing any noticeable performance impact. Is this kind of tuning normal/exp

Re: Cassandra crashes....

2017-08-22 Thread Jeff Jirsa
rashes by updating the TWCS compaction properties to have min/max compaction sstables = 4 and by drastically reducing the size of the New/Eden space (to 5% of heap space = 800 MB). It's been about 12 hours and our stop-the-world GC pauses are under 90 ms. Sinc

Re: Jmx_exporter CPU spike

2018-07-10 Thread Rahul Singh
al I'd invite you to have a look at https://github.com/zegelin/cassandra-exporter Significantly faster (bypasses JMX rpc stuff, 10ms to collect metrics for 300 tables vs 2-3 seconds via JMX), plus the naming/tagging fits far better

Re: Jmx_exporter CPU spike

2018-07-09 Thread rajpal reddy
ter (bypasses JMX rpc stuff, 10ms to collect metrics for 300 tables vs 2-3 seconds via JMX), plus the naming/tagging fits far better into the Prometheus world. Still missing a few stats like GC etc, but feel free to submit a PR! Ben On Mon, Jul 9, 2018 a

Re: Jmx_exporter CPU spike

2018-07-09 Thread Ben Bromhead
Hi Rajpal I'd invite you to have a look at https://github.com/zegelin/cassandra-exporter Significantly faster (bypasses JMX rpc stuff, 10ms to collect metrics for 300 tables vs 2-3 seconds via JMX), plus the naming/tagging fits far better into the Prometheus world. Still missing a few stats like

Re: gc throughput

2021-11-17 Thread Elliott Sims
CMS has a higher risk of a long stop-the-world full GC that will cause a burst of timeouts, but if you're not getting that or don't mind if it happens now and then CMS is probably the way to go. It's generally lower-overhead than G1. If you really don't care about latency it might even be worth

Re: gc throughput

2021-11-17 Thread Kiran mk
G1GC would be the most suitable option and it has better control over pauses and optimal. Best Regards, Kiran M K On Wed, Nov 17, 2021, 10:27 PM Elliott Sims wrote: > CMS has a higher risk of a long stop-the-world full GC that will cause a > burst of timeouts, but if you're not g

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

2023-04-12 Thread Jeff Jirsa
n >>> find anything unusual in them, such as network errors, out of sync clocks >>> or long stop-the-world GC pauses. >> hm, I'll check the logs, but I can reproduce this 100% on an idle test >> cluster just by running a simple test client that generates a smallish

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

2023-04-11 Thread Ralph Boehme
On 4/11/23 19:53, Bowen Song via user wrote: That error message sounds like one of the nodes timed out in the paxos propose stage.  You can check the system.log and gc.log and see if you can find anything unusual in them, such as network errors, out of sync clocks or long stop-the-world GC

Re: scylladb

2017-03-11 Thread Avi Kivity
and control. None of this is new btw, it's pretty common in the storage world. Avi On 03/11/2017 11:18 PM, Kant Kodali wrote: Here is the Java version http://docs.paralleluniverse.co/quasar/ but I still don't see how user level scheduling can be beneficial (This is a well debated problem)? How can

RE: Reduce Cassandra GC

2013-04-16 Thread Viktor Jevdokimov
is 1046937600 However, the heap is not full. The heap usage has a jagged pattern going from 60% up to 70% during 5 minutes and then back down to 60% the next 5 minutes and so on. I get no Heap is X full... messages. Every once in a while at one of these peaks, I get these stop-the-world GC for 6-7

Re: TimedOutException caused by Stop the world activity

2012-05-30 Thread aaron morton
operation failed, and all traffic hang for a while. And when I have 1G memory 32 bit cassandra on standalone model, I didn't find so frequently Stop the world behavior. So I wonder what kind of operation will hang the cassandra system. How to collect information for tuning. From the system

Re: LCS and counters

2013-03-05 Thread Hiller, Dean
(reading slices from varyingly sized rows from 10-100k composite columns with counters and hundreds of writes/second) LCS has a nice ~75% lower read latency than Size Tiered. And compactions don't stop the world anymore. Repairs do easily trigger a few hundred compactions though, but it's

Re: Cassandra stress test and max vs. average read/write latency.

2011-12-22 Thread Peter Schuller
. The other thing is to make sure CMS is not failing (promotion failure/concurrent mode failure) and falling back to a stop-the-world serial compacting GC of the entire heap. You might also use -:XX+PrintApplicationPauseTime (I think, I am probably not spelling it entirely correctly) to get a more

Finding bottleneck of a cluster

2012-07-05 Thread rohit bhatia
every second. So cassandra spends 10% of its time stuck in Stop the world collector. The os load is around 16-20 and the average write latency is 3ms. tpstats do not show any significant pending tasks. At this point suddenly, Several nodes start dropping several Mutation messages. There are also

Re: Node doesn't rejoin ring after restart

2012-08-11 Thread Tyler Hobbs
in the docs. I'm working on the assumption that a small outage on one node shouldn't cause extraordinary action. Nor do I want to have to stop every node before bringing them up one by one. What am I missing? Am I forced into those time consuming methods every time I want to restart? Thoughts

Re: Nodes frozen in GC

2011-03-15 Thread Peter Schuller
*period* while the young generation is being collected as that is a stop-the-world pause. (This is also why I said before that at 10 gig new-gen size, the observed behavior on young gen collections may be similar to fallback-to-full-gc cases, but not quite since it would be parallel rather than serial

Re: Upgrade to a different version?

2011-03-16 Thread Paul Pak
adequate testing. Hopefully, at some point soon, it will get better and doing a data import job won't take a cassandra cluster to its knees or we won't experience stop-the-world GC issues and have out of memory errors from routine usage. Paul On 3/16/2011 2:13 PM, Paul Pak wrote: Hi Jake, I'm

Re: Upgrade to a different version?

2011-03-16 Thread Joshua Partogi
features, but I feel we have taken a step back in reliability and scalability since so many features were added without adequate testing. Hopefully, at some point soon, it will get better and doing a data import job won't take a cassandra cluster to its knees or we won't experience stop the world

Re: Upgrade to a different version?

2011-03-17 Thread Dan Kuebrich
. :) Is your use case a typical web app or something like a scientific/data mining app? I ask because I'm wondering how you have managed to deal with the stop-the-world garbage collection issues that seem to hit most clusters that have significant load and cause application timeouts. Have you found

Re: RE: batch_mutate failed: out of sequence response

2011-04-18 Thread Dan Washusen
the problem has not appeared again, but it always happened during a stop-the-world GC. Could it be that the message was sent instead of being dropped by the server when the client assumed it had timed out? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax

Re: RE: batch_mutate failed: out of sequence response

2011-04-18 Thread Héctor Izquierdo Seliva
has not appeared again, but it always happened during a stop-the-world GC. Could it be that the message was sent instead of being dropped by the server when the client assumed it had timed out? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder

Re: Tombstones and memtable_operations

2011-04-19 Thread aaron morton
, but there hasn't been any flush of the memtable. Memory keeps piling up and eventually nodes start to do stop-the-world GCs. Is this the way this is supposed to work or have I done something wrong? Thanks!

Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Moleza Moleza
observed by someone else before? [3] The node being down means that nodetool, and any other client, won't be able to connect to it (clients should use other nodes in cluster to write data). Correct? [4] Is GC ConcurrentMarkSweep a Stop-The-World situation? Where the JVM cannot do anything else? Hence

Rebuilding a cassandra seed node with the same tokens and same IP address

2014-08-29 Thread Donald Smith
the disks on all the nodes and restart. Now we want to be cleverer.) To overcome the issue we figure we should just rebuild the node using the same token range, to avoid unneeded data reshuffling. So we figure we should (1) find the tokens in use on that node via nodetool ring, (2) stop cassandra

Re: Wide rows best practices and GC impact

2014-12-03 Thread Gianluca Borello
SSTables. =Rob http://twitter.com/rcolidba PS - Given 30GB of RAM on the machine, you could consider investigating large-heap configurations, rbranson from Instagram has some slides out there on the topic. What you pay is longer stop the world GCs, IOW latency if you happen to be talking

Re: Cluster status instability

2015-04-02 Thread Michal Michalski
of these be a problem in your case? I'd start from investigating GC logs e.g. to see how long does the stop the world full GC take (GC logs should be on by default from what I can see [1]) [1] https://issues.apache.org/jira/browse/CASSANDRA-5319 Michał Kind regards, Michał Michalski, michal.michal...@boxever.com

Re: Cluster status instability

2015-04-02 Thread Jan
logs e.g. to see how long does the stop the world full GC take (GC logs should be on by default from what I can see [1]) [1] https://issues.apache.org/jira/browse/CASSANDRA-5319 Michał Kind regards, Michał Michalski, michal.michal...@boxever.com On 2 April 2015 at 11:05, Marcin Pietraszek mpietras

Re: Too many keyspaces causes cql connection to time out ?

2016-05-24 Thread Justin Lin
ra in most cases. (Sometimes it still can connect to cassandra). And from cassandra log, we can see it takes roughly 3 seconds to do gc when there is an incoming connection. And the gc is the only difference between the timeout connection and the successful connect

Re: Too many keyspaces causes cql connection to time out ?

2016-05-24 Thread Eric Stevens
most cases. (Sometimes it still can connect to cassandra). And from cassandra log, we can see it takes roughly 3 seconds to do gc when there is an incoming connection. And the gc is the only difference between the timeout connection and the successful connection. So we suspect this Stop-The-Wo

My cluster shows high system load without any apparent reason

2016-07-20 Thread Juho Mäkinen
are running on high cpu, but that alone should not cause a load of 20-30. - Doesn't seem to be GC load: A system starts to show symptoms so that it has run only one CMS sweep. Not like it would do constant stop-the-world gc's. - top shows that the C* processes use 100G of RSS memory. I assume

Answering the question of Cassandra Summit 2017

2017-08-11 Thread Patrick McFadin
for years, I can say it’s no small effort. There it is. Fire away with your questions, comments. All I ask is keep it respectful because this is a community of amazing people. You have changed the world over these years and I know it won’t stop. You know I got a hug for you wherever we just happen

Re: Dropped Mutations

2018-04-18 Thread hitesh dua
Hi, I'd recommend tuning your heap size further (preferably lower), as a large heap size can lead to long garbage collection pauses, also known as stop-the-world events. A pause occurs when a region of memory is full and the JVM needs to make space to continue. During a pause all

Re: No node was available to execute query error

2021-03-12 Thread Bowen Song
Millions of rows in a single query? That sounds like a bad idea to me. Your "NoNodeAvailableException" could be caused by stop-the-world GC pauses, and the GC pauses are likely caused by the query itself. On 12/03/2021 13:39, Joe Obernberger wrote: Thank you Paul and Erick. Th

Re: Reduce Cassandra GC

2013-06-07 Thread Joel Samuelsson
I keep having issues with GC. Besides the cluster mentioned above, we also have a single node development cluster having the same issues. This node has 12.33 GB data, a couple of million skinny rows and basically no load. It has default memory settings but keeps getting very long stop-the-world GC

Re: Changing the CLI, not a great idea!

2011-07-28 Thread Edward Capriolo
of oral history, that is hard to keep up with, and even harder to understand once it's long past. I think it is clear that we need a better one-stop-shop for good documentation. What hasn't been talked about much - but I think it's just as important - is a good one-stop-shop

Re: Follow-up post on cassandra configuration with some experiments on GC tuning

2010-08-30 Thread Peter Schuller
experiences with this? If you use the default (for the JVM, not for cassandra) throughput collector, you *will* take full stop-the-world collections, period. You can enable parallel GC, but with that collector there's no way around the fact that full collections will pause the application for the full

Re: Cassandra GC Settings

2011-01-17 Thread Dan Hendry
). My theory is that each row being compacted with the old settings was being promoted to the old generation, thereby running the heap out of space and causing a stop the world gc. With the new settings, rows being compacted typically remain in the young generation, allowing them to be cleaned up more

Re: Upgrading from 0.6 to 0.7.0

2011-01-19 Thread Anthony Molinaro
every one of them accesses multiple keyspaces. So selecting one means your pooling of connections has to be more complex. But I'm sure there was some reason. Anyway, as far as I know there is no way to do a rolling upgrade from 0.6 to 0.7, it's a stop-the-world upgrade. This may be mitigated

Re: Upgrading from 0.6 to 0.7.0

2011-01-19 Thread Aaron Morton
there was some reason. Anyway, as far as I know there is no way to do a rolling upgrade from 0.6 to 0.7, it's a stop-the-world upgrade. This may be mitigated by the use of a client library which might hide some of the thrift calls, but if you have your own thrift client, you sort of have to shut down everything

Re: scylladb

2017-03-11 Thread Kant Kodali
> of the application, allows you to both saturate the CPU and maintain low > latency. > *For my workload and probably others I had seen, Cassandra has always been CPU bound.* > > There are other factors, like NUMA-friendliness, but in the end it all > boils down to efficiency

Re: Optimal Heap Size Cassandra Configuration

2019-05-21 Thread Alain RODRIGUEZ
you the GC throughput (% of time JVM is available - not doing a 'stop the world' pause). To give you some numbers, this should be > 95-98% minimum. If you are having a lower throughput, chances are high that you can 'easily' improve performance there. There are a lot more details in this analy

Re: Reduce Cassandra GC

2013-06-07 Thread Igor
long stop-the-world GC pauses: INFO [ScheduledTasks:1] 2013-06-07 10:37:02,537 GCInspector.java (line 122) GC for ParNew: 99342 ms for 1 collections, 1400754488 used; max is 4114612224 To try to rule out amount of memory, I set it to 16GB (we're on a virtual environment), with 4GB

Re: Cassandra stress test and max vs. average read/write latency.

2011-12-22 Thread Peter Fales
to full compacting GC:s (stop-the-world) for the old generation. I would start by adjusting young gen so that your frequent pauses are at an acceptable level, and then see whether or not you can sustain that in terms of old-gen. Start with this in any case: Run Cassandra with -XX:+PrintGC

Re: Cassandra stress test and max vs. average read/write latency.

2011-12-23 Thread Peter Fales
) - IF your workload is such that you are suffering from fragmentation and eventually seeing Cassandra fall back to full compacting GC:s (stop-the-world) for the old generation. I would start by adjusting young gen so that your frequent pauses are at an acceptable level, and then see whether

Re: Tuning cassandra (compactions overall)

2012-05-16 Thread aaron morton
What is the the benefit of having more memory ? I mean, I don't understand why having 1, 2, 4, 8 or 16 GB of memory is so different. Less frequent and less aggressive garbage collection frees up CPU resources to run the database. Less memory results in frequent and aggressive (i.e. stop

Re: All host pools Marked Down

2012-05-30 Thread aaron morton
I would remove the load balancer from the equation. Compactions do not stop the world, they may degrade performance for a while but that's about it. Look in the logs on the servers, are the nodes logging that other nodes are going DOWN ? Cheers - Aaron Morton Freelance

Re: Finding bottleneck of a cluster

2012-07-05 Thread rohit bhatia
every second. So cassandra spends 10% of its time stuck in Stop the world collector. The os load is around 16-20 and the average write latency is 3ms. tpstats do not show any significant pending tasks. At this point suddenly, Several nodes start dropping several Mutation messages

Re: Read Latency Degradation

2010-12-18 Thread Peter Schuller
dropping the heap down to 8gb, but having pained through many cmf in the past I thought the larger heap should help prevent the stop the world gc. I'm not sure what got merged to 0.6.8, but you may want to grab the JVM options from the 0.7 branch. In particular, the initial occupancy

RE: frequent client exceptions on 0.7.0

2011-02-17 Thread Dan Hendry
problem. Stop the world GC is a real pain. In my cluster I was, and still am to some extent, seeing each node go 'down' about 10-30 times a day and up to a few hundred when running major compactions (by grepping through the Cassandra system log). GC tuning is an art unto itself but if this is your

Re: frequent client exceptions on 0.7.0

2011-02-17 Thread Aaron Morton
. Anything over about a few seconds may be causing your problem. Stop the world GC is a real pain. In my cluster I was, and still am to some extent, seeing each node go 'down' about 10-30 times a day and up to a few hundred when running major compactions (by grepping through the Cassandra system log

Re: Upgrade to a different version?

2011-03-16 Thread Jeremy Hanna
in reliability and scalability since so many features were added without adequate testing. Hopefully, at some point soon, it will get better and doing a data import job won't take a cassandra cluster to its knees or we won't experience stop-the-world GC issues and have out of memory errors from routine

Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Jonathan Ellis
in cluster to write data). Correct? [4] Is GC ConcurrentMarkSweep a Stop-The-World situation? Where the JVM cannot do anything else? Hence the node is technically Down? Correct? [5] Why is this GC taking such a long time? (see JVM ARGS posted below). [6] Any JVM Args (switches) I can use

Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Benjamin Black
to write data). Correct? [4] Is GC ConcurrentMarkSweep a Stop-The-World situation? Where the JVM cannot do anything else? Hence the node is technically Down? Correct? [5] Why is this GC taking such a long time? (see JVM ARGS posted below). [6] Any JVM Args (switches) I can use to prevent

Re: Weird GC

2014-01-31 Thread Benedict Elliott Smith
-XX:PrintSafepointStatisticsCount=1 On 29 January 2014 16:23, Joel Samuelsson samuelsson.j...@gmail.com wrote: Hi, We've been trying to figure out why we have so long and frequent stop-the-world GC even though we have basically no load. Today we got a log of a weird GC that I wonder if you have

Re: Wide rows best practices and GC impact

2014-12-04 Thread Jabbar Azam
investigating large-heap configurations, rbranson from Instagram has some slides out there on the topic. What you pay is longer stop the world GCs, IOW latency if you happen to be talking to a replica node when it pauses.

Re: Cluster status instability

2015-04-02 Thread daemeon reiydelle
? I'd start from investigating GC logs e.g. to see how long does the stop the world full GC take (GC logs should be on by default from what I can see [1]) [1] https://issues.apache.org/jira/browse/CASSANDRA-5319 Michał Kind regards, Michał Michalski, michal.michal...@boxever.com On 2

Re: Too many keyspaces causes cql connection to time out ?

2016-05-26 Thread Eric Stevens
dra). And from cassandra log, we can see it takes roughly 3 seconds to do gc when there is an incoming connection. And the gc is the only difference between the timeout connection and the successful connection. So we suspect this Stop-The-World GC might block th

Re: Blog post on Cassandra's inner workings and performance - feedback welcome

2016-07-09 Thread daemeon reiydelle
: - (1) why networks need to be clean (the impact of "dirty"/erratic networks); - (2) the impact of java (off heap, stop the world garbage collection, why more memory makes things worse; - (3) table design decisions (read mostly, write mostly, mixed read/write, etc.) A re

Re: Blog post on Cassandra's inner workings and performance - feedback welcome

2016-07-10 Thread Graham Sanderson
If there were sections I would like to see your clear thoughts appear within, it would be around: (1) why networks need to be clean (the impact of "dirty"/erratic networks); (2) the impact of java (off heap, stop the world garbage collection, why more memory

Re: My cluster shows high system load without any apparent reason

2016-07-22 Thread Juho Mäkinen
ning on high cpu, but that alone should not cause a load of 20-30. - Doesn't seem to be GC load: A system starts to show symptoms so that it has run only one CMS sweep. Not like it would do constant stop-the-world gc's. - top shows that the C* processes use 100G of RSS memory

Re: Answering the question of Cassandra Summit 2017

2017-08-11 Thread Ben Bromhead
say it’s no small effort. There it is. Fire away with your questions, comments. All I ask is keep it respectful because this is a community of amazing people. You have changed the world over these years and I know it won’t stop. You know I got a hug for you wherever we ju

Re: Answering the question of Cassandra Summit 2017

2017-08-12 Thread Jeff Jirsa
d if so, I’ll see you there. After being involved in making the Cassandra Summit happen for years, I can say it’s no small effort. There it is. Fire away with your questions, comments. All I ask is keep it respectful because this is a community of amazing people. You have changed the world over these years and I know it won’t stop. You know I got a hug for you wherever we just happen to meet. Patrick

Re: Cassandra crashes....

2017-08-22 Thread Fay Hou [Storage Service]
d by drastically reducing the size of the New/Eden space (to 5% of heap space = 800 MB). It's been about 12 hours and our stop-the-world GC pauses are under 90 ms. Since the servers have more than sufficient resources, we are not seeing any noticeable performance impact. Is this kind of tuning normal/expected? Thanks, Jayesh

Re: Dropped Mutations

2018-04-19 Thread Shalom Sagges
send.com/view/8iiswfp> On Thu, Apr 19, 2018 at 12:42 AM, hitesh dua <hiteshd...@gmail.com> wrote: Hi, I'd recommend tuning your heap size further (preferably lower), as a large heap size can lead to long garbage collection pauses, also known a

Re: No node was available to execute query error

2021-03-12 Thread Bowen Song
'd run map reduce jobs. Thank you. -Joe On 3/12/2021 9:07 AM, Bowen Song wrote: Millions of rows in a single query? That sounds like a bad idea to me. Your "NoNodeAvailableException" could be caused by stop-the-world GC pauses, and the GC pauses are likely caused by the query itself.

Re: No node was available to execute query error

2021-03-12 Thread Joe Obernberger
On 3/12/2021 9:07 AM, Bowen Song wrote: Millions of rows in a single query? That sounds like a bad idea to me. Your "NoNodeAvailableException" could be caused by stop-the-world GC pauses, and the GC pauses are likely caused by the query itself. On 12/03/2021 13:39, Joe Obernberger wrot

Re: underutilized servers

2021-03-05 Thread Bowen Song
Based on my personal experience, the combination of slow read queries and low CPU usage is often an indicator of bad table schema design (e.g.: large partitions) or bad query (e.g. without partition key). Check the Cassandra logs first, is there any long stop-the-world GC? tombstone warning

Re: gc throughput

2021-11-17 Thread C. Scott Andreas
without delay. – Scott On Nov 17, 2021, at 9:15 AM, Kiran mk wrote: G1GC would be the most suitable option and it has better control over pauses and optimal. Best Regards, Kiran M K On Wed, Nov 17, 2021, 10:27 PM Elliott Sims wrote: CMS has a higher risk of a long stop-the-world full GC that will cause

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

2023-04-11 Thread Bowen Song via user
That error message sounds like one of the nodes timed out in the paxos propose stage.  You can check the system.log and gc.log and see if you can find anything unusual in them, such as network errors, out of sync clocks or long stop-the-world GC pauses. BTW, since you said you want

Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread aaron morton
upgradetables re-writes every sstable to have the same contents in the newest format. Agree. In the world of compaction, and excluding upgrades, having older sstables is expected. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http

Re: Query regarding SSTable timestamps and counts

2012-12-10 Thread B. Todd Burruss
format. Agree. In the world of compaction, and excluding upgrades, having older sstables is expected. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 21/11/2012, at 11:45 AM, Edward Capriolo edlinuxg...@gmail.com

Re: Changing the CLI, not a great idea!

2011-07-28 Thread Jonathan Ellis
. I think it is clear that we need a better one-stop-shop for good documentation. What hasn't been talked about much - but I think it's just as important - is a good one-stop-shop for Cassandra's oral history. (You might think this list is the place, but it's too noisy to be useful

Re: Changing the CLI, not a great idea!

2011-07-28 Thread Edward Capriolo
Cassandra exists in a kind of oral history, that is hard to keep up with, and even harder to understand once it's long past. I think it is clear that we need a better one-stop-shop for good documentation. What hasn't been talked about much - but I think it's just

Re: Key Caching

2010-07-26 Thread Dathan Pattishall
heap is garbage collected. With the CMS (Concurrent Mark/Sweep) collector the intent is that the periodic scans of the entire Java heap are done concurrently with the application without pausing it. Fallback to full stop-the-world garbage collections can still happen if CMS fails

SV: Key Caching

2010-07-27 Thread Thorvaldsson Justus
(Concurrent Mark/Sweep) collector the intent is that the periodic scans of the entire Java heap are done concurrently with the application without pausing it. Fallback to full stop-the-world garbage collections can still happen if CMS fails to complete such work fast enough, in which case tweaking

Re: Reduce Cassandra GC

2013-04-17 Thread Joel Samuelsson
of these peaks, I get these stop-the-world GC for 6-7 minutes. Why does GC take up so much time even though the heap isn't full? I am aware that my access patterns make key caching very unlikely to be high. And indeed, my average key cache hit ratio during the run of the scripts is around

Re: Reduce Cassandra GC

2013-04-17 Thread aaron morton
% the next 5 minutes and so on. I get no Heap is X full... messages. Every once in a while at one of these peaks, I get these stop-the-world GC for 6-7 minutes. Why does GC take up so much time even though the heap isn't full? I am aware that my access patterns make key caching very

Re: Schema changes: where in Java code are they sent?

2015-02-25 Thread Richard Dawe
test -v 2.0.8 -n 3 --vnodes -s”). With all three nodes up, my schema operations were working fine. When I took down two nodes using “ccm node2 stop”, “ccm node3 stop”, I found that schema operations through “ccm node1 cqlsh” were failing like this: cqlsh ALTER TABLE test.test3 ADD fred text

Re: Cassandra GC Settings

2011-01-17 Thread SriSatish Ambati
: 26.7617560 seconds Now, a full stop of the application was what I was seeing extensively before (100-200 times over the course of a major compaction as reported by gossipers on other nodes). I have also just noticed that the previous instability (ie application stops) correlated with the compaction

Re: Upgrading from 0.6 to 0.7.0

2011-01-19 Thread Anthony Molinaro
a stop the world upgrade. This may be mitigated by the use of a client library which might hide some of the thrift calls, but if you have your own thrift client, you sort of have to shutdown everything and restart. This is definitely a problem for our usage, so I will not even consider a 0.6

Re: Issue restarting cassandra with a cluster running Cassandra 1.2.x and Cassandra 2.0.x

2015-03-04 Thread Fabrice Facorat
: 4f9814ec-0ac2-3c3e-b052-cad71f644e42: [10.xxx.xx.9, 10.xxx.xx.8, 10.xxx.xx.5, 10.xxx.xx.4, 10.xxx.xx.7, 10.xxx.xx.6] === OK: node 10.xxx.xx.4 restart fine = Now stop 10.xxx.xx.9 root@node001[SPH][BENCH][PnS3]:~/schemas$ nodetool describecluster Cluster Information: Name: Bench Cassandra PnS Snitch

Re: scylladb

2017-03-12 Thread Avi Kivity
the happier it will be. There are other factors, like NUMA-friendliness, but in the end it all boils down to efficiency and control. None of this is new btw, it's pretty common in the storage world. Avi On 03/11/2017 11:18 PM, Kant Kodali wrote: Here is the Java

Re: scylladb

2017-03-11 Thread Dor Laor
> > *For my workload and probably others I had seen, Cassandra has always > been CPU bound.* > Could be. However, try to make it CPU bound on 10 cores, 20 cores and more. The more cores you use, the fewer nodes you need and the overall overhead decreases. > >> There are other

Re: scylladb

2017-03-12 Thread Avi Kivity
t in the end it all boils down to efficiency and control. None of this is new btw, it's pretty common in the storage world. Avi On 03/11/2017 11:18 PM, Kant Kodali wrote: Here is the Java version http://docs.paralleluniverse.co/quasar/

Re: scylladb

2017-03-12 Thread Kant Kodali
machine), and when you profile it, you see it spending much of its time in atomic operations, or parking/unparking threads -- fighting with itself. It doesn't scale within the machine. Scylla will happily utilize all of the cores that it is assigned (all of them by default in m

Re: Tuning cassandra (compactions overall)

2012-05-21 Thread Alain RODRIGUEZ
GB of memory is so different. Less frequent and less aggressive garbage collection frees up CPU resources to run the database. Less memory results in frequent and aggressive (i.e. stop the world) GC, and increase IO pressure. Which reduces read performance and in the extreme can block writes

Re: Tuning cassandra (compactions overall)

2012-05-22 Thread aaron morton
, 2, 4, 8 or 16 GB of memory is so different. Less frequent and less aggressive garbage collection frees up CPU resources to run the database. Less memory results in frequent and aggressive (i.e. stop the world) GC, and increase IO pressure. Which reduces read performance and in the extreme

Re: Finding bottleneck of a cluster

2012-07-05 Thread aaron morton
). The average gc pause time for parnew are 100ms occurring every second. So cassandra spends 10% of its time stuck in Stop the world collector. The os load is around 16-20 and the average write latency is 3ms. tpstats do not show any significant pending tasks. At this point suddenly, Several

Re: Finding bottleneck of a cluster

2012-07-05 Thread rohit bhatia
read operations and cpu's iowait is around 0.05%) and is not swapping its memory(around 15 gb RAM is free or inactive). The average gc pause time for parnew are 100ms occurring every second. So cassandra spends 10% of its time stuck in Stop the world collector. The os load is around 16-20

Re: TimedOutException caused by

2012-07-24 Thread J . Amding
one. When I run stress load testing, I got this TimedOutException, and some operation failed, and all traffic hang for a while. And when I have 1G memory 32 bit cassandra on standalone model, I didn't find so frequently Stop the world behavior. So I wonder what kind of operation will hang

Re: Read Latency Degradation

2010-12-18 Thread Wayne
during compactions/repairs, or just the overall throughput/latency during normal operation? I have considered dropping the heap down to 8gb, but having pained through many cmf in the past I thought the larger heap should help prevent the stop the world gc. I'm not sure what got merged

Re: cassandra and G1 gc

2011-03-09 Thread Peter Schuller
even the occasional stop-the-world full GC even after extended runtime. (Keyword being *potential*.) Now, first of all, G1 is still immature compared to CMS. But even if you are in a position that you are willing to trust G1 in some particular JVM version for your production use, and even if G1

Re: Follow-up post on cassandra configuration with some experiments on GC tuning

2010-08-28 Thread Peter Schuller
(if the application behaves according to the weak generational hypothesis - google it if you want a ref) because less data is promoted to old gen and because the overhead of stop-the-world is lessened. (3) The larger the young generation, the longer the pause times to do collections of the young

Re: GC pauses affecting entire cluster.

2015-06-01 Thread graham sanderson
stop-the-world gc. These gc's happen once every 2 hours. I did find a ticket that seems related to this: https://issues.apache.org/jira/browse/CASSANDRA-3853, but Jonathan Ellis has resolved this ticket. We are running standard gc

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-10 Thread horschi
com> wrote: > > I tried to set up a new node with join_ring=false once. In my test > > that node did not pick a token in the ring. I assume running repair > > or rebuild would not do anything in that case: No tokens = no data. > > But I must admit: I have not tried running

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-10 Thread Samuel CARRIERE
thing in that case: No tokens = no data. > But I must admit: I have not tried running rebuild. > > I admit I haven't been following this thread closely, perhaps I have > missed what exactly it is you're trying to do. > > It's possible you'd need to : > > 1) join the node with au
