Do you have an indication that at least the disk space is in fact
consistent with the amount of data being streamed between the nodes? I
think you had 90 to ~450 GB with RF=3, right? Still sounds like a
lot, assuming repairs are not running concurrently (and compactions are
able to run after
This looks like an artifact of the way ownership is calculated for the OOP. See
https://github.com/apache/cassandra/blob/cassandra-0.8.4/src/java/org/apache/cassandra/dht/OrderPreservingPartitioner.java#L177
it was changed in this ticket
https://issues.apache.org/jira/browse/CASSANDRA-2800
The
Using the memory allocated to the JVM is not really a problem unless it's OOM'ing,
or running into performance issues due to excessive GC.
One scenario I could imagine: a timeout triggered on a dirty memtable, which
resulted in a flush; the flush resulted in a minor compaction; the minor
Did you clear the LocationInfo from the non-prod cluster?
When you gave non-prod the prod seeds, non-prod would have discovered all the
nodes in prod. Unless you have cleared the location info, they will still have
that knowledge.
Does nodetool ring in non-prod list any prod machines?
if
Unreachable nodes either did not respond to the message or were known to be
down and were not sent a message.
The way the node lists are obtained for the ring command and describe cluster
is the same. So it's a bit odd.
Can you connect to JMX and have a look at the o.a.c.db.StorageService
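For reference, a minimal Java sketch of reading those lists over JMX; this assumes
JMX on localhost:7199 with no auth, adjust to whatever port cassandra-env.sh sets:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Sketch only: read the live/unreachable node lists from the StorageService MBean.
    // Assumes JMX on localhost:7199 with no authentication.
    public class RingCheck {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            MBeanServerConnection mbs =
                JMXConnectorFactory.connect(url).getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
            System.out.println("Live:        " + mbs.getAttribute(ss, "LiveNodes"));
            System.out.println("Unreachable: " + mbs.getAttribute(ss, "UnreachableNodes"));
        }
    }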
Since nodetool cleanup could remove hinted handoffs, will it cause
data loss?
There is some confusion in the ring about nodes leaving. Check nodetool ring
from every node and see if they agree. Check the logs to see if there is any
information about a node sending the wrong message.
Without knowing much more you could try a rolling restart, but you may need a
full
Hello,
I have a ColumnFamily in which all columns are always set with a TTL;
it would be one of the hottest column families (rows_cached is set
to 1.0). I am wondering if TTL values also follow gc_grace? If they
do, am I correct in thinking it would be best to set gc_grace really
low in this
I am wondering if TTL values also follow gc_grace?
They are purged by the first compaction that processes them after TTL has
expired. The TTL expiry is used the same way as the expiry on a Tombstone.
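Concretely, a rough sketch of that purge rule (simplified, not the actual Cassandra
code; the point is that gc_grace still applies after the TTL runs out):

    import java.util.concurrent.TimeUnit;

    // Simplified illustration: an expired TTL column behaves like a tombstone whose
    // deletion time is the moment the TTL ran out, so a compaction can only drop it
    // completely once gc_grace_seconds has also passed since that expiry.
    public class TtlPurgeRule {
        static boolean purgeable(long expiredAtMillis, long gcGraceSeconds, long nowMillis) {
            return nowMillis > expiredAtMillis + TimeUnit.SECONDS.toMillis(gcGraceSeconds);
        }

        public static void main(String[] args) {
            long now = System.currentTimeMillis();
            long expiredTenMinutesAgo = now - TimeUnit.MINUTES.toMillis(10);
            System.out.println(purgeable(expiredTenMinutesAgo, 0, now));      // true with gc_grace = 0
            System.out.println(purgeable(expiredTenMinutesAgo, 864000, now)); // false with the 10 day default
        }
    }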
Thinking out loud, is this possible….
t0 - write col to all 3 replicas.
t1 - overwrite col
On Sun, Aug 21, 2011 at 2:21 PM, aaron morton aa...@thelastpickle.com wrote:
I am wondering if TTL values also follow gc_grace?
They are purged by the first compaction that processes them after TTL has
expired. The TTL expiry is used the same way as the expiry on a Tombstone.
Thinking out
Hi,
I will wait until this is fixed before I upgrade, just to be sure.
Shall I open a new ticket for this issue?
Thanks,
Thibaut
On Sun, Aug 21, 2011 at 11:57 AM, aaron morton aa...@thelastpickle.com wrote:
This looks like an artifact of the way ownership is calculated for the OOP.
See
Hi,
7.4, 3 node cluster, RF=3
Load has not changed much, on 2 of the 3 nodes the commit log filled up in
less than a minute (did not give it a chance to recover). We have now been running
this cluster for about 2-3 months without any problem. At this point I do not
see any unusual load (continue to
So no, it did not fill in a minute, but tons of header files were written in
a minute (is that normal? I assume these are marker files which get written
when memtables are flushed). The actual data files have been around for the
last 24 hours.
Somehow this all seems connected to reintroduce node
When does the actual commit-data file get deleted?
The flush interval on all my memtables is 60 minutes
They *should* be getting deleted when they no longer contain any data
that has not been flushed to disk. Are flushes definitely still
happening? Is it possible flushing has started failing
We have a lot of space on /data, and looks like it was flushing data fine
from file timestamps.
We did have a bit of a goofup with IPs when bringing up a down node (and the
commit files have been around since then). Wonder if that is what triggered
it and we have a bunch of hinted handoffs being
Both .2 and .3 report the same from the MBean: Unreachable is an empty
collection, and the Live node list still shows all 3 nodes:
192.168.20.2
192.168.20.3
192.168.20.1
The removetoken was done a few days ago, and I believe the remove was done from
.2
Here is what the ring output looks like, not sure why
I'm not sure what the fix is.
When using an order preserving partitioner it's up to you to ensure the ring is
correctly balanced.
Say you have the following setup…
node : token
1 : a
2 : h
3 : p
If keys are always 1 character we can say each node owns roughly 33% of the
ring. Because we
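To make that concrete, a rough sketch (plain Java, not Cassandra code) counting how
many single-character keys land on each node when the tokens are a/h/p; each node
owns the range (previous token, its own token], with the first token picking up the
wrap around the ring:

    // Rough illustration only: ownership with an order preserving partitioner,
    // tokens a/h/p and single-character keys a..z. A key belongs to the node whose
    // token closes the range (previous token, token], wrapping around the ring.
    public class OppOwnership {
        public static void main(String[] args) {
            char[] tokens = {'a', 'h', 'p'};
            int[] counts = new int[tokens.length];
            for (char key = 'a'; key <= 'z'; key++) {
                for (int i = 0; i < tokens.length; i++) {
                    char prev = tokens[(i + tokens.length - 1) % tokens.length];
                    boolean inRange = prev < tokens[i]
                            ? key > prev && key <= tokens[i]
                            : key > prev || key <= tokens[i];   // wrapping range
                    if (inRange) counts[i]++;
                }
            }
            for (int i = 0; i < tokens.length; i++) {
                System.out.printf("node %d (token %c): %d of 26 keys (%.0f%%)%n",
                        i + 1, tokens[i], counts[i], 100.0 * counts[i] / 26);
            }
        }
    }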
Yup, you can check what HH is doing via JMX.
There is a bug in 0.7 that can result in log files not being deleted:
https://issues.apache.org/jira/browse/CASSANDRA-2829
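If you want to poke at HH over JMX, a small Java sketch that just lists the MBeans
under org.apache.cassandra.db (the hinted handoff manager is registered there) rather
than hard-coding a name that may differ between versions; same localhost:7199, no-auth
assumption as the earlier sketch:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Sketch only: enumerate Cassandra's db-level MBeans instead of guessing the
    // exact hinted handoff MBean name. Adjust host/port to your cassandra-env.sh.
    public class ListDbMBeans {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            MBeanServerConnection mbs =
                JMXConnectorFactory.connect(url).getMBeanServerConnection();
            for (ObjectName name : mbs.queryNames(new ObjectName("org.apache.cassandra.db:*"), null)) {
                System.out.println(name);
            }
        }
    }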
Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On
I see the mistake I made about ring; it gets the endpoint list from the same place
but uses the tokens to drive the whole process.
I'm guessing here, don't have time to check all the code. But there is a 3 day
timeout in the gossip system. Not sure if it applies in this case.
Anyone know ?
I was aware that deleted items might come back alive without proper
node repair.
How about modified items, for example 'A'={1,2,3}, then 'A'={4,5}? Is
it possible 'A' changes back to {1,2,3}?
I have encountered this mystery problem after going through a messy procedure
with Cassandra nodes,
Hi,
I've been working on this project for a few months now and I think it's mature
enough to post it here:
https://github.com/sebgiroux/Cassandra-Cluster-Admin Cassandra Cluster Admin
on GitHub
Basically, it's a GUI for Cassandra. If you're like me and have used MySQL for a
while (and are still using it!),
I have encountered this problem while updating the key cache and row cache. I
once updated them to 0 (disabled) while node2 was not available; when it
came back they eventually had the same schema version.
[default@prjspace] describe cluster;
Cluster Information:
Snitch:
Just tried it and it works like a charm! Thanks a lot for the great
work!
On Mon, Aug 22, 2011 at 9:47 AM, SebWajam sebast...@wajam.com wrote:
Hi,
I've been working on this project for a few months now and I think it's mature
enough to post it here:
Cassandra Cluster Admin on
Thanks for the migration tip, but the schema is in agreement.
[default@prjspace] describe cluster;
Cluster Information:
Snitch: org.apache.cassandra.locator.SimpleSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions:
79d072cc-cc62-11e0-a753-5525ca993302:
Ok, will look into it, thx for the heads up.
Sent from a mobile device, please forgive typos.
On Aug 21, 2011 6:45 PM, Yan Chunlu springri...@gmail.com wrote:
I was aware that deleted items might come back alive without proper
node repair.
How about modified items, for example
It's been at least 4 days now.
-Original Message-
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Sunday, August 21, 2011 3:16 PM
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster
I see the mistake I made about ring; it gets the endpoint list
My guess: you're using an old version of the CLI that isn't dealing
with BytesType column names correctly.
On Mon, Aug 22, 2011 at 12:08 AM, Yan Chunlu springri...@gmail.com wrote:
Connecting to cassandra-cli and issuing list on my CF, I got
RowKey: comments_62559
= (column=76616c7565,
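(Side note on the output above: 76616c7565 is just the hex of the ASCII bytes for
"value", e.g. decoded with a few lines of Java:)

    // Decodes the hex column name shown above; 0x76 0x61 0x6c 0x75 0x65 is "value",
    // which is what a cli that understands the column names would display as text.
    public class HexDecode {
        public static void main(String[] args) {
            String hex = "76616c7565";
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < hex.length(); i += 2) {
                out.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
            }
            System.out.println(out); // prints: value
        }
    }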
We just noticed that at one time, 4 nodes were doing minor compaction together;
each of them took 20~60 minutes.
We're on 0.8.1, 6 nodes, RF5.
These simultaneous compactions slowed down the whole cluster; we use the
local_quorum consistency level, therefore dynamic_snitch is not helping us.
Aside
You should throttle your compactions to a sustainable level.
-ryan
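For what it's worth, 0.8 added a global compaction throttle in cassandra.yaml; double-check
the exact name and default in your own yaml, but it looks roughly like:

    # Throttles compaction to this rate per node; 16 MB/s is the shipped default.
    # Lower it if simultaneous compactions are starving reads.
    compaction_throughput_mb_per_sec: 16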
On Sun, Aug 21, 2011 at 10:22 PM, Hefeng Yuan hfy...@rhapsody.com wrote:
We just noticed that at one time, 4 nodes were doing minor compaction
together; each of them took 20~60 minutes.
We're on 0.8.1, 6 nodes, RF5.
This