> Somewhere I remember discussions about issues with the merkle tree range
> splitting or some such that resulted in repair always thinking a little bit
> of data was out of sync.
https://issues.apache.org/jira/browse/CASSANDRA-2324 - fixed for early 0.8.
I don't *think* there's a known open bug t…
> I've now run 7 repairs in a row on this keyspace and every single one has
> finished successfully but performed streams between all nodes. This keyspace
> was written to over the course of several weeks, sometimes with…
How much data is streamed, do you know? What's mainly of interest is whether
there is a…
Somewhere I remember discussions about issues with the merkle tree range
splitting or some such that resulted in repair always thinking a little bit of
data was out of sync.
If you want to get a better idea of what's been transferred, turn the logging
up to DEBUG, or turn it up just for org.apache.cassandra…
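A minimal sketch of that, assuming the Debian package layout and a stock
log4j setup; the file paths and the exact package name are assumptions, so
adjust them to your install:

    # Assumed Debian package paths; adjust to your install.
    # Raise logging just for the streaming package rather than the whole server:
    echo 'log4j.logger.org.apache.cassandra.streaming=DEBUG' | \
        sudo tee -a /etc/cassandra/log4j-server.properties
    # Restart the node (or wait, if your log4j config re-reads the file),
    # then watch the streaming messages:
    tail -f /var/log/cassandra/system.log | grep -i stream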
I have a smallish keyspace on my 3 node, RF=3 cluster. My cluster has no
read/write traffic while I am testing repairs. I am running the 0.8.4 Debian
packages on Ubuntu.
I've now run 7 repairs in a row on this keyspace and every single one has
finished successfully but performed streams between all nodes…
>
> ctrl-c will not stop the repair.
>
Ok, so that's why I've been seeing logs of repairs on other CFs
That's probably the 2280 issue (CASSANDRA-2280). Data from all CFs is streamed over…
>
Ah, I get it now.
Thanks
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
ctrl-c will not stop the repair.
You can kind of check things by looking at nodetool compactionstats; that will
just tell you if there are compactions backing up, not necessarily whether they
are validation compactions used during repairs. You can trawl the logs to look
for messages from the AntiEntropy…
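A rough way to check both from the shell, assuming the Debian log location;
the nodetool subcommands are the standard 0.8 ones:

    # Are compactions (including validation compactions) backing up?
    nodetool -h localhost compactionstats
    # Is anything still streaming to/from this node?
    nodetool -h localhost netstats
    # Any recent repair / anti-entropy activity in the log?
    grep -i antientropy /var/log/cassandra/system.log | tail -n 20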
One last thought: what happens when you ctrl-c a nodetool repair? Does it
stop the repair on the server? If not, then I think I have multiple repairs
still running. Is there any way to check this?
Thanks
2011/8/16 Philippe
> Even more interesting behavior: a repair on a CF has consequences…
Thanks for the pointers, responses inline.
On Tue, Aug 16, 2011 at 3:48 PM, Philippe wrote:
> > I have been able to repair some small column families by issuing a repair
> > [KS] [CF]. When testing on the ring with no writes at all, it still takes
> > about 2 repairs to get "consistent" logs for all AES requests.
Even more interesting behavior: a repair on a CF has consequences on other
CFs. I didn't expect that.
There are no writes being issued to the cluster, yet the logs indicate that
(a quick grep for this is sketched below):
- SSTableReader has opened dozens and dozens of files, most of them
unrelated to the CF being repaired
- compactions…
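One way to see that from the logs; the log path is the Debian default and
"MyCF" is just a placeholder for the column family that was actually being
repaired:

    # SSTables opened while the repair was running...
    grep 'SSTableReader.*Opening' /var/log/cassandra/system.log | tail -n 50
    # ...and how many of those belong to some other column family
    grep 'SSTableReader.*Opening' /var/log/cassandra/system.log | grep -vc 'MyCF'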
On Tue, Aug 16, 2011 at 3:48 PM, Philippe wrote:
> I have been able to repair some small column families by issuing a repair
> [KS] [CF]. When testing on the ring with no writes at all, it still takes
> about 2 repairs to get "consistent" logs for all AES requests.
I think I linked these in another…
I'm still trying different stuff. Here are my latest findings, maybe someone
will find them useful:
- I have been able to repair some small column families by issuing a
repair [KS] [CF] (exact command sketched just below). When testing on the
ring with no writes at all, it still takes about 2 repairs to get
"consistent" logs for all AES requests.
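For reference, the per-CF form I mean, with placeholder keyspace and column
family names (standard 0.8 nodetool syntax):

    # Repair a single column family instead of the whole keyspace
    nodetool -h localhost repair MyKeyspace MyColumnFamily
    # Whole-keyspace repair, for comparison
    nodetool -h localhost repair MyKeyspace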
@Teijo: thanks for the procedure, I hope I won't have to do that.
Peter, I'll answer inline. Thanks for the detailed answer.
> > the number of SSTables for some keyspaces goes dramatically up (from 3 or 4
> > to several dozens).
>
> Typically with a long-running compaction, such as that triggered…
Forgot to mention: you want to check the following in cassandra.yaml on the
node that you bootstrap, before you initiate the bootstrap (a quick check is
sketched below):
* Ensure that the initial_token is set to the correct value (see nodetool)
* Ensure that the seeds list doesn't contain the IP of the node you are trying
to bootstrap
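Something along these lines, assuming the Debian config path; the grep is
deliberately loose because seeds sits nested under seed_provider in the 0.8
yaml, and the host name is a placeholder:

    # What the new node will use when it joins
    grep -nE 'initial_token|seeds|listen_address' /etc/cassandra/cassandra.yaml
    # Compare its initial_token against the rest of the ring;
    # replace some-live-node with any node already in the ring
    nodetool -h some-live-node ring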
> oh i know you can run rf 3 on a 3 node cluster. more i thought that if you
> have one fail you have less nodes than the rf, so the cluster is at less
> than rf, and writes might be disabled or something like that, while at 4 you
> still have met the rf...
A node failing is independent of RF. *De…
Sorry about the lack of response to your actual issue. I'm afraid I
don't have an exhaustive analysis, but some quick notes:
> balanced ring but the other nodes are at 60GB. Each repair basically
> generates thousands of pending compactions of various types (SSTable build,
> minor, major & validation)…
Hi,
I took the following steps to get a node that refused to repair back under
control (rough commands for the first two steps are sketched below).
WARNING: This resulted in some data loss for us; YMMV depending on your
replication factor.
* Turn off all row & key caches via cassandra-cli
* Set "disk_access_mode: standard" in cassandra.yaml
* Kill Cassandra on…
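A sketch of the first two steps, assuming 0.8 cassandra-cli syntax and the
Debian yaml path; MyKeyspace/MyCF are placeholders, you would repeat the
update for each column family, and you can use the cli's --file option if
piping to stdin doesn't work on your build:

    # 1. Disable the row and key caches for a column family
    cassandra-cli -h localhost <<'EOF'
    use MyKeyspace;
    update column family MyCF with rows_cached = 0 and keys_cached = 0;
    EOF
    # 2. Make sure cassandra.yaml says "disk_access_mode: standard";
    #    the setting is commented out by default, so append it if absent
    grep -q '^disk_access_mode:' /etc/cassandra/cassandra.yaml || \
        echo 'disk_access_mode: standard' | sudo tee -a /etc/cassandra/cassandra.yaml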
No, it depends on the consistency level. It's different: for example,
QUORUM = 2 for RF=3, since quorum = floor(RF/2) + 1, i.e. 2 of the 3 replicas.
Anyway, does anyone have an answer to my real issue?
Thanks
2011/8/14 Stephen Connolly
> oh i know you can run rf 3 on a 3 node cluster. more i thought that if you
> have one fail you have less nodes than the rf…
oh i know you can run rf 3 on a 3 node cluster. more i thought that if you
have one fail you have less nodes than the rf, so the cluster is at less
than rf, and writes might be disabled or something like that, while at 4 you
still have met the rf...
- Stephen
---
Sent from my Android phone, so random spelling mistakes…
5 hours later, the number of pending compactions shot up to 8k as usual, and
the number of SSTables for another keyspace shot up to 160 (from 4).
At 4pm, a daily cron job that runs repair started on that same node and all
of a sudden the number of pending compactions went down to 4k and the number
of…
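If you want to watch the same numbers on a node, something like this works
(stock 0.8 nodetool; the grep patterns just trim the output):

    # Pending compactions, which include the validation compactions from repair
    nodetool -h localhost compactionstats | grep -i pending
    # Per-CF SSTable counts
    nodetool -h localhost cfstats | grep -E 'Keyspace:|Column Family:|SSTable count'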
> i am always wondering why people run clusters with number of nodes == rf
>
> i thought you needed to have number of nodes > rf to have any sensible
> behaviour... but i am no expert at all
No. The only requirement is that the number of nodes be >= RF, since
clearly in a cluster with fewer nodes…
i am always wondering why people run clusters with number of nodes == rf
i thought you needed to have number of nodes > rf to have any sensible
behaviour... but i am no expert at all
- Stephen
---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are…
Hello, I've been fighting with my cluster for a couple of days now... Running
0.8.1.3, using Hector and load-balancing requests across all nodes.
My question is: how do I get my node back under control so that it runs
like the other two nodes?
It's a 3 node, RF=3 cluster with reads & writes at CL=QUORUM…