Re: Cassandra multi DC

2012-03-29 Thread Eric Tamme
On 03/29/2012 01:35 PM, Alexandru Sicoe wrote: Hello everyone, How are people running multi DC Cassandra across remote locations? Are VPNs used? Or some dedicated application proxies? What is the norm here? Any advice is much appreciated, Alex No VPN for us. We do it all over pure IPv6 cir

Re: Cassandra C client implementation

2011-12-14 Thread Eric Tamme
link, but here it is for your reference https://github.com/junction/db_jnctn_usrloc Hopefully that helps. I idle in #opensips too, just ask about cassandra in there and I'll probably see it. - Eric Tamme

Re: cassandra most stable version ?

2011-12-08 Thread Eric Tamme
On 12/08/2011 04:50 PM, Attila Babo wrote: 0.6.12, we had serious problem with 0.7.x and 0.8.x Seriously folks - don't make the choice to run a 0.6.x build now, that would be like burning all our books (and e-books, and internets) and returning to the dark ages by choice. 0.7.x is the first "m

Re: Using ttl to expire columns rather than using delete

2011-10-12 Thread Eric Tamme
On 10/11/2011 09:49 PM, Terry Cumaranatunge wrote: Hello, If you set a ttl and expire a column, I've read that this eventually turns into a tombstone and will be cleaned out by the GC. Are expirations considered a form of delete that still requires a node repair to be run in gc_grace_period se

Re: Multi DC setup

2011-10-11 Thread Eric Tamme
We already have two separate rings. Idea of bidirectional sync is, if one ring is down, we can still send the traffic to other ring. When original cluster comes back, it will pick up the data from available cluster. I'm not sure if it makes sense to have separate rings or combine these two rings

Re: Cassandra prod environment

2011-09-02 Thread Eric Tamme
On 09/02/2011 11:30 AM, Sorin Julean wrote: Hey, Currently I'm running Cassandra on Ubuntu 10.4 x86_64 in EC2. I'm wondering if anyone observed a better performance / stability on other distros ( CentOS / RHEL / ...) or OS (eg. Solaris intel/SPARC) ? Is anyone running prod on VMs, not clou

asynchronous writes (aka consistency = 0)

2011-08-30 Thread Eric tamme
Is there any mechanism that would allow me to write to Cassandra with no blocking at all? I spent a long time figuring out a problem I encountered with one node in each datacenter: LA, and NY using SS RF=1 and write consistency 1. My row keys are -mm-dd-h so basically for every hour a row woul

Re: Problems using Thrift API in C

2011-07-28 Thread Eric Tamme
On 07/28/2011 05:29 AM, Aleksandrs Saveljevs wrote: essentially a rewrite of the first part of the C++ example given at http://wiki.apache.org/cassandra/ThriftExamples#C.2B-.2B- . If we run it under strace, we see that it hangs on the call to recv() when setting keyspace: $ strace -s 64 ./tes

Re: Write placement questions: ringIterator() and firstTokenIndex()

2011-07-12 Thread Eric tamme
On Tue, Jul 12, 2011 at 3:32 PM, Jonathan Ellis wrote: > You're going to be mad at how simple the answer turns out to be. :) > > Nodes "own" the range from (previous, token], NOT from [token, next). > So, the last node will get from (50, 75] and the first will get from > (75, 0]. > Okay i figured
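The ownership rule quoted above, where a node owns the range (previous, token], can be illustrated with a short sketch. This is not the code in TokenMetadata.java; the function name `owner_token` and the four-node ring 0, 25, 50, 75 are assumptions chosen to match the numbers in the quoted reply.

```python
from bisect import bisect_left


def owner_token(ring_tokens, key_token):
    """Return the node token that owns key_token.

    Hypothetical helper: a node with token t owns every key k with
    previous < k <= t. A key larger than the largest token wraps
    around to the node with the smallest token.
    """
    tokens = sorted(ring_tokens)
    # bisect_left finds the first node token that is >= key_token.
    index = bisect_left(tokens, key_token)
    # index == len(tokens) means the key is past the last node: wrap.
    return tokens[index % len(tokens)]


ring = [0, 25, 50, 75]  # assumed example ring
print(owner_token(ring, 60))  # falls in (50, 75]
print(owner_token(ring, 90))  # falls in (75, 0], wraps to the first node
```

The modulo on the index is the whole wrap-around case: nothing special is needed for keys beyond the largest token.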

Write placement questions: ringIterator() and firstTokenIndex()

2011-07-12 Thread Eric tamme
I have been reading through some code in TokenMetadata.java, specifically with ringIterator() and firstTokenIndex(). I am trying to get a very firm grasp on how nodes are collected for writes. I have run into a bit of confusion about what happens when the data's token is larger than the larg

Re: When is 'Cassandra High Performance Cookbook' expected to be available ?

2011-07-07 Thread Eric tamme
On Thu, Jul 7, 2011 at 11:09 AM, A J wrote: > https://www.packtpub.com/cassandra-apache-high-performance-cookbook/book > I think last Ed said was "soon" ha ha. The book is done, and I think it's up to the publisher at this point. Maybe a couple of weeks? Ed?

Re: How to make a node an exact replica of another node ?

2011-07-05 Thread Eric tamme
AJ, You can use offset mirror tokens to achieve this. Pick your initial tokens for DC1N1 and DC1N2 as if they were the only nodes in your cluster. Now increment each by 1 and use them as the tokens for DC2N1 and DC2N2. This will give you a complete keyspace within each data center with even di
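A minimal sketch of the offset mirror token idea described above, assuming RandomPartitioner with a token space of 0 to 2**127 and the usual evenly spaced initial tokens. The helper names `balanced_tokens` and `mirror_tokens` are made up for illustration.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def balanced_tokens(node_count, ring_size=RING_SIZE):
    """Evenly spaced tokens, as if these were the only nodes in the cluster."""
    return [i * ring_size // node_count for i in range(node_count)]


def mirror_tokens(tokens, offset=1):
    """Tokens for the mirror data center: each original token plus an offset."""
    return [token + offset for token in tokens]


dc1 = balanced_tokens(2)   # tokens for DC1N1 and DC1N2
dc2 = mirror_tokens(dc1)   # tokens for DC2N1 and DC2N2

for name, token in zip(["DC1N1", "DC1N2", "DC2N1", "DC2N2"], dc1 + dc2):
    print(name, token)
```

The offset only has to make the tokens unique across the cluster; each data center still covers the full key space on its own.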

Re: Data storage security

2011-06-29 Thread Eric tamme
On Wed, Jun 29, 2011 at 12:37 PM, A J wrote: > Are there any options to encrypt the column families when they are > stored in the database. Say in a given keyspace some CF has sensitive > info and I don't want a 'select *' of that CF to layout the data in > plain text. > > Thanks. > I think this

Re: NTS Replication Strategy - only replicating to a subset of data centers

2011-06-23 Thread Eric tamme
On Wed, Jun 22, 2011 at 6:58 PM, AJ wrote: > I'm just double-checking, but when using NTS, is it required to specify ALL > the data centers in the strategy_options attribute? > > IOW, I do NOT want replication to ALL data centers; only two of the three. > So, if my property file snitch describe

Re: simple question about merged SSTable sizes

2011-06-22 Thread Eric tamme
>> Second, compacting such large files is an IO killer.    What can be tuned >> other than compaction_threshold to help optimize this and prevent the files >> from getting too big? >> >> Thanks! > > Just a personal implementation note - I make heavy use of column TTL, so I have very specifically t

Re: simple question about merged SSTable sizes

2011-06-22 Thread Eric tamme
On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby wrote: > > The way compaction works,  "x" same-sized files are merged into a new > SSTable.  This repeats itself and the SSTable get bigger and bigger. > > So what is the upper limit??     If you are not deleting stuff fast enough, > wouldn't the

Re: Docs: Token Selection

2011-06-17 Thread Eric tamme
> Yes.  But, the more I think about it, the more I see issues.  Here is what I > envision (Issues marked with *): > > Three or more dc's, each serving as fail-overs for the others with 1 maximum > unavailable dc supported at a time. > Each dc is a production dc serving users that I choose. > Each d

Re: Docs: Token Selection

2011-06-17 Thread Eric tamme
> What I don't like about NTS is I would have to have more replicas than I > need.  {DC1=2, DC2=2}, RF=4 would be the minimum.  If I felt that 2 local > replicas was insufficient, I'd have to move up to RF=6 which seems like a > waste... I'm predicting data in the TB range so I'm trying to keep rep

Re: Docs: Token Selection

2011-06-17 Thread Eric tamme
On Fri, Jun 17, 2011 at 12:07 PM, AJ wrote: > Thanks Jonathan.  I assumed since each data center owned the full key space > that the first replica would be stored in the dc of the coordinating node, > the 2nd in another dc, and the 3rd+ back in the 1st dc.  But, are you saying > that the first end

Re: Docs: Token Selection

2011-06-16 Thread Eric tamme
On Thu, Jun 16, 2011 at 11:11 AM, Sasha Dolgy wrote: > So, with ec2 ... 3 regions (DC's), each one is +1 from another? I don't use EC2, so I am not familiar with the specifics of deployment there. That said, if you have 3 data centers with equal nodes in each (so that you would calculate the

Re: Docs: Token Selection

2011-06-16 Thread Eric tamme
AJ, sorry I seemed to miss the original email on this thread. As Aaron said, when computing tokens for multiple data centers, you should compute them independently for each data center - as if it were its own Cassandra cluster. You can have "overlapping" token ranges between multiple data center

Re: Is this the proper use of OPP?

2011-06-14 Thread Eric tamme
I would point you to this article, it does a good job describing OPP and pretty much answers the specific questions you asked. http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/ -Eric On Mon, Jun 13, 2011 at 5:06 PM, AJ wrote: > I'm just becoming

Re: Direct control over where data is stored?

2011-06-05 Thread Eric tamme
On Sun, Jun 5, 2011 at 12:18 PM, Khanh Nguyen wrote: > Hi Maki and Adrian, > > Thank you very much for the promptness. It's weekend after all :). > > I realized I forgot a part of my question until Adrian mentioned the > replication factor. Is it also possible to set where the replicas are > store

Re: Cassandra node is not blanced Rf=2 Random Partitioner

2011-05-13 Thread Eric tamme
On Fri, May 13, 2011 at 2:46 AM, Ali Ahsan wrote: > My cluster is unbalanced. One has 99 GB of data and the other has 87 GB; can anyone > explain why this is happening? They are "pretty close" ... since a row key is pinned to a node - it is possible that you have a really large row(s) on your .4 node.

Re: Data types for cross language access

2011-05-11 Thread Eric tamme
> On Wed, May 11, 2011 at 10:23 AM, Luke Biddell wrote: >> I wouldn't mind knowing how other people are approaching this problem too. >> >> On 11 May 2011 11:27, Oliver Dungey wrote: >>> I am currently working on a system with Cassandra that is written purely in >>> Java. I know our end solution

Re: RequestResponseStage Assertion Error

2011-05-09 Thread Eric tamme
On Mon, May 9, 2011 at 7:18 AM, aaron morton wrote: > You can check the schema using cassandra-cli, run "describe cluster" it will > tell you how many schemas are defined. > I think the best approach when you discover bad schemas is to drain then > stop the affected node, remove the Location, Migr

Re: RequestResponseStage Assertion Error

2011-05-09 Thread Eric tamme
On Sun, May 8, 2011 at 7:17 PM, aaron morton wrote: > What version are you on ? > > Check the nodetool ring from each node in your cluster to check they have the > same view. I am running 0.7.3. I checked nodetool ring on all hosts and it all comes back the same. I had some funky business whe

RequestResponseStage Assertion Error

2011-05-08 Thread Eric tamme
I have a 4 node ring that was set up with tokens a,b,c,d using NTS and 2 nodes in each of 2 datacenters with a replication of DC1:1, DC2:1. I was getting uneven replica placement so I did a drop keyspace, followed by a nodetool move to DC1 having tokens (a,b) and DC2 having tokens (a+1,b+1), then

Re: Cassandra CMS

2011-05-05 Thread Eric tamme
> Does anyone know of a content management system that can be easily > customized to use Cassandra as its database? > > (Even better, if it can use Cassandra without customization!) > I think your best bet will be to look for a CMS that uses an ORM for the storage layer and write a specific ORM fo

Re: Node setup - recommended hardware

2011-05-04 Thread Eric tamme
On Wed, May 4, 2011 at 12:25 PM, Anthony Ikeda wrote: > I just want to ask, when setting up nodes in a Node ring is it worthwhile > using a 2 partition setup? i.e. Cassandra on the Primary, data directories > etc on the second partition or does it really not make a difference? > Anthony > I don'

Re: Replica data distributing between racks

2011-05-04 Thread Eric tamme
On Wed, May 4, 2011 at 10:09 AM, Konstantin Naryshkin wrote: > The way that I understand it (and that seems to be consistent with what was > said in this discussion) is that each DC has its own data space. Using your > simplified 1-10 system: >   DC1   DC2 > 0  D1R1  D2R2 > 1  D1R1  D2R1 > 2  D

Re: Replica data distributing between racks

2011-05-04 Thread Eric tamme
> Jonathan is suggesting the approach Jeremiah was using. > > Calculate the tokens for the nodes in each DC independently, and then add > 1 to the tokens if there are two nodes with the same tokens. > > In your case with 2 DC's with 2 nodes each. > > In DC 1 > node 1 = 0 > node 2
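The approach quoted above (compute tokens per DC independently, then add 1 where two nodes would share a token) could be sketched roughly as follows. The function `assign_tokens`, the DC names, and the small `ring_size=100` used in the example are assumptions for readability; a real RandomPartitioner ring would use 2**127.

```python
def assign_tokens(nodes_per_dc, ring_size=2 ** 127):
    """Assign balanced tokens per data center, bumping collisions by 1.

    nodes_per_dc maps a (hypothetical) DC name to its node count.
    Each DC is balanced as if it were its own cluster; a token already
    taken by a node in an earlier DC is incremented until it is unique.
    """
    used = set()
    assignment = {}
    for dc, node_count in nodes_per_dc.items():
        tokens = []
        for i in range(node_count):
            token = i * ring_size // node_count
            while token in used:
                token += 1
            used.add(token)
            tokens.append(token)
        assignment[dc] = tokens
    return assignment


# Two data centers with two nodes each, on a toy ring of size 100.
print(assign_tokens({"DC1": 2, "DC2": 2}, ring_size=100))
```

Unlike a strict mirror, this only shifts a token when it actually collides, so data centers with different node counts keep their own even spacing.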

Re: Replica data distributing between racks

2011-05-03 Thread Eric tamme
On Tue, May 3, 2011 at 4:08 PM, Jonathan Ellis wrote: > On Tue, May 3, 2011 at 2:46 PM, aaron morton wrote: >> Jonathan, >> I think you are saying each DC should have its own (logical) token >> ring. > > Right. (Only with NTS, although you'd usually end up with a similar > effect if you

Re: Replica data distributing between racks

2011-05-03 Thread Eric tamme
On Tue, May 3, 2011 at 10:13 AM, Jonathan Ellis wrote: > Right, when you are computing balanced RP tokens for NTS you need to > compute the tokens for each DC independently. I am confused ... sorry. Are you saying that ... I need to change how my keys are calculated to fix this problem? Or are

Re: Write performance help needed

2011-05-03 Thread Eric tamme
Use more nodes to increase your write throughput. Testing on a single machine is not really a viable benchmark for what you can achieve with cassandra.

Re: Replica data distributing between racks

2011-05-02 Thread Eric tamme
On Mon, May 2, 2011 at 5:59 PM, aaron morton wrote: > My bad, I missed the way TokenMetadata.ringIterator() and firstTokenIndex() > work. > > Eric, can you show the output from nodetool ring ? > > Sorry if the previous paste was way too unformatted, here is a pastie.org link with nicer formatting

Re: Replica data distributing between racks

2011-05-02 Thread Eric tamme
On Mon, May 2, 2011 at 5:59 PM, aaron morton wrote: > My bad, I missed the way TokenMetadata.ringIterator() and firstTokenIndex() > work. > > Eric, can you show the output from nodetool ring ? > Here is output from nodetool ring - ip addresses changed obviously. Address Status State Lo

Re: Replica data distributing between racks

2011-05-02 Thread Eric tamme
On Mon, May 2, 2011 at 3:22 PM, Jonathan Ellis wrote: > On Mon, May 2, 2011 at 2:18 PM, aaron morton wrote: >> When the NTS selects replicas in a DC it orders the tokens available in  the >> DC, then (in the first pass) iterates through them placing a replica in each >> unique rack.  e.g. if th

Replica data distributing between racks

2011-05-02 Thread Eric tamme
I am experiencing an issue where replication is not being distributed between racks when using PropertyFileSnitch in conjunction with NetworkTopologyStrategy. I am running 0.7.3 from a tar.gz on cassandra.apache.org I have 4 nodes, 2 data centers, and 2 racks in each data center. Each rack has