Re: commodity server spec

2011-09-03 Thread Peter Schuller
Is there any recommendation for commodity server hardware specs if a 100TB database is expected and the application is write-heavy? Should I go with a high-powered CPU (12 cores), 48TB HDD, and 640GB RAM, with a total of 3 servers of this spec? Or many smaller commodity servers are

Re: commodity server spec

2011-09-03 Thread China Stoffen
Many small servers would drive the hosting cost up far too high, so we want to avoid this solution if we can. - Original Message - From: Radim Kolar h...@sendmail.cz To: user@cassandra.apache.org Cc: Sent: Saturday, September 3, 2011 9:37 AM Subject: Re: commodity server spec many

Re: commodity server spec

2011-09-03 Thread Chris Goffinet
It will also depend on how much recovery time you can tolerate. So imagine this case: 3 nodes w/ RF of 3. Each node has 30TB of space used (you never want to fill up an entire node). If one node fails and you must recover, that will take over 3.6 days in just transferring data alone. That's with a
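The recovery-time estimate above is plain arithmetic and can be sketched as follows. The excerpt is cut off before it states the link speed, so the sketch only derives the sustained throughput implied by "30TB in 3.6 days"; the decimal terabyte and the 125 MB/s comparison figure (1 Gbit/s line rate with no overhead) are assumptions, not values from the thread.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # assumption: decimal terabytes


def transfer_days(data_bytes: float, throughput_bytes_per_s: float) -> float:
    """Days needed to stream data_bytes at a sustained throughput."""
    return data_bytes / throughput_bytes_per_s / SECONDS_PER_DAY


def implied_throughput(data_bytes: float, days: float) -> float:
    """Sustained bytes/s implied by moving data_bytes in the given days."""
    return data_bytes / (days * SECONDS_PER_DAY)


node_data = 30 * TB  # space used per node, from the message

# Throughput implied by the quoted "over 3.6 days" figure.
rate = implied_throughput(node_data, 3.6)
print(f"implied rate: {rate / 1e6:.1f} MB/s")

# Hypothetical comparison: an ideal 1 Gbit/s link (125 MB/s).
print(f"at 125 MB/s: {transfer_days(node_data, 125e6):.2f} days")
```

Halving the data per node halves the transfer time, which is the argument for more, smaller nodes.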

need help setting up production environment

2011-09-03 Thread Ben Ashton
I've been dropped in it a little and had to build a prod setup going live on Monday. At the moment I have set up three servers in EC2 US, one in each AZ. The servers are set up as follows: m1.xlarge using the Amazon AMI instance-storage image, the 4 x 420 GB ephemerals set up in raid10 and formatted

Re: need help setting up production environment

2011-09-03 Thread Jeremy Hanna
I would look at http://www.slideshare.net/mattdennis/cassandra-on-ec2 Also, people generally do raid0 on the ephemerals. EBS is a bad fit for Cassandra - see the presentation above. However, that means you'll need to have a backup strategy, which is also mentioned in the presentation. Also
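As a rough illustration of the raid0 versus raid10 trade-off on the 4 x 420 GB ephemerals mentioned in this thread, here is a minimal sketch of the usable capacity under each layout. It assumes ideal striping and two-way mirroring and ignores filesystem overhead; the function name is made up for this example.

```python
def usable_gb(disks: int, disk_gb: float, level: str) -> float:
    """Usable capacity in GB for an idealized RAID layout."""
    if level == "raid0":
        # Pure striping: all capacity is usable, no redundancy.
        return disks * disk_gb
    if level == "raid10":
        # Striped two-way mirrors: half the raw capacity is usable.
        if disks % 2 != 0:
            raise ValueError("raid10 needs an even number of disks")
        return disks * disk_gb / 2
    raise ValueError(f"unsupported level: {level}")


for level in ("raid0", "raid10"):
    print(level, usable_gb(4, 420, level), "GB")
```

With raid0 the redundancy has to come from Cassandra's replication factor and from backups, which is why the reply stresses a backup strategy.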

Re: need help setting up production environment

2011-09-03 Thread Ben Ashton
Hi Jeremy, I don't remember setting up a snitch. The servers are all in a VPC; the only thing I did was configure the seed IP so all the nodes can see each other. Ben On Sat, Sep 3, 2011 at 11:13 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: I would look at

Re: need help setting up production environment

2011-09-03 Thread Jeremy Hanna
Okay - I just really wanted to point out Matt's presentation as food for thought. We've had some painful experiences we've learned a lot from, and we wish we'd had some of those tips when we were starting out. On Sep 3, 2011, at 5:19 PM, Ben Ashton wrote: Hi Jeremy, I don't remember setting up

Not all data structures need timestamps (and don't require wasted memory).

2011-09-03 Thread Kevin Burton
I was thinking more about the excessive (IMO) use of memory in Cassandra due to the 8-byte timestamp per column/row (cell). Any operation that is idempotent does not require a timestamp. For example, set membership. A link adjacency list is a good example. If you have a list of source-targets,

Re: commodity server spec

2011-09-03 Thread Bill
[100% agree with Chris] China, the machines you're describing sound nice for mongodb/postgres/mysql, but probably not the sweet spot for Cassandra. Obviously (well, depending on near-term load) you don't want to get burned on excess footprint. But a realistic, don't lose data, be fairly

Re: Not all data structures need timestamps (and don't require wasted memory).

2011-09-03 Thread Stephen Connolly
Maybe not all NoSQL applications fit Cassandra. The whole core logic of how Cassandra is eventually consistent depends on the per-column timestamps. If they are a pain for you, consider storing, e.g., as a small number of fat columns rather than many skinny ones. Either that or look at a

Re: Not all data structures need timestamps (and don't require wasted memory).

2011-09-03 Thread Kevin Burton
Sure….. I'm willing to concede that Cassandra isn't for everyone, but why make it worse than it has to be? Why 8 bytes? Why not 64 bytes? I imagine even in your situation an 8x boost in storage would not be nice ;) The point is that replication in Cassandra only needs timestamps to handle out of

Re: Not all data structures need timestamps (and don't require wasted memory).

2011-09-03 Thread Jonathan Ellis
I strongly suspect that you're optimizing prematurely. What evidence do you have that timestamps are producing unacceptable overhead for your workload? You do realize that the sparse data model means that we spend a lot more than 8 bytes storing column names in-line with each column too, right?
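To put numbers on both sides of this exchange, here is a back-of-the-envelope sketch of what share of a stored column the 8-byte timestamp represents. The column sizes are hypothetical, and `other_overhead_bytes` is a placeholder for any additional per-column bookkeeping, not a figure from the thread.

```python
TIMESTAMP_BYTES = 8  # per-column timestamp size discussed in the thread


def timestamp_share(name_bytes: int, value_bytes: int,
                    other_overhead_bytes: int = 0) -> float:
    """Fraction of one stored column taken up by its timestamp."""
    total = (TIMESTAMP_BYTES + name_bytes + value_bytes
             + other_overhead_bytes)
    return TIMESTAMP_BYTES / total


# Hypothetical set-membership column: 8-byte name, empty value.
print(f"{timestamp_share(8, 0):.0%}")

# Hypothetical wider column: 32-byte name, 100-byte value.
print(f"{timestamp_share(32, 100):.1%}")
```

For tiny name-only columns the timestamp is a large share of the cell, while for columns with longer names or values the in-line column name dominates, which is the point made in the reply above.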

Fatal startup error -- Instance already exists

2011-09-03 Thread Eric Czech
Hi, I recently upgraded 10 nodes from 0.7.5 to 0.8.4 and 9 of them work now, but on one node I'm getting an exception on startup that I can't seem to fix. Has anyone seen this before or have any suggestions as to how to correct the issue here? Here's the exception I'm getting:

Re: Not all data structures need timestamps (and don't require wasted memory).

2011-09-03 Thread Kevin Burton
On Sat, Sep 3, 2011 at 8:53 PM, Jonathan Ellis jbel...@gmail.com wrote: I strongly suspect that you're optimizing prematurely. What evidence do you have that timestamps are producing unacceptable overhead for your workload? It's possible … this is back of the envelope at the moment as right

Re: Fatal startup error -- Instance already exists

2011-09-03 Thread Jonathan Ellis
That means somehow there is more than one copy of that CF declared in the schema on that node. Delete the schema and let it pull it from another node, as in wiki.apache.org/cassandra/FAQ#schema_disagreement. On Sun, Sep 4, 2011 at 12:03 AM, Eric Czech e...@nextbigsound.com wrote: Hi, I