Re: Effective allocation of multiple disks

2010-03-10 Thread Stu Hood
data ends up on one disk. If you need the additional io, you will want raid0. But simply listing multiple DataFileDirectories will not work. -Anthony On Wed, Mar 10, 2010 at 02:08:13AM -0600, Stu Hood wrote: > You can list multiple DataFileDirectories, and Cassandra will scatter files > ac

RE: CassandraHardware link on the wiki FrontPage

2010-03-10 Thread Stu Hood
Anyone can edit any page once they have an account: click the "Login" link at the top right next to the search box to create an account. Thanks, Stu -Original Message- From: "Eric Rosenberry" Sent: Wednesday, March 10, 2010 2:52am To: cassandra-user@incubator.apache.org Subject: Cassand

RE: Effective allocation of multiple disks

2010-03-10 Thread Stu Hood
You can list multiple DataFileDirectories, and Cassandra will scatter files across all of them. Use 1 disk for the commitlog, and 3 disks for data directories. See http://wiki.apache.org/cassandra/CassandraHardware#Disk Thanks, Stu -Original Message- From: "Eric Rosenberry" Sent: Wedn

Re: Hackathon?!?

2010-03-09 Thread Stu Hood
Definitely on board! -Original Message- From: "Dan Di Spaltro" Sent: Tuesday, March 9, 2010 8:05pm To: cassandra-user@incubator.apache.org Subject: Re: Hackathon?!? Alright guys, we have settled on a date for the Cassandra meetup on... April 15th, better known as, Tax day! We can host

RE: Latest check-in to trunk/ is broken

2010-03-08 Thread Stu Hood
Run `ant clean` before building. A few files moved around. -Original Message- From: "Cool BSD" Sent: Monday, March 8, 2010 5:18pm To: "cassandra-user" Subject: Latest check-in to trunk/ is broken version info: $ svn info Path: . URL: https://svn.apache.org/repos/asf/incubator/cassandra/

Re: Dynamically Switching from Ordered Partitioner to Random?

2010-03-05 Thread Stu Hood
But rather than switching, you should definitely try the 'loadbalance' approach first, and see whether OrderPP works out for you. -Original Message- From: "Chris Goffinet" Sent: Friday, March 5, 2010 1:43pm To: cassandra-user@incubator.apache.org Subject: Re: Dynamically Switching from O

Re: Connect during bootstrapping?

2010-03-02 Thread Stu Hood
You are probably in the portion of bootstrap where data to be transferred is split out to disk, which can take a while: see https://issues.apache.org/jira/browse/CASSANDRA-579 Look for a 'streaming' subdirectory in your data directories to confirm. -Original Message- From: "Brian Frank

Re: Is Cassandra a document based DB?

2010-03-01 Thread Stu Hood
> In HBase you have table:row:family:key:val:version, which some people > might consider richer Cassandra is actually table:family:row:key:val[:subval], where subvals are the columns stored in a supercolumn (which can be easily arranged by timestamp to give the versioned approach). -Origina
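The `table:family:row:key:val[:subval]` layout described in this reply can be pictured as nested maps. This is a minimal sketch using plain Python dicts, not Cassandra's API; the column family names, row keys, and values are all hypothetical. It shows a standard column lookup and a supercolumn whose subcolumns are keyed by timestamp to give a versioned view.

```python
# Hypothetical names throughout; plain dicts stand in for Cassandra's storage.
keyspace = {
    # Standard column family: row -> column name -> value
    "Users": {
        "row1": {"email": "user@example.com"},
    },
    # Super column family: row -> supercolumn name -> subcolumn name -> value
    "PageVersions": {
        "row1": {
            "body": {1267400000: "first draft", 1267403600: "second draft"},
        },
    },
}


def get(table, family, row, key, subkey=None):
    """Follow table:family:row:key, and optionally :subval for supercolumns."""
    value = table[family][row][key]
    return value if subkey is None else value[subkey]


def latest(table, family, row, key):
    """Return the subcolumn with the largest (timestamp) name."""
    subcolumns = table[family][row][key]
    return subcolumns[max(subcolumns)]


print(get(keyspace, "Users", "row1", "email"))
print(latest(keyspace, "PageVersions", "row1", "body"))
```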

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-02-28 Thread Stu Hood
`nodetool cleanup` is a very expensive process: it performs a major compaction, and should not be done that frequently. -Original Message- From: "shiv shivaji" Sent: Sunday, February 28, 2010 3:34pm To: cassandra-user@incubator.apache.org Subject: Re: Anti-compaction Diskspace issue even

Re: StackOverflowError on high load

2010-02-21 Thread Stu Hood
Ran, There are bounds to how large your data directory will grow, relative to the actual data. Please read up on compaction: http://wiki.apache.org/cassandra/MemtableSSTable , and if you have a significant number of deletes occurring, also read http://wiki.apache.org/cassandra/DistributedDelete

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Stu Hood
> After I ran "nodeprobe compact" on node B its read latency went up to 150ms. The compaction process can take a while to finish... in 0.5 you need to watch the logs to figure out when it has actually finished, and then you should start seeing the improvement in read latency. > Is there any way

Re: TimeOutExceptions and Cluster Performance

2010-02-13 Thread Stu Hood
The combination of 'too many open files' and lots of memtable flushes could mean you have tons and tons of sstables on disk. This can make reads especially slow. If you are seeing the timeouts on reads a lot more often than on writes, then this explanation might make sense, and you should watch

Re: OOM Exception

2009-12-13 Thread Stu Hood
PS: If this turns out to actually be the problem, I'll open a ticket for it. Thanks, Stu -Original Message- From: "Stu Hood" Sent: Sunday, December 13, 2009 12:28pm To: cassandra-user@incubator.apache.org Subject: Re: OOM Exception With 248G per box, you probably hav

Re: OOM Exception

2009-12-13 Thread Stu Hood
With 248G per box, you probably have slightly more than 1/2 billion items? One current implementation detail in Cassandra is that it loads 1/128th of the index into memory for faster lookups. This means you might have something like 4.5 million keys in memory at the moment. The '128' value is a c
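The estimate in this reply is a simple division. The sketch below assumes a total of 576 million keys, a hypothetical figure chosen only because it is "slightly more than 1/2 billion" and reproduces the 4.5 million quoted above; the function name is illustrative and not part of Cassandra.

```python
def sampled_index_keys(total_keys: int, interval: int = 128) -> int:
    """Number of index entries held in memory when one in `interval` is sampled."""
    return total_keys // interval


# Assumed key count, not taken from the original thread.
total_keys = 576_000_000
print(sampled_index_keys(total_keys))  # 4500000
```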

Re: cassandra over hbase

2009-11-24 Thread Stu Hood
> JR> After chatting with some Facebook guys, we realized that one potential > JR> benefit from using HDFS is that the recovery from losing partial data in a > JR> node is more efficient. Suppose that one lost a single disk at a node. > HDFS > JR> can quickly rebuild the blocks on the failed disk

Re: quorum / hinted handoff

2009-11-20 Thread Stu Hood
You need a quorum relative to your replication factor. You mentioned in the first e-mail that you have RF=2, so you need a quorum of 2. If you use RF=3, then you need a quorum of 2 as well. -Original Message- From: "B. Todd Burruss" Sent: Friday, November 20, 2009 4:14pm To: cassandra-u
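Both figures in this reply follow from the usual majority rule, floor(RF/2) + 1. A minimal sketch, with hypothetical function names that are not Cassandra API:

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3, 5):
    print(rf, quorum(rf), tolerated_failures(rf))
```

With RF=2 a quorum needs both replicas, so no replica can be down; with RF=3 a quorum still needs only 2, so one replica can be down.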

Re: bandwidth limiting Cassandra's replication and access control

2009-11-11 Thread Stu Hood
Hey Ted, Would you mind creating a ticket for this issue in JIRA? A lot of discussion has gone on, and a place to collect the design and feedback would be a good start. Thanks, Stu -Original Message- From: "Ted Zlatanov" Sent: Wednesday, November 11, 2009 3:28pm To: cassandra-user@inc

RE: Incr/Decr Counters in Cassandra

2009-11-04 Thread Stu Hood
This type of problem is one of the primary examples of something that should be handled by pluggable/client-side conflict resolution in an eventually consistent system. Currently, all conflicts in Cassandra are handled with "highest timestamp wins". Rather than attempting to add atomic operation
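To see why "highest timestamp wins" is a poor fit for counters, consider two clients that each read the same value and write back an increment. This is a simplified sketch, not Cassandra's implementation: the `Column` type and `resolve` function are hypothetical, and keeping the first column on a timestamp tie is an assumption made only for this example.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: int
    timestamp: int


def resolve(a: Column, b: Column) -> Column:
    """Keep the column with the higher timestamp (ties keep `a`, by assumption)."""
    return b if b.timestamp > a.timestamp else a


stored = Column("hits", 10, 100)

# Two clients both read 10 and each write back 10 + 1.
write_a = Column("hits", stored.value + 1, 101)
write_b = Column("hits", stored.value + 1, 102)

final = resolve(resolve(stored, write_a), write_b)
print(final.value)  # 11: one of the two increments is lost
```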

RE: How does Cassandra store data physically?

2009-07-01 Thread Stu Hood
There is no such thing as a column or supercolumn that is not contained in a ColumnFamily. The ColumnFamily is the structure that is stored together on disk. A supercolumn is not what you think it is: supercolumns are like regular columns, except they contain other columns, and you can have an a