You really have to try out both if you want to be sure.

The fundamental differences that come to mind are:
* HBase is always consistent. Machine outages lead to inability to read or 
write data on that machine. With Cassandra you can always write.

* Cassandra defaults to a random partitioner, so range scans are not possible 
(by default)
* HBase has a range partitioner (if you don't want that, the client has to 
prefix the rowkey with part of a hash of the rowkey). The main feature that 
sets HBase apart is range scans.
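
To make the hash-prefix trick concrete, here is a rough sketch in plain Python. It does not use the HBase client API; a sorted list stands in for the sorted rowkey space. The names (NUM_BUCKETS, salted_key, range_scan, salted_range_scan), the choice of MD5, and the "cpu#timestamp" key layout are all made up for illustration. It shows the trade-off: unsalted keys need one contiguous scan, while salted keys need one scan per bucket, merged on the client.

```python
import hashlib
from bisect import bisect_left

NUM_BUCKETS = 4  # hypothetical bucket count, picked only for this example


def salted_key(rowkey: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix the rowkey with a bucket id derived from a hash of the rowkey."""
    digest = hashlib.md5(rowkey.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}|{rowkey}"


def range_scan(sorted_keys, start, stop):
    """Return the keys in [start, stop) from a sorted list of keys."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def salted_range_scan(sorted_keys, start, stop, buckets=NUM_BUCKETS):
    """Scan a logical range over salted keys: one scan per bucket, then merge."""
    hits = []
    for b in range(buckets):
        prefix = f"{b:02d}|"
        hits.extend(range_scan(sorted_keys, prefix + start, prefix + stop))
    # Strip the salt and restore the logical key order on the client side.
    return sorted(k.split("|", 1)[1] for k in hits)


# Time-series style rowkeys: metric name plus zero-padded timestamp.
rowkeys = [f"cpu#{t:010d}" for t in range(1000, 1100)]

plain = sorted(rowkeys)                          # range-partitioned layout
salted = sorted(salted_key(k) for k in rowkeys)  # hash-prefixed layout

start, stop = "cpu#0000001010", "cpu#0000001020"
plain_hits = range_scan(plain, start, stop)            # one contiguous scan
salted_hits = salted_range_scan(salted, start, stop)   # NUM_BUCKETS scans
```

Both scans return the same rows. The salted layout spreads sequential writes across buckets, at the price of fanning out every range read.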

* HBase is much more tightly integrated with Hadoop/MapReduce/HDFS, etc. You 
can write HFiles directly from a MapReduce job and bulk load those into HBase 
almost instantly.

* Cassandra has a dedicated company supporting (and promoting) it.
* Getting started is easier with Cassandra. For HBase you need to run HDFS and 
Zookeeper, etc.
* I've heard lots of anecdotes about Cassandra working nicely with small 
clusters (< 50 nodes) and quickly degrading above that.
* HBase does not have a query language (but you can use Phoenix for full SQL 
support)
* HBase does not have secondary indexes (having an eventually consistent index, 
similar to what Cassandra has, is easy in HBase, but making it as consistent as 
the rest of HBase is hard)

* Everything you'll hear here is biased :)



From personal experience... At Salesforce we spent a few months prototyping 
various stores (including Cassandra) and arrived at HBase. Your mileage may 
vary.


-- Lars


----- Original Message -----
From: Ajay <ajay.ga...@gmail.com>
To: user@hbase.apache.org
Cc: 
Sent: Friday, May 29, 2015 12:12 PM
Subject: Hbase vs Cassandra

Hi,

I need some info on Hbase vs Cassandra as a data store (in general plus
specific to time series data).

The comparison in the following helps:
1: features
2: deployment and monitoring
3: performance
4: anything else

Thanks
Ajay
