Hello,

This is my first message to this mailing list; I just subscribed.
I have been using Cassandra for the last few years, and now I am trying to create a POC using HBase. I am reading the HBase docs, but it has been hard to find out how HBase behaves in some situations compared to Cassandra. I thought it would be a good idea to ask here, as people on this list probably know the differences better than anyone else.

What I want to do is create a simple application optimized for writes. (I am not interested in HBase / Cassandra product comparisons here; I am assuming I will use HBase, and I just want to understand the best way of doing it in the HBase world.)

I want to be able to write alerts to the cluster, where each alert would have columns like:
- alert id
- user id
- date/time
- alert data

Later, I want to search for alerts per user, so my main query would be something like:

Select * from alerts where user_id = $id and date/time > 10 days ago.

I want to decide on the data model for my application. Here are my questions:

- In Cassandra, I would partition by user + day, as some users can have many alerts and some just 1 or a few. In HBase, assuming all alerts for a user would always fit in a single partition / region, can I just use user_id as my row key and assume data will be distributed across the cluster?

- Suppose I want to write 100 000 rows from a client machine, and these belong to 30 000 users. What's the best way to write these if I want to optimize for writes? Should I batch all 100k requests into one and send it to a single server? As I am trying to optimize for writes, I would like to split these requests across several nodes instead of sending them all to one. I found this article: http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/ but I am not sure it is what I need.

Thanks in advance!

Best regards,
Marcelo.