Hello,
This is my first message on this mailing list; I just subscribed.
I have been using Cassandra for the last few years and now I am trying to
create a POC using HBase. Therefore, I am reading the HBase docs but it's been
really hard to find how HBase behaves in some situations, when comp
Also, I know Cassandra supports up to 2 billion columns per row (2 billion
rows per partition in CQL), do you know what's the limit for HBase?
Best regards,
Marcelo Valle.
From: aloksi...@gmail.com
Subject: Re: data partitioning and data model
You can use a key like (user_id + timestamp + alert_id) to get
clustering of rows related to a user. To get better write throughput
and distribution over the cluster, you could pre-split the table and
use a consistent hash of the user_id as a row key prefix.
Have you looked at the rowkey design section?
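That suggestion can be sketched in a few lines. This is a minimal illustration, not code from the thread: the class and method names (`RowKeys`, `saltFor`, `rowKey`) and the 16-bucket salt are assumptions of the sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of a salted composite row key: [salt][user_id][timestamp][alert_id].
// The one-byte salt is a consistent hash of user_id, so all rows for a user
// land in the same bucket while different users spread across the pre-split
// regions.
public class RowKeys {
    static final int BUCKETS = 16; // assumed bucket count, one per pre-split region

    // Consistent: the same user_id always maps to the same salt bucket.
    static byte saltFor(String userId) {
        return (byte) Math.floorMod(userId.hashCode(), BUCKETS);
    }

    // (A production key would need fixed-width or delimited fields so the
    // components stay parseable; this sketch just concatenates them.)
    static byte[] rowKey(String userId, long timestamp, String alertId) {
        byte[] user = userId.getBytes(StandardCharsets.UTF_8);
        byte[] alert = alertId.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + user.length + 8 + alert.length);
        buf.put(saltFor(userId));
        buf.put(user);
        buf.putLong(timestamp); // big-endian, so byte order follows numeric order
        buf.put(alert);
        return buf.array();
    }
}
```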
> Would I be able to do a query like the one below?
> Select * from table where user_id = X and timestamp > T and (alert_id = id1
> or alert_id = id2)
>
> Would I be able to do the same query using user_id + timestamp + alert_id as
> row key?
>
> Also, I know Cassandra supports up to 2 billion columns per row (2 billion
> rows per partition in CQL), do you know what's the limit for HBase?
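The quoted SELECT maps onto a row-key range scan. Here is a rough sketch, assuming the salted (salt + user_id + timestamp + alert_id) key layout suggested earlier in the thread; the names and the helper methods are mine, not from the thread. "user_id = X and timestamp > T" becomes a start/stop key pair, while the "(alert_id = id1 or alert_id = id2)" disjunction would have to be applied afterwards (client-side or with a filter), since alert_id is the suffix of the key.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: translating "user_id = X and timestamp > T" into a byte-range
// over salted composite keys. HBase compares row keys as unsigned bytes,
// which Arrays.compareUnsigned reproduces here.
public class ScanRange {
    static byte[] prefix(byte salt, String userId, long ts) {
        byte[] user = userId.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + user.length + 8);
        buf.put(salt).put(user).putLong(ts);
        return buf.array();
    }

    // Start key: first possible key with timestamp strictly greater than T.
    static byte[] startKey(byte salt, String userId, long t) {
        return prefix(salt, userId, t + 1);
    }

    // Stop key (exclusive): past the last realistic timestamp for this user.
    static byte[] stopKey(byte salt, String userId) {
        return prefix(salt, userId, Long.MAX_VALUE);
    }

    static boolean inRange(byte[] key, byte[] start, byte[] stop) {
        return Arrays.compareUnsigned(key, start) >= 0
            && Arrays.compareUnsigned(key, stop) < 0;
    }
}
```

With the real client, the start/stop pair would go on a `Scan`; the point of the sketch is only that a timestamp range on one user is a contiguous key range once the key starts with (salt, user_id).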
Does this also happen when using pre-loading?
In the case of a rebalance, if I try to WRITE data to a record being
rebalanced, would the write performance be affected?
Best regards,
Marcelo Valle.
From: user@hbase.apache.org
Subject: Re: data partitioning and data model
You don't want a l
I am sorry; for the question below, consider that I am using auto pre-splitting.
From: user@hbase.apache.org
Subject: Re: data partitioning and data model
Thanks Alok,
I will take a good look at the link for sure.
Just an additional question. I saw, reading this:
http://stackoverflow.com/questions
>> Select * from table where user_id = X and timestamp > T and (alert_id = id1
>> or alert_id = id2)
>>
>> Would I be able to do the same query using user_id + timestamp + alert_id as
>> row key?
>>
>> Also, I know Cassandra supports up to 2 billion columns per row (2 billion
>> rows per partition in CQL), do you know what's the limit for HBase?
if I use a key like we described in this thread to keep data almost evenly
distributed on every partition, I might end up having an increase in read/write
latency when data is moving from one region to another, although this could be
rare. Is this right?
From: user@hbase.apache.org
Subject: Re: data partitioning and data model
Assuming the clust
> key like we
> described in this thread to keep data almost evenly distributed on every
> partition, I might end up having the increase in read/write latency when data
> is moving from a region to the other, although this could be rare, is this
> right?
>
> From: user@hbase.apache.org
Thanks a lot!
From: aloksi...@gmail.com
Subject: Re: data partitioning and data model
I meant that, in the normal course of operation, rebalancing will not
affect writes in flight. This is never an issue when pre-splitting
because, by definition, the splits occurred before data was written to the
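The pre-splitting idea can be illustrated with a small sketch. Assuming the one-byte salt over N buckets used elsewhere in this thread (the class name and bucket count are illustrative, not from the thread), the split points handed to the admin API at table-creation time are just the bucket boundary bytes, so every salt value starts life in its own region and no splits happen during the initial load.

```java
// Sketch: computing split points for a table salted into N one-byte buckets.
// With the real client these would be passed to Admin#createTable(desc, splits);
// here we only compute them, so the sketch runs without a cluster.
public class SplitPoints {
    static byte[][] splitKeys(int buckets) {
        // N buckets need N-1 boundaries: region 0 is [start, 0x01),
        // region 1 is [0x01, 0x02), ..., region N-1 is [N-1, end).
        byte[][] splits = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            splits[i - 1] = new byte[] { (byte) i };
        }
        return splits;
    }
}
```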
Hi,
Yes, you would want to start your key with user_id.
But you don’t need the timestamp; user_id + alert_id should be enough in
the key.
If you want to get fancy…
If your alert_id is not a number, you could use (EPOCH - Timestamp) as a way
to invert the order of the alerts so that the latest ones come first.
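That inversion trick can be sketched as follows. This is an illustration, not the poster's code: the thread says "EPOCH - Timestamp", and any fixed large constant minus the timestamp (here `Long.MAX_VALUE`) gives the same reversal, because HBase orders row keys by unsigned byte comparison.

```java
import java.nio.ByteBuffer;

// Sketch: storing (constant - timestamp) in the key reverses the sort order,
// so the newest alerts come first in a scan.
public class ReverseTs {
    static byte[] reversed(long timestamp) {
        // Big-endian encoding: for these nonnegative values, byte order
        // follows numeric order, so a larger timestamp yields a smaller key.
        return ByteBuffer.allocate(8).putLong(Long.MAX_VALUE - timestamp).array();
    }
}
```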
>> Would I be able to do a query like the one below?
>> Select * from table where user_id = X and timestamp > T and (alert_id = id1
>> or alert_id = id2)
>>
>> Would I be able to do the same query using user_id + timestamp + alert_id as
>> row key?
>>