To add to Vladimir's comments, I'd suggest using DATE instead of LONG for
(4). This way you'll be able to take advantage of Phoenix's support for
date functions should you need to do so in the future.
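
To illustrate what LONG costs you, here is a minimal Python sketch of the
conversion a client would have to do before every time-range query on an
epoch-millisecond column. The function name day_bounds_millis and the choice
of UTC are my own assumptions, not anything Phoenix prescribes.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are stored as milliseconds since the Unix epoch, in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_bounds_millis(day: str):
    """Return (start, end) epoch milliseconds for a 'YYYY-MM-DD' day, end exclusive."""
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = start + timedelta(days=1)

    def to_ms(dt):
        # Integer division of timedeltas avoids floating-point rounding.
        return (dt - EPOCH) // timedelta(milliseconds=1)

    return to_ms(start), to_ms(end)


if __name__ == "__main__":
    lo, hi = day_bounds_millis("2015-07-06")
    # These two numbers would become the bounds of a range predicate on a LONG column.
    print(lo, hi)
```

With a DATE column, this kind of boundary arithmetic can stay in the query
itself rather than in each client.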

Eli

On Mon, Jul 6, 2015 at 11:32 AM, Vladimir Rodionov <vladrodio...@gmail.com>
wrote:

> 1. Unless you query by Anumber prefix (country code + operator id),
> reverse it: random 6 digits + operator id + country code. In that case you
> will not need to salt the row key.
> 2. Presplit the table. Make sure you won't need to split the table during
> normal operation.
> 3. Keep an index between Bnumber (IMEI, IMSI?) and Anumber. Get Anumber by
> IMEI, then run the query by Anumber. This index is going to be much smaller.
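
A rough Python sketch of points 1 and 2, for illustration only. It assumes a
3-digit country code and a 2-digit operator id (the thread only says the
first 5 digits are country + operator), and the helper names reorder_anumber
and split_points are hypothetical. The split points assume the reordered key
starts with uniformly distributed 6-digit subscriber numbers.

```python
from typing import List


def reorder_anumber(anumber: str, country_len: int = 3, operator_len: int = 2) -> str:
    """Reorder an Anumber as subscriber digits + operator id + country code."""
    prefix_len = country_len + operator_len
    country = anumber[:country_len]
    operator = anumber[country_len:prefix_len]
    subscriber = anumber[prefix_len:]
    # The random subscriber digits now lead the key, spreading writes across regions.
    return subscriber + operator + country


def split_points(num_regions: int, width: int = 6) -> List[str]:
    """Evenly spaced, zero-padded key prefixes to use as pre-split boundaries."""
    space = 10 ** width
    return [str(i * space // num_regions).zfill(width) for i in range(1, num_regions)]


if __name__ == "__main__":
    print(reorder_anumber("38641123456"))
    print(split_points(4))
```

The resulting boundaries could then be supplied when the table is created, so
that regions do not have to split under load.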
>
> Phoenix supports any table-level configuration option, so you can specify
> TTL in your DDL statement.
>
> As for capacity planning, you can read:
>
> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_system-admin-guide/content/ch_hbase_cluster_capacity_region_sizing.html
>
> -Vlad
>
> As for capacity planning, please read the HBase book.
>
>
> On Mon, Jul 6, 2015 at 8:52 AM, Matjaž Trtnik <m...@salviol.com> wrote:
>
>>  Hi fellow Phoenix users!
>>
>>  We are considering using HBase and Phoenix for CDR data retention.
>> The number of records is around 50 million per day and we should keep them
>> for about one year. I have played around a bit but would like to hear a
>> second opinion from people who have more experience, so I have a few
>> questions:
>>
>>
>>    1. Based on your experience, can anyone recommend an approximate number
>>    of nodes in the cluster and a hardware configuration for one node (RAM)?
>>    2. Regarding the row key, I was thinking of Anumber + timestamp + Bnumber
>>    + jobId + recordIndex. Any other ideas? Do I need to use salting or not?
>>    Let’s assume that in most cases Anumber starts with the same first 5
>>    digits (country + operator code), followed by 6 random digits for the
>>    user number.
>>    3. Searches are typically done by Anumber and timestamp, but other
>>    criteria may also apply, such as IMEI or IMSI number. Do you suggest
>>    having secondary indexes for that? I read that when using a secondary
>>    index, all columns in the select statement should be included in the
>>    index as well. Keeping in mind that I’m returning almost all columns,
>>    does this mean almost double the data for each index? Any other
>>    suggestions on how to handle this?
>>    4. For the timestamp, do you suggest using LONG and storing epoch time,
>>    or sticking with the DATE format?
>>    5. Another requirement is that after some time we need to be able to
>>    efficiently delete all CDRs that are older than, let’s say, 1 year. Is
>>    the row key design still good for that, as the only argument here will be
>>    the timestamp? Is it possible to use TTL with Phoenix?
>>
>>
>>  Any other suggestions and advice on how to design the system are more
>> than welcome.
>>
>>  Thanks, Matjaz
>>
>
>
