Thanks Skanda for your suggestion. If we go with 24 columns , won't it
reduce number of rows over large data set and will improve scan
performance. If I want to get daily report, I just have to scan only one
row.

Your thoughts?

Thanks
Pari

On 23 June 2015 at 18:57, Skanda <[email protected]> wrote:

> Hi Pari
>
> For your use-case, having it as part of the rowkey should be a better
> design than creating so many columns.
>
> Regards
> Skanda
>
> On Tue, Jun 23, 2015 at 6:49 PM, Pariksheet Barapatre <
> [email protected]> wrote:
>
>> Hello All,
>>
>> This is more like a HBase question but as I am planning to use Phoenix as
>> a access layer, I hope phoenix user will help me.
>>
>> I would like to create time series data to get on-the-fly analytics.
>>
>> This use case is for adTech.
>>
>> Report - what is houly,daily,weekly impression counts at country level
>> for a given advertisement ID (ADID).
>>
>> I am doing hourly aggregation and loading into a Phoenix table.
>>
>> Primary Key - *ADID          | COUNTRY       | HOUR ID*
>>
>>
>> ---------------------------------------------------------------------------------
>> *ADID          | COUNTRY       | HOUR ID*      |  CF.IMP  |
>>
>> ---------------------------------------------------------------------------------
>> 1                | US                  | 2015062301  | 3000        |
>> 1                | US                  | 2015062302  | 3421        |
>> 1                | UK                  | 2015062302  | 1212        |
>>
>> ---------------------------------------------------------------------------------
>>
>> Is it a good schema design or shall I create alternate schema as below
>> Primary Key - *ADID          | COUNTRY       | DAY ID*
>>
>> ----------------------------------------------------------------------------------------------------
>> *ADID          | COUNTRY       | DAY ID*      |  CF.IMP*01*  | CF.IMP*02*
>> |
>>
>> ----------------------------------------------------------------------------------------------------
>> 1                | US                  | 20150623  | 3000        |
>> 3421         |
>> 1                | UK                  | 20150623  | NULL        |
>> 1212          |
>>
>> ----------------------------------------------------------------------------------------------------
>> Here, I have taken hour part from hour ID and created 24 columns.
>>
>> I gone through many time-series NoSQL blog posts, most the author suggest
>> to go with wider rows as above. This will reduce the scan, but I don't see
>> much difference in both Data Models in term of latency for scanning.
>>
>> Can anybody please suggest good approach for my use case?
>>
>>
>> Cheers,
>> Pari
>>
>
>


-- 
Cheers,
Pari

Reply via email to