Thanks Skanda for your suggestion. If we go with 24 columns , won't it reduce number of rows over large data set and will improve scan performance. If I want to get daily report, I just have to scan only one row.
Your thoughts? Thanks Pari On 23 June 2015 at 18:57, Skanda <[email protected]> wrote: > Hi Pari > > For your use-case, having it as part of the rowkey should be a better > design than creating so many columns. > > Regards > Skanda > > On Tue, Jun 23, 2015 at 6:49 PM, Pariksheet Barapatre < > [email protected]> wrote: > >> Hello All, >> >> This is more like a HBase question but as I am planning to use Phoenix as >> a access layer, I hope phoenix user will help me. >> >> I would like to create time series data to get on-the-fly analytics. >> >> This use case is for adTech. >> >> Report - what is houly,daily,weekly impression counts at country level >> for a given advertisement ID (ADID). >> >> I am doing hourly aggregation and loading into a Phoenix table. >> >> Primary Key - *ADID | COUNTRY | HOUR ID* >> >> >> --------------------------------------------------------------------------------- >> *ADID | COUNTRY | HOUR ID* | CF.IMP | >> >> --------------------------------------------------------------------------------- >> 1 | US | 2015062301 | 3000 | >> 1 | US | 2015062302 | 3421 | >> 1 | UK | 2015062302 | 1212 | >> >> --------------------------------------------------------------------------------- >> >> Is it a good schema design or shall I create alternate schema as below >> Primary Key - *ADID | COUNTRY | DAY ID* >> >> ---------------------------------------------------------------------------------------------------- >> *ADID | COUNTRY | DAY ID* | CF.IMP*01* | CF.IMP*02* >> | >> >> ---------------------------------------------------------------------------------------------------- >> 1 | US | 20150623 | 3000 | >> 3421 | >> 1 | UK | 20150623 | NULL | >> 1212 | >> >> ---------------------------------------------------------------------------------------------------- >> Here, I have taken hour part from hour ID and created 24 columns. >> >> I gone through many time-series NoSQL blog posts, most the author suggest >> to go with wider rows as above. This will reduce the scan, but I don't see >> much difference in both Data Models in term of latency for scanning. >> >> Can anybody please suggest good approach for my use case? >> >> >> Cheers, >> Pari >> > > -- Cheers, Pari
