Yes, I agree that keying on the timestamp alone will cause a hotspot. I can pre-split the regions. I watched the TSDB video and presentation and looked at their data model. I don't think it is suitable for my case.
I searched Google a lot and, to my surprise, there isn't any post about such a classic problem. It is very strange. I am not trying to group the time series the way most solutions do (every 1 hour, 1 day, or 5 minutes); that is simple. I need to group each element relative to itself by time. I mean I have {event1: 10:05} and I want to group it with the elements that occurred after 10:05 within time X. With X = 7 minutes, all events between 10:05 and 10:12 will be in the group.

It is like a join of each row with all other rows, but the performance would be very bad. Currently I have 50 million events, so it would be (50 million)^2. That is why I don't want to use pure map/reduce. I want to use HBase as the output of map/reduce and model the data in the way I described above.

So is there a way to model the data in this type of time bucket? Please advise.

Thanks
Oleg.

On Mon, Jan 28, 2013 at 5:54 PM, Michel Segel <michael_se...@hotmail.com> wrote:

> Tough one in that if your events are keyed on time alone, you will hit a
> hot spot on write. Reads, not so much...
>
> TSDB would be a good start ...
>
> You may not need 'buckets' but just a time stamp and set up a start and
> stop key values.
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jan 28, 2013, at 7:06 AM, Oleg Ruchovets <oruchov...@gmail.com> wrote:
>
> > Hi,
> >
> > I have such row data structure:
> >
> > event_id | time
> > =============
> > event1   | 10:07
> > event2   | 10:10
> > event3   | 10:12
> >
> > event4   | 10:20
> > event5   | 10:23
> > event6   | 10:25
> >
> > Number of records is 50-100 million.
> >
> > Question:
> >
> > I need to find the group of events that starts from eventX and falls
> > within the time window bucket = T.
> >
> > For example, if T = 7 minutes:
> > Starting from event1: {event1, event2, event3} were detected during
> > 7 minutes.
> >
> > Starting from event2: {event2, event3} were detected during 7 minutes.
> >
> > Starting from event4: {event4, event5, event6} were detected during
> > 7 minutes.
> >
> > Is there a way to model the data in HBase to get this?
> >
> > Thanks
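
To make the grouping from the original question concrete, here is a minimal in-memory sketch of the "start and stop key" idea, using the six events from the example. It is only an illustration under some assumptions: a sorted Python list stands in for rows ordered by a time-based key, no HBase API is used, the names to_minutes and window_groups are hypothetical, and the upper bound of the window is assumed to be inclusive (the thread does not say). Each lookup is a range search on the sorted times, so the work is a sort plus one range per event rather than a comparison of every row with every other row.

```python
from bisect import bisect_left, bisect_right


def to_minutes(hhmm):
    """Convert an "HH:MM" string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def window_groups(events, window_minutes):
    """Group each event with the events that follow it within the window.

    events: list of (event_id, "HH:MM") pairs, in any order.
    Returns {event_id: [event ids with time in [t, t + window_minutes]]}.
    The inclusive upper bound is an assumption, not something the thread states.
    """
    # Sorting by time plays the role of rows ordered by a time-based key.
    ordered = sorted((to_minutes(t), event_id) for event_id, t in events)
    times = [t for t, _ in ordered]

    groups = {}
    for t, event_id in ordered:
        start = bisect_left(times, t)                   # "start key": the event's own time
        stop = bisect_right(times, t + window_minutes)  # "stop key": time + window
        groups[event_id] = [eid for _, eid in ordered[start:stop]]
    return groups


events = [
    ("event1", "10:07"), ("event2", "10:10"), ("event3", "10:12"),
    ("event4", "10:20"), ("event5", "10:23"), ("event6", "10:25"),
]
groups = window_groups(events, 7)
for event_id in sorted(groups):
    print(event_id, groups[event_id])
```

With T = 7 this reproduces the groups listed in the question, for example {event1, event2, event3} for event1 and {event2, event3} for event2. In a real table the same range would be read with a scan between the two keys, and the write hotspot mentioned above would still have to be handled separately (for example by the pre-splitting already discussed).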