Yes, you are correct, event3 never emits for the time "10:07".
The proper result table is, as you mention:
event1 | event2
event2 | event3
event3 |
I guess I was thinking about the old example (T=7). :)
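For the record, that corrected result table for T=5 can be reproduced with a short simulation of the map and reduce steps discussed below in this thread. This is a Python sketch, not HBase/Hadoop code; the window semantics (each event emits the T minutes ending at, and including, its own minute) are my reading of the thread:

```python
from collections import defaultdict

T = 5  # window size in minutes
# Times as minutes since midnight: 10:07 -> 607, 10:10 -> 610, 10:12 -> 612.
events = {"event1": 607, "event2": 610, "event3": 612}

# Map step: each event emits every minute of the T-minute window
# ending at (and including) its own minute.
emitted = defaultdict(set)
for name, t in events.items():
    for minute in range(t - T + 1, t + 1):
        emitted[minute].add(name)

# Reduce step: at each event's own minute, collect the later events seen there.
result = {}
for name, t in events.items():
    result[name] = sorted(e for e in emitted[t] if events[e] > t)

print(result)
```

With these semantics event3 (10:12) emits 10:08 through 10:12 and so never reaches the "10:07" key, which matches the corrected table above.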
On Thu, Jan 31, 2013 at 12:39 PM, Oleg Ruchovets wrote:
Hi Rodrigo,
That is just a GREAT idea :-) !!!
But how did you get the final result:
event1 | event2, event3
event2 | event3
event3 |
I tried to simulate it and didn't get event1 | event2, event3
(10:03, [*after*, event1])
(10:04, [*after*, event1])
(10:05, [*after*
Hi,
The Map and Reduce steps that you mention are the same as what I thought.
How should I work with this table? Should I scan the main table row by
row, and for every row get the event time and, based on that time, query the
second table?

In case I do so, I still need to execute 50 milli
Hi Rodrigo ,
As usual, a very interesting idea! :-)
I am not sure that I understand exactly what you mean, so I tried to
simulate it:
Suppose we have these events in the main table:
event1 | 10:07
event2 | 10:10
event3 | 10:12
Time window T=5 minutes.
===
There is another option:
You could do a MapReduce job that, for each row of the main table, emits
all the times at which it would be inside the time window.
For example, "event1" would emit {"10:06": event1}, {"10:05": event1} ...
{"10:00": event1}. (also for "10:07" if you want to know those who happen
i
Hi Rodrigo.
Using the solution with 2 tables, one main and one as an index:
I have ~50 million records; in my case I would need to scan the whole table,
and as a result I will have 50 million scans, which will kill performance.
Is there any other approach to model my use case using HBase?
Thanks
Oleg.
I think I didn't explain it correctly.
I want to read from 2 tables in the context of 1 MapReduce job.
I mean I want to read one key from the main table and scan a range from another
in the same MapReduce job. I only found MultiTableOutputFormat, and there is
no MultiTableInputFormat. Is there any workaround to
Yes, it's possible,
Check this solution:
http://stackoverflow.com/questions/11353911/extending-hadoops-tableinputformat-to-scan-with-a-prefix-used-for-distribution
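Lacking a MultiTableInputFormat, the usual pattern behind that workaround is: let the main table drive the job, and do a short range scan over the index table per record inside the mapper. The sketch below simulates that pattern in plain Python (not the actual Hadoop/HBase API); the window size T=5 and the exclusive window endpoint are assumptions taken from earlier in the thread:

```python
# Simulation: the main table drives the job; each record triggers a short
# range scan over index rowkeys of the form '{time}:{event_id}'.
T = 5  # window in minutes (assumed exclusive of the endpoint)

main = {
    "event1": "10:07", "event2": "10:10", "event3": "10:12",
    "event4": "10:20", "event5": "10:23", "event6": "10:25",
}
# Index rowkeys sorted lexicographically, as HBase stores them.
index = sorted(f"{t}:{e}" for e, t in main.items())

def add_minutes(hhmm, delta):
    """Add delta minutes to a zero-padded 'HH:MM' string."""
    h, m = map(int, hhmm.split(":"))
    total = h * 60 + m + delta
    return f"{total // 60:02d}:{total % 60:02d}"

groups = {}
for event, t in main.items():
    stop = add_minutes(t, T)
    # "Scan" the range (t, stop): skip the event's own row, stop before t + T.
    groups[event] = [r.split(":", 2)[2] for r in index if t < r[:5] < stop]

print(groups)
```

Each per-record "scan" touches only the few index rows inside the window, instead of rescanning the whole table.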
On Mon, Jan 28, 2013 at 2:07 PM, Oleg Ruchovets wrote:
Yes.
This is a very interesting approach.
Is it possible to read from the main table by key and scan from another using
map/reduce? I don't want to read from a single client. I use HBase version
0.94.2.21.
Thanks
Oleg.
On Mon, Jan 28, 2013 at 6:27 PM, Rodrigo Ribeiro <
rodrigui...@jusbrasil.com.br> wrote:
In the approach that I mentioned, you would need a table to retrieve the
time of a certain event (if this information can be retrieved in another way,
you may ignore this table). It would be like you posted:
event_id | time
=
event1 | 10:07
event2 | 10:10
event3 | 10:12
event4 | 10:20
And a
Yes,
I agree that using only the timestamp will cause hotspotting. I can create
pre-splitting for the regions.
I saw the TSDB video, presentation, and data model. I think it is not
suitable for my case.
I looked through Google a lot and, to my surprise, there isn't any post about
such a classic problem. It
Tough one, in that if your events are keyed on time alone, you will hit a hot
spot on write. Reads, not so much...
TSDB would be a good start ...
You may not need 'buckets', just a timestamp, and then set up start and stop
key values.
Sent from a remote device. Please excuse any typos...
Mike
Hi Rodrigo.
Can you please explain your solution in more detail? You said that I will
have another table. How many tables will I have? Will I have 2 tables? What
will be the schema of the tables?
I'll try to explain what I'm trying to achieve:
I have ~50 million records like {time|event}. I want to pu
You can use another table as an index, using a rowkey like
'{time}:{event_id}', and then scan in the range ["10:07", "10:15").
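The index-table idea can be checked outside HBase: rowkeys of the form '{time}:{event_id}' sort lexicographically (assuming zero-padded times), so a single range scan returns the whole window. A minimal Python sketch of that scan, using the event times from this thread:

```python
# Index rowkeys '{time}:{event_id}', kept sorted as HBase would store them.
rows = sorted([
    "10:07:event1", "10:10:event2", "10:12:event3",
    "10:20:event4", "10:23:event5", "10:25:event6",
])

# Range scan [start, stop): plain string comparison is correct here
# because the time prefix is zero-padded and fixed-width.
start, stop = "10:07", "10:15"
window = [r for r in rows if start <= r < stop]
print(window)
```

This mirrors an HBase Scan with a start row of "10:07" and a stop row of "10:15": the scan stops as soon as rowkeys sort past the stop key, so only the rows in the window are read.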
On Mon, Jan 28, 2013 at 10:06 AM, Oleg Ruchovets wrote:
Hi ,
I have such row data structure:
event_id | time
=
event1 | 10:07
event2 | 10:10
event3 | 10:12
event4 | 10:20
event5 | 10:23
event6 | 10:25
The number of records is 50-100 million.
Question:
I need to find the group of events starting from eventX that enter the time
window buck