Hi folks,
I am new to HBase and recently got very interested after looking
at the way it has evolved, and I have slowly started working with it. I want to
develop a distributed data structure in the HBase environment, like a distributed
hash table, a distributed set, or a list. Can somebody help me in this regard?
http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
Hadoop ecosystem search :: http://search-hadoop.com/
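One way to see why HBase is a natural fit for the question above: HBase already partitions a sorted key space across region servers, so a distributed hash table reduces to a single table where get/put/remove become Get/Put/Delete on row = key. A minimal sketch, assuming that mapping; all names here (`DistributedMap`, `LocalMap`) are hypothetical, and an in-memory stand-in is used so the sketch runs without a cluster:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DhtSketch {
    // The DHT contract. On HBase each call would translate to one client
    // operation on a single table with one fixed column:
    public interface DistributedMap {
        String get(String key);              // -> HBase Get on row = key
        void put(String key, String value);  // -> HBase Put on row = key
        void remove(String key);             // -> HBase Delete on row = key
    }

    // In-memory stand-in with the same contract, so the sketch is runnable
    // here; a real version would wrap an HBase client table instead.
    // A distributed set is the same idea with an empty value; a list needs
    // a sequence number encoded in the key to preserve order.
    public static class LocalMap implements DistributedMap {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        public String get(String key) { return data.get(key); }
        public void put(String key, String value) { data.put(key, value); }
        public void remove(String key) { data.remove(key); }
    }
}
```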
>
> From: Andre Reiter
> To: user@hbase.apache.org
> Sent: Thursday, July 14, 2011 3:52 PM
> Subject: data structure
>
> Hi everybody,
>
> we
Averages are easy to rollup as well.
Rank statistics like median, min, max and quartiles are not much harder.
Total uniques are more difficult. If you have decent distributional
information, these can be estimated reasonably well.
Mahout has code for the first two.
On Sun, Jul 17, 2011 at 9:30
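The point about averages rolling up easily can be made concrete: store (sum, count) per time bucket instead of the average itself, and merging buckets is just adding fields. A minimal sketch, with hypothetical names; min and max merge the same way, while exact medians/quantiles and exact uniques need more state, as noted above:

```java
public class RollupSketch {
    // Hypothetical partial-aggregate record for one time bucket.
    // Keeping (sum, count) rather than the mean is what makes averages
    // mergeable: hour buckets combine into days, days into weeks, etc.
    public static class Agg {
        public final double sum;
        public final long count;
        public final double min;
        public final double max;

        public Agg(double sum, long count, double min, double max) {
            this.sum = sum; this.count = count; this.min = min; this.max = max;
        }

        // A single observation as a one-element aggregate.
        public static Agg of(double v) { return new Agg(v, 1L, v, v); }

        // Merge two buckets (e.g. two hours into a day).
        public Agg merge(Agg o) {
            return new Agg(sum + o.sum, count + o.count,
                           Math.min(min, o.min), Math.max(max, o.max));
        }

        public double mean() { return sum / count; }
    }
}
```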
On Jul 14, Andre Reiter wrote:
> now we are running mapreduce jobs to generate a report: for example we
> want to know how many impressions were done by all users in the last x
> days. Therefore the scan of the MR job runs over all data in our
> hbase table for the particular family. This takes at t
>> - Original Message -
>> From: Claudio Martella
>> Sent: Fri Jul 15 2011 14:40:38 GMT+0200 (CET)
>> To:
>> CC:
>> Subject: Re: data structure
>
>> Supposing you want per-hour granularity, you could have a key like this
>>
>> _
>
Hi Claudio,
thanks for the hint. The point is that we need fast access to the user
data; that is why we need the row key to be the user_id.
- Original Message -
From: Ted Dunning
Sent: Thu Jul 14 2011 23:17:20 GMT+0200 (CET)
To:
CC:
Subject: Re: data structure
You can play tricks with the arrangement of the key.
For instance, you can put the date at the end of the key. That would let you
pull data for a particular user for a particular date range. The date
should not be a timestamp, but should be a low-res version of time
(day-level resolution might be OK).
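The layout described above can be sketched as follows: user id first (so per-user reads stay fast, as Andre needs), a day-resolution date last, so one user's date range is a contiguous slice of the table. The `_` separator and `yyyyMMdd` format are assumptions, not from the thread:

```java
public class RowKeySketch {
    // Composite key: user id prefix keeps a user's rows together,
    // day-level suffix orders them by date within the user.
    public static String rowKey(String userId, String yyyyMMdd) {
        return userId + "_" + yyyyMMdd;
    }

    // HBase scans rows lexicographically over [startRow, stopRow), so a
    // per-user date range becomes one narrow scan instead of a full pass.
    public static String[] scanRange(String userId, String fromDay,
                                     String toDayExclusive) {
        return new String[] { rowKey(userId, fromDay),
                              rowKey(userId, toDayExclusive) };
    }
}
```

Because fixed-width `yyyyMMdd` strings sort lexicographically in date order, the scan bounds line up with the calendar range.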
- Original Message -
From: Doug Meil
Sent: Thu Jul 14 2011 22:29:16 GMT+0200 (CET)
To:
CC:
Subject: Re: data structure
Stack wrote:
On Thu, Jul 14, 2011 at 12:52 PM, Andre Reiter wrote:
Why is 70 seconds too long for a report? 70 seconds seems like a
short mapreduce job (to me).
You don't have that many regions.
How fast would you like this operation to complete in?
The report you describe above is predicat
Hi there-
A few high-level suggestions...
re: "to generate a report: for example we want to know how many
impressions were done by all users in last x days"
Can you create a summary table by day (via MR job), and then have your
ad-hoc report hit the summary table?
Re: "and with the data grow
Hi everybody,
we have our hadoop + hbase cluster running at the moment with 6 servers,
and everything is working just fine. We have a web application where data is stored with the
row key = user id (a meaningless UUID). So our users have a cookie, which is the row key;
behind this key are families w