Re: Distributed Data structure

2011-08-29 Thread Jean-Daniel Cryans
so much interest after looking > at the way it got evolved and started slowly working on it. I want to > develop a distributed data structure in the HBase environment, like a Distributed > Hash Table, distributed set, or list. Can somebody help me in this regard. > Even if anyone worked or ha

Distributed Data structure

2011-08-27 Thread vamshi krishna
Hi folks, I am new to HBase and recently got very interested in it after looking at the way it has evolved, and I have slowly started working on it. I want to develop a distributed data structure in the HBase environment, like a distributed hash table, distributed set, or list. Can somebody help me in this

Re: data structure

2011-07-29 Thread Otis Gospodnetic
ematext.com/ :: Solr - Lucene - Hadoop - HBase Hadoop ecosystem search :: http://search-hadoop.com/ > > From: Andre Reiter > To: user@hbase.apache.org > Sent: Thursday, July 14, 2011 3:52 PM > Subject: data structure > > Hi everybody, > > we

Re: data structure

2011-07-17 Thread Ted Dunning
Averages are easy to roll up as well. Rank statistics like median, min, max and quartiles are not much harder. Total uniques are more difficult. If you have decent distributional information, these can be estimated reasonably well. Mahout has code for the first two. On Sun, Jul 17, 2011 at 9:30
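A minimal sketch of why averages, min and max roll up easily: if each partial summary stores a sum and a count rather than the average itself, two summaries merge by combining fields. The `Summary` class and the sample numbers are hypothetical illustrations, not Mahout code, and the sketch does not cover medians, quartiles or unique counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Mergeable partial aggregate (hypothetical name, for illustration)."""
    count: int
    total: float
    minimum: float
    maximum: float

    @staticmethod
    def of(values):
        # Build a summary from raw values (must be non-empty).
        values = list(values)
        return Summary(len(values), float(sum(values)), min(values), max(values))

    def merge(self, other):
        # Counts and totals add; min and max combine with min() and max().
        return Summary(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        # The average is derived at read time from the rolled-up fields.
        return self.total / self.count


day1 = Summary.of([2, 4, 6])   # e.g. one day's values
day2 = Summary.of([10])        # e.g. the next day's values
combined = day1.merge(day2)
```

Note that averaging the two daily averages (4 and 10) would give 7, while the merged summary gives the correct overall mean of 22 / 4.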

Re: data structure

2011-07-17 Thread Arvind Jayaprakash
On Jul 14, Andre Reiter wrote: > now we are running MapReduce jobs to generate a report: for example, we > want to know how many impressions were done by all users in the last x > days. Therefore the scan of the MR job runs over all data in our > HBase table for the particular family. This takes at t

Re: data structure

2011-07-15 Thread Claudio Martella
>> - Original Message - >> From: Claudio Martella >> Sent: Fri Jul 15 2011 14:40:38 GMT+0200 (CET) >> To: >> CC: >> Subject: Re: data structure > >> suppose you want a per-hour granularity, you could have a key like this >> >> _ >

Re: data structure

2011-07-15 Thread Andre Reiter
Hi Claudio, thanks for the hint. The point is that we need fast access to the user data, which is why we need the row key to be the user_id. - Original Message - From: Claudio Martella Sent: Fri Jul 15 2011 14:40:38 GMT+0200 (CET) To: CC: Subject: Re: data structure suppose

Re: data structure

2011-07-15 Thread Claudio Martella
>> Sent: Thu Jul 14 2011 23:17:20 GMT+0200 (CET) >> To: >> CC: >> Subject: Re: data structure > >> You can play tricks with the arrangement of the key. >> >> For instance, you can put date at the end of the key. That would let >> you >> p

Re: data structure

2011-07-14 Thread Ted Dunning
: > - Original Message - >> From: Ted Dunning >> Sent: Thu Jul 14 2011 23:17:20 GMT+0200 (CET) >> To: >> CC: >> Subject: Re: data structure >> > > You can play tricks with the arrangement of the key. >> >> For instance, you can pu

Re: data structure

2011-07-14 Thread Andre Reiter
- Original Message - From: Ted Dunning Sent: Thu Jul 14 2011 23:17:20 GMT+0200 (CET) To: CC: Subject: Re: data structure You can play tricks with the arrangement of the key. For instance, you can put date at the end of the key. That would let you pull data for a particular user for

Re: data structure

2011-07-14 Thread Ted Dunning
You can play tricks with the arrangement of the key. For instance, you can put date at the end of the key. That would let you pull data for a particular user for a particular date range. The date should not be a time stamp, but should be a low-res version of time (day-level resolution might be o
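The key arrangement described above can be sketched as follows. The sketch assumes a row key of the form `<user_id>_<YYYYMMDD>`; the separator, the date format and the helper names are illustrative assumptions, not something specified in the thread. Because HBase sorts rows lexicographically by key, a fixed-width day suffix lets a date range for one user map onto one contiguous key range.

```python
from datetime import date, timedelta


def row_key(user_id: str, day: date) -> bytes:
    """Compose a key with the user id first and a day-resolution date last."""
    return f"{user_id}_{day:%Y%m%d}".encode("ascii")


def scan_range(user_id: str, first_day: date, last_day: date):
    """Return (start_row, stop_row) covering first_day..last_day inclusive.

    stop_row is exclusive, matching the default behaviour of an HBase scan,
    so it is built from the day after last_day.
    """
    start = row_key(user_id, first_day)
    stop = row_key(user_id, last_day + timedelta(days=1))
    return start, stop


# Hypothetical user id and date range, for illustration only.
start, stop = scan_range("u42", date(2011, 7, 1), date(2011, 7, 14))
```

Keeping the user id as the key prefix preserves fast per-user lookups, while the zero-padded date suffix keeps each user's days in chronological order.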

Re: data structure

2011-07-14 Thread Andre Reiter
- Original Message - From: Doug Meil Sent: Thu Jul 14 2011 22:29:16 GMT+0200 (CET) To: CC: Subject: Re: data structure Hi there- A few high-level suggestions... re: "to generate a report: for example we want to know how many impressions were done by all users in last x days"

Re: data structure

2011-07-14 Thread Andre Reiter
Stack wrote: On Thu, Jul 14, 2011 at 12:52 PM, Andre Reiter wrote: Why is 70 seconds too long for a report? 70 seconds seems like a short mapreduce job (to me). You don't have that many regions. How fast would you like this operation to complete in? The report you describe above is predicat

Re: data structure

2011-07-14 Thread Doug Meil
Hi there- A few high-level suggestions... re: "to generate a report: for example we want to know how many impressions were done by all users in last x days" Can you create a summary table by day (via MR job), and then have your ad-hoc report hit the summary table? Re: "and with the data grow
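The summary-table suggestion can be sketched in plain Python, with a `Counter` standing in for the summary table and a list of `(user_id, day)` events standing in for the raw data. All names and the sample events are hypothetical; in the thread's setting the batch step would be a MapReduce job and the summary would live in an HBase table.

```python
from collections import Counter
from datetime import date, timedelta


def build_daily_summary(events):
    """Batch step: count impressions per day across all users."""
    return Counter(day for _user_id, day in events)


def impressions_last_days(summary, today, x):
    """Report step: sum the x most recent daily counts, including today."""
    return sum(summary.get(today - timedelta(days=i), 0) for i in range(x))


# Hypothetical raw impressions: (user_id, day of impression).
events = [
    ("user-a", date(2011, 7, 12)),
    ("user-b", date(2011, 7, 13)),
    ("user-a", date(2011, 7, 14)),
    ("user-c", date(2011, 7, 14)),
]
summary = build_daily_summary(events)
```

The report then reads x small rows from the summary instead of scanning every user row, so its cost depends on the number of days requested rather than on the size of the raw table.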

Re: data structure

2011-07-14 Thread Stack
On Thu, Jul 14, 2011 at 12:52 PM, Andre Reiter wrote: > now we are running MapReduce jobs to generate a report: for example, we want > to know how many impressions were done by all users in the last x days. > Therefore the scan of the MR job runs over all data in our HBase table > for the partic

data structure

2011-07-14 Thread Andre Reiter
Hi everybody, we have our Hadoop + HBase cluster running at the moment with 6 servers, and everything is working just fine. We have a web application where data is stored with the row key = user id (a meaningless UUID). So our users have a cookie, which is the row key; behind this key are families w