The row key is structured as follows:

(item id):(series interval: day/week/month/etc.):(series id #):(date code)
e.g. 1000:day:3:20110504
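To make the layout concrete, here is a minimal sketch in Python that builds a key in exactly that format. It also computes start/stop rows that cover every date in one series, which is the kind of range a "scan a few hundred rows" read would use. The helper names (make_row_key, series_scan_range) and the stop-row trick are illustrative assumptions, not something from this thread.

```python
from typing import Tuple

# Hypothetical helpers for the key layout
# (item id):(series interval):(series id #):(date code)
SEP = ":"

def make_row_key(item_id: int, interval: str, series_id: int, date_code: str) -> bytes:
    """Build a row key such as b'1000:day:3:20110504'."""
    return SEP.join([str(item_id), interval, str(series_id), date_code]).encode("ascii")

def series_scan_range(item_id: int, interval: str, series_id: int) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering every date code in one series.

    HBase scans are [start, stop): start is inclusive, stop is exclusive,
    and rows sort as raw bytes. The stop row swaps the trailing ':' for ';'
    (the next ASCII character), so every key with this prefix falls inside
    the range, while e.g. series 30 ('1000:day:30:...') sorts before start.
    """
    prefix = SEP.join([str(item_id), interval, str(series_id)]) + SEP
    start = prefix.encode("ascii")
    stop = (prefix[:-1] + chr(ord(SEP) + 1)).encode("ascii")
    return start, stop
```

For item 1000, daily series 3, the range is b'1000:day:3:' to b'1000:day:3;'. Those bytes could be supplied as the start and stop rows of a scan (e.g. the shell's STARTROW/STOPROW scan options), so a read touches only that series instead of walking the table.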


I pre-split the table into 50 regions before using ImportTsv to generate the
HFiles for the initial load, which worked fine. The table is now about 60 GB.
Scanning is fast when nothing is writing to the table; typically I scan a few
hundred rows and return those values to the user. But while a MapReduce job is
writing to the table, it takes about a minute to scan through 10 rows, even
with just scan 'table' in the shell.
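The message doesn't say how the 50 split points were chosen. One common approach, sketched here purely as an assumption, is to sample existing row keys, sort them as raw bytes (HBase's row order), and take 49 evenly spaced quantiles so each region starts with a similar share of rows. The function name split_points and the sampling strategy are hypothetical.

```python
from typing import List, Sequence

def split_points(sample_keys: Sequence[bytes], num_regions: int) -> List[bytes]:
    """Pick num_regions - 1 split keys from a sample of row keys.

    The sample is de-duplicated and sorted as raw bytes; split i is the key
    at the i/num_regions quantile. Because the sample has at least
    num_regions distinct keys, the splits are distinct and strictly increasing.
    """
    if num_regions < 2:
        return []  # a single region needs no split keys
    keys = sorted(set(sample_keys))
    if len(keys) < num_regions:
        raise ValueError("need at least num_regions distinct sample keys")
    return [keys[i * len(keys) // num_regions] for i in range(1, num_regions)]
```

The resulting keys could then be given to the shell's create command via its SPLITS option. Sampling real keys matters here because, with non-zero-padded item ids, byte order is not numeric order ('1000' sorts before '999').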


Paul Nickerson 

Grooveshark 
Data Scientist 


Phone: 352-538-1962 
----- Original Message -----

From: "Tom"
Cc: "Paul Nickerson"
Sent: Thursday, February 9, 2012 5:28:45 PM 
Subject: Re: Using hbase for time series 

Hi Paul, 

Generally this should be possible; others are using HBase for time series
(have a look at the schema at opentsdb.net if you haven't already).

What do your row key schema and a typical read look like (a scan over
many rows, across multiple regions, ...)?

Cheers 


On 02/09/2012 02:12 PM, Paul Nickerson wrote: 
> 
> I'm trying to create a time series table that contains a couple of billion
> rows: daily values for several million items. This table will be visible to
> the outside world, so it needs to support lots of reads at any point in time.
> My plan is to use MapReduce every night to batch load the day's values for
> each item into the table. The problem seems to be that read performance is
> dismal while I'm writing data to the table.
> 
> 
> Is there any way to accomplish what I'm trying to do? FWIW, I'm currently
> using the Hive HBase integration to load data into the HBase table.
> 
> 
> Thank you, 
> Paul Nickerson 

