Re: Using hbase for time series

Tom Thu, 09 Feb 2012 15:59:18 -0800

So, your map/reduce job is bringing at least one of your region servers to a crawl.

Here are some thoughts to isolate your problem: comment out writing of the final results, comment out reading of data by using dummy values, monitor your region servers.



On 02/09/2012 02:49 PM, Paul Nickerson wrote:

Row key would be the following:

(item id):(series interval day/month/week/etc):(series id #):(date code)
so like 1000:day:3:20110504
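
To make the key layout above concrete, here is a minimal sketch of building such a key and the start/stop rows for a date-range scan. The function names (`make_row_key`, `scan_bounds`) are hypothetical and not part of any HBase API; the sketch assumes the date code is always eight digits (YYYYMMDD), so keys for one item/interval/series sort chronologically, and that the scan's stop row is exclusive.

```python
from datetime import date, timedelta


def make_row_key(item_id: int, interval: str, series_id: int, day: date) -> str:
    """Build a key of the form (item id):(interval):(series id):(date code)."""
    # The fixed-width YYYYMMDD date code keeps rows for one series in date order.
    return f"{item_id}:{interval}:{series_id}:{day:%Y%m%d}"


def scan_bounds(item_id: int, interval: str, series_id: int,
                first_day: date, last_day: date) -> tuple[str, str]:
    """Return (start_row, stop_row) covering first_day..last_day inclusive.

    Assumes an exclusive stop row, so the stop key is built from the day
    after last_day.
    """
    start_row = make_row_key(item_id, interval, series_id, first_day)
    stop_row = make_row_key(item_id, interval, series_id,
                            last_day + timedelta(days=1))
    return start_row, stop_row


if __name__ == "__main__":
    print(make_row_key(1000, "day", 3, date(2011, 5, 4)))
    print(scan_bounds(1000, "day", 3, date(2011, 5, 1), date(2011, 5, 4)))
```

With a key like this, a read of a few hundred daily values for one item and series is a single contiguous range, which matches the short scans described below.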

I pre-split the table into 50 regions before using importTsv to generate
the hfiles for the initial load, which worked fine. Table is now about
60 gigs. Scanning is fast while not writing to the table - generally
I'll scan a few hundred rows and return those values to the user. But
when map/reduce is writing to that table it takes about a minute to scan
through 10 rows even when just doing scan 'table' in the shell.

*Paul Nickerson*
Grooveshark
Data Scientist

Phone: 352-538-1962

------------------------------------------------------------------------
*From: *"Tom" <[email protected]>
*To: *[email protected]
*Cc: *"Paul Nickerson" <[email protected]>
*Sent: *Thursday, February 9, 2012 5:28:45 PM
*Subject: *Re: Using hbase for time series

Hi Paul,

Generally this should be possible; others are using it for time series (have
a look at the schema at opentsdb.net if you have not done so).

What do your row key schema and a typical read access look like (scan
over many rows / multiple regions ...)?

Cheers

On 02/09/2012 02:12 PM, Paul Nickerson wrote:
 >
 > I'm trying to create a time series table that contains a couple of
billion rows. It contains daily values for several million items.
This table will be visible to the outside world, so it should be able to
support lots of reads at any point in time. My plan is to use map/reduce
every night to batch load the day's values for each of the items into
that table. The problem seems to be that read performance is dismal
while I'm writing data to the table.
 >
 >
 > Is there any way to accomplish what I'm trying to do? Fwiw I'm
currently using the hive hbase integration to load data to the hbase table.
 >
 >
 > Thank you,
 > Paul Nickerson
