Row key would be the following:
(item id):(series interval day/month/week/etc):(series id #):(date code)

so like 1000:day:3:20110504

I pre-split the table into 50 regions before using importTsv to generate the hfiles for the initial load, which worked fine. The table is now about 60 GB.

Scanning is fast while nothing is writing to the table; generally I'll scan a few hundred rows and return those values to the user. But when map/reduce is writing to that table, it takes about a minute to scan through 10 rows, even when just doing scan 'table' in the shell.

Paul Nickerson
Grooveshark
Data Scientist
Phone: 352-538-1962

----- Original Message -----
From: "Tom" <[email protected]>
To: [email protected]
Cc: "Paul Nickerson" <[email protected]>
Sent: Thursday, February 9, 2012 5:28:45 PM
Subject: Re: Using hbase for time series

Hi Paul,

generally it should be possible; others are using it for TS (have a look at the schema @ opentsdb.net if you have not done so).

What do your row key schema and a typical read access look like (scan over many rows / multiple regions ...)?

Cheers

On 02/09/2012 02:12 PM, Paul Nickerson wrote:
>
> I'm trying to create a time series table that contains a couple of billion
> rows. It contains daily values for several million items. This table will
> be visible to the outside world, so it should be able to support lots of
> reads at any point in time. My plan is to use map/reduce every night to batch
> load the day's values for each of the items into that table. The problem seems
> to be that read performance is dismal while I'm writing data to the table.
>
>
> Is there any way to accomplish what I'm trying to do? Fwiw I'm currently
> using the hive hbase integration to load data to the hbase table.
>
>
> Thank you,
> Paul Nickerson
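
For reference, the row key layout described at the top of this message, (item id):(series interval):(series id #):(date code), could be built roughly as below. This is only a sketch: the helper names make_row_key and scan_bounds are made up for illustration, and it only builds key strings, it does not talk to HBase. It assumes the date code is always a fixed-width YYYYMMDD string, so that keys sharing the same item/interval/series prefix sort in date order, and it relies on the stop row of an HBase scan being exclusive by default.

```python
from datetime import date, timedelta


def make_row_key(item_id: int, interval: str, series_id: int, day: date) -> str:
    """Build a key such as 1000:day:3:20110504 (hypothetical helper)."""
    # Fixed-width YYYYMMDD keeps keys with the same prefix in date order.
    return f"{item_id}:{interval}:{series_id}:{day:%Y%m%d}"


def scan_bounds(item_id: int, interval: str, series_id: int,
                first_day: date, last_day: date) -> tuple[str, str]:
    """Return (start_row, stop_row) covering first_day..last_day inclusive.

    The stop row is treated as exclusive, so it is built from the day
    after last_day (hypothetical helper).
    """
    start_row = make_row_key(item_id, interval, series_id, first_day)
    stop_row = make_row_key(item_id, interval, series_id,
                            last_day + timedelta(days=1))
    return start_row, stop_row


if __name__ == "__main__":
    print(make_row_key(1000, "day", 3, date(2011, 5, 4)))
    print(scan_bounds(1000, "day", 3, date(2011, 5, 1), date(2011, 5, 31)))
```

With keys like these, a read for one item and series over a date range stays a single contiguous range of rows, which matches the "scan a few hundred rows" access pattern mentioned above. Note that the item id is not zero-padded here, so rows for different items sort as strings rather than numerically.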
