Re: Schema Design Question

2013-05-02 Thread Cameron Gandevia
HDFS > is far better. > > -- Lars > > > > > From: Michel Segel > To: "user@hbase.apache.org" > Cc: "user@hbase.apache.org" > Sent: Monday, April 29, 2013 6:52 AM > Subject: Re: Schema Design Question > > > I would have to

Re: Schema Design Question

2013-04-30 Thread lars hofhansl
. -- Lars From: Michel Segel To: "user@hbase.apache.org" Cc: "user@hbase.apache.org" Sent: Monday, April 29, 2013 6:52 AM Subject: Re: Schema Design Question I would have to agree. The use case doesn't make much sense for HB

Re: Schema Design Question

2013-04-29 Thread Michel Segel
I would have to agree. The use case doesn't make much sense for HBase and sounds a bit more like a problem for Hive. The OP indicated that the data was disposable after a round of processing. IMHO Hive is a better fit. Sent from a remote device. Please excuse any typos... Mike Segel On Apr

Re: Schema Design Question

2013-04-28 Thread Asaf Mesika
I actually don't see the benefit of saving the data into HBase if all you do is read per job id and then purge it. Why not accumulate into HDFS per job id and then dump the file? The way I see it, HBase is good for querying parts of your data, even if it is only 10 rows. In your case your average is 1

Re: Schema Design Question

2013-04-26 Thread Enis Söztutar
Hi, Interesting use case. I think it depends on how many jobId's you expect to have. If it is on the order of thousands, I would caution against going the one-table-per-jobId approach, since for every table there is some master overhead, as well as file structures in HDFS. If jobId's are managabl

Re: Schema Design Question

2013-04-26 Thread Ted Yu
My understanding of your use case is that data for different jobIds would be continuously loaded into the underlying table(s). Looks like you can have one table per job. This way you drop the table after map reduce is complete. In the single table approach, you would delete many rows in the table

Schema Design Question

2013-04-26 Thread Cameron Gandevia
Hi, I am new to HBase. I have been trying to POC an application and have a design question. Currently we have a single table with the following key design: jobId_batchId_bundleId_uniquefileId. This is an offline processing system, so data would be bulk loaded into HBase via map/reduce jobs. We onl
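The key layout in this post can be sketched in plain Python. This is only an illustration: the function names, the sample ids, and the choice of `_` as a separator that never appears inside a component are assumptions, and no HBase client is involved. It shows why a single table with this key supports per-job reads: all rows of one job sort together, so they fall inside one contiguous key range.

```python
SEP = "_"  # separator taken from the key layout in the post


def make_row_key(job_id, batch_id, bundle_id, file_id):
    """Build jobId_batchId_bundleId_uniquefileId as bytes (hypothetical helper)."""
    parts = [job_id, batch_id, bundle_id, file_id]
    for part in parts:
        # Assumption: components never contain the separator, otherwise
        # prefix ranges for different jobs could overlap.
        if SEP in part:
            raise ValueError("component contains separator: %r" % part)
    return SEP.join(parts).encode("utf-8")


def job_scan_range(job_id):
    """Return (start, stop) keys covering every row of one job; stop is exclusive."""
    start = (job_id + SEP).encode("utf-8")
    # Incrementing the last byte gives the smallest key that sorts after
    # every key beginning with the prefix.
    stop = start[:-1] + bytes([start[-1] + 1])
    return start, stop


key = make_row_key("job42", "b1", "bun7", "f0001")
start, stop = job_scan_range("job42")
print(key, start <= key < stop)
```

Including the trailing separator in the prefix keeps a job such as `job420` out of the range for `job42`.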

Re: Schema design question - Hot Key concerns

2011-11-20 Thread Michel Segel
Hi, OK... First a caveat... I haven't seen your initial normalized schema, so take what I say with a grain of salt... The problem you are trying to solve is one which can be solved better on an RDBMS platform and does not fit well in a NoSQL space. Your scalability issue would probably be bet

Re: Schema design question - Hot Key concerns

2011-11-18 Thread Suraj Varma
're probably going to want to split your data in to two different > tables and then write some ACID compliance at your APP level. > > Just a quick thought before I pop out for lunch... > > >> Date: Fri, 18 Nov 2011 10:02:54 -0800 >> Subject: Re: Schema design quest

RE: Schema design question - Hot Key concerns

2011-11-18 Thread Michael Segel
write some ACID compliance at your APP level. Just a quick thought before I pop out for lunch... > Date: Fri, 18 Nov 2011 10:02:54 -0800 > Subject: Re: Schema design question - Hot Key concerns > From: selek...@yahoo.com > To: user@hbase.apache.org > > One of the concerns I

Re: Schema design question - Hot Key concerns

2011-11-18 Thread Sam Seigal
AM, Suraj Varma wrote: > I have an HBase schema design question that I wanted to discuss with the list. > > Let's say we have a "wide" table design that has a table with one > column family containing "show bookings", say. > > RowKey: SHOW_ID > Columns:

Schema design question - Hot Key concerns

2011-11-18 Thread Suraj Varma
I have an HBase schema design question that I wanted to discuss with the list. Let's say we have a "wide" table design that has a table with one column family containing "show bookings", say. RowKey: SHOW_ID Columns: SEATS_AVAILABLE, BOOKING_<#1>, BOOKING

Re: Schema design question

2011-04-18 Thread Ted Dunning
I think that your mileage will definitely vary on this point. Your design may work very well. Or not. I would worry just a bit if your data points are large enough to create a really massive row (greater than about a megabyte). On Sun, Apr 17, 2011 at 11:48 PM, Yves Langisch wrote: > So I wond

Re: Schema design question

2011-04-17 Thread Yves Langisch
Yes, you're right. They have a row for each 10-minute period. Inside a row they work with offsets in seconds within this 10-minute period. This leads to a maximum of 10*60 columns per row. Normally you have fewer columns as you don't have a datapoint for each second. So I wonder if the query per
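The bucketing described in this message can be sketched as follows. It is a minimal illustration in plain Python, assuming Unix timestamps in whole seconds; the function names are hypothetical and the real project's byte-level encoding is not reproduced here.

```python
BUCKET_SECONDS = 10 * 60  # one row per 10-minute period, as described above


def split_timestamp(ts):
    """Split a timestamp into (row base time, column offset in seconds)."""
    base = ts - (ts % BUCKET_SECONDS)  # start of the 10-minute period
    offset = ts - base                 # always in range(0, 600)
    return base, offset


def group_points(points):
    """Group (timestamp, value) pairs into {row base: {offset: value}}."""
    rows = {}
    for ts, value in points:
        base, offset = split_timestamp(ts)
        rows.setdefault(base, {})[offset] = value
    return rows


rows = group_points([(1200, 0.5), (1799, 0.7), (1800, 0.9)])
print(rows)
```

Each inner dictionary corresponds to one row, so a row can hold at most 600 columns and holds fewer when some seconds have no datapoint.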

Re: Schema design question

2011-04-16 Thread Ted Dunning
TsDB has more columns than it appears at first glance. They store all of the observations for a relatively long time interval in a single row. You may have spotted that right off (I didn't). On Sat, Apr 16, 2011 at 1:27 AM, Yves Langisch wrote: > As I'm about to plan a similar app I have studi

Schema design question

2011-04-16 Thread Yves Langisch
As I'm about to plan a similar app I have studied the HBase schema of the OpenTSDB project: http://opentsdb.net/schema.html The OpenTSDB approach seems to have many rows instead of many columns. What is the better schema design in terms of query performance? My experience so far is that a width

RE: Hbase schema design question for time based data

2010-06-17 Thread Sharma, Avani
Ignore... found it in the API doc. Thanks ! -Original Message- From: Sharma, Avani [mailto:agsha...@ebay.com] Sent: Thursday, June 17, 2010 5:28 PM To: user@hbase.apache.org Subject: RE: Hbase schema design question for time based data Is timeT a timestamp or a date ? I am guessing

RE: Hbase schema design question for time based data

2010-06-17 Thread Sharma, Avani
@hbase.apache.org Subject: RE: Hbase schema design question for time based data I'm not terribly familiar with the shell API and it does not fully cover the Java API (I don't think). Let's say I want the 3 latest versions of rowX, columnY that occur before timeT. With the Java API you can do

RE: Hbase schema design question for time based data

2010-06-17 Thread Jonathan Gray
rsions(3) That means, I want versions in the range from 0 to timeT (before timeT), and I only want the 3 latest versions. JG > -Original Message- > From: Sharma, Avani [mailto:agsha...@ebay.com] > Sent: Wednesday, June 16, 2010 4:22 PM > To: user@hbase.apache.org > Su
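The selection rule stated in this reply (versions in the range from 0 to timeT, newest three only) can be modelled in plain Python. This is not HBase client code: the function name and the `(timestamp, value)` cell representation are assumptions, and the upper bound is treated as exclusive because the reply says "before timeT".

```python
def latest_versions_before(cells, time_t, max_versions=3):
    """Return up to max_versions cells with 0 <= timestamp < time_t, newest first.

    cells is an iterable of (timestamp, value) pairs for one row and column.
    """
    eligible = [(ts, value) for ts, value in cells if 0 <= ts < time_t]
    eligible.sort(key=lambda cell: cell[0], reverse=True)  # newest first
    return eligible[:max_versions]


cells = [(10, "a"), (20, "b"), (30, "c"), (40, "d"), (50, "e")]
print(latest_versions_before(cells, 45))
```

Filtering by time range first and truncating second matters: taking the three newest versions overall and then filtering could return fewer than three results even when older eligible versions exist.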

RE: Hbase schema design question for time based data

2010-06-16 Thread Sharma, Avani
r after date X". How can I do this on hbase shell as well as API ? Say I want the latest version before a certain date? -Avani -Original Message- From: Jonathan Gray [mailto:jg...@facebook.com] Sent: Wednesday, June 16, 2010 11:40 AM To: user@hbase.apache.org Subject: RE: Hbase schema

RE: Hbase schema design question for time based data

2010-06-16 Thread Sharma, Avani
itten using new API for reference. -Original Message- From: Jonathan Gray [mailto:jg...@facebook.com] Sent: Wednesday, June 16, 2010 11:40 AM To: user@hbase.apache.org Subject: RE: Hbase schema design question for time based data > Hi, > > I am trying design schema for some da

RE: Hbase schema design question for time based data

2010-06-16 Thread Jonathan Gray
> Hi, > > I am trying design schema for some data to be moved from HDFS into > HBase for real-time access. > Questions - > > 1. Is the use of new API for bulk upload recommended over old API? If > yes, is the new API stable and is there sample executable code around ? Not sure if there is much s

Hbase schema design question for time based data

2010-06-15 Thread Sharma, Avani
Hi, I am trying to design a schema for some data to be moved from HDFS into HBase for real-time access. Questions - 1. Is the use of the new API for bulk upload recommended over the old API? If yes, is the new API stable and is there sample executable code around? 2. The data is over time. I need to be ab