HDFS is far better.

-- Lars

From: Michel Segel
To: "user@hbase.apache.org"
Cc: "user@hbase.apache.org"
Sent: Monday, April 29, 2013 6:52 AM
Subject: Re: Schema Design Question

I would have to agree.
The use case doesn't make much sense for HBase and sounds a bit more like a
problem for Hive.
The OP indicated that the data was disposable after a round of processing.
IMHO Hive is a better fit.

Sent from a remote device. Please excuse any typos...

Mike Segel
I actually don't see the benefit of saving the data into HBase if all you
do is read it per job id and then purge it. Why not accumulate into HDFS per
job id and then dump the file? The way I see it, HBase is good for querying
parts of your data, even if it is only 10 rows. In your case your average
is 1...
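To make the accumulate-then-purge alternative concrete, here is a minimal
sketch against the Hadoop FileSystem API (the directory layout and job id
are invented):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PurgeJobData {
        public static void main(String[] args) throws Exception {
            String jobId = args[0];                        // hypothetical job id
            FileSystem fs = FileSystem.get(new Configuration());
            Path jobDir = new Path("/data/jobs/" + jobId); // invented layout
            // map/reduce jobs accumulate their output under jobDir;
            // once a round of processing has consumed it, drop it wholesale:
            fs.delete(jobDir, true);                       // true = recursive
            fs.close();
        }
    }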
Hi,
Interesting use case. I think it depends on how many jobId's you expect to
have. If it is on the order of thousands, I would caution against going the
one-table-per-jobId approach, since for every table there is some master
overhead, as well as file structures in HDFS. If jobId's are manageable...
My understanding of your use case is that data for different jobIds would
be continuously loaded into the underlying table(s).
Looks like you can have one table per job. This way you can drop the table
after the map reduce job is complete. In the single table approach, you would
have to delete many rows in the table.
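A minimal sketch of that per-job cleanup, assuming the HBase Java admin API
(the "job_<id>" naming scheme here is invented):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class DropJobTable {
        public static void main(String[] args) throws Exception {
            String jobId = args[0];                 // hypothetical job id
            HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
            String tableName = "job_" + jobId;      // invented naming scheme
            // dropping the whole table is far cheaper than deleting its rows:
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
            admin.close();
        }
    }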
Hi,
I am new to HBase; I have been trying to POC an application and have a
design question.
Currently we have a single table with the following key design:
jobId_batchId_bundleId_uniquefileId
This is an offline processing system, so data would be bulk loaded into
HBase via map/reduce jobs. We only...
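For illustration, that composite key could be assembled with the client's
Bytes utility (all field values below are made up):

    import org.apache.hadoop.hbase.util.Bytes;

    public class RowKeyDemo {
        public static void main(String[] args) {
            String jobId = "J100", batchId = "B7",       // hypothetical ids
                   bundleId = "BN3", uniqueFileId = "F42";
            byte[] rowKey = Bytes.toBytes(
                    jobId + "_" + batchId + "_" + bundleId + "_" + uniqueFileId);
            System.out.println(Bytes.toString(rowKey));  // J100_B7_BN3_F42
        }
    }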
Hi,
OK...
First a caveat... I haven't seen your initial normalized schema, so take what I
say with a grain of salt...
The problem you are trying to solve is one which can be solved better on an
RDBMS platform and does not fit well in the NoSQL space.
Your scalability issue would probably be better...
You're probably going to want to split your data into two different
tables and then write some ACID compliance at your APP level.

Just a quick thought before I pop out for lunch...

> Date: Fri, 18 Nov 2011 10:02:54 -0800
> Subject: Re: Schema design question - Hot Key concerns
> From: selek...@yahoo.com
> To: user@hbase.apache.org
>
> One of the concerns I...
Suraj Varma wrote:
> I have an HBase schema design question that I wanted to discuss with the list.
>
> Let's say we have a "wide" table design that has a table with one
> column family containing "show bookings", say.
>
> RowKey: SHOW_ID
> Columns: SEATS_AVAILABLE, BOOKING_<#1>, BOOKING_<#2>, ...
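As a sketch of what appending one booking to such a wide row might look like
with the Java client (table name, column family, and payload are all invented):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AddBooking {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "shows");
            int bookingNumber = 17;                       // hypothetical counter
            Put put = new Put(Bytes.toBytes("SHOW_42"));  // RowKey: SHOW_ID
            put.add(Bytes.toBytes("cf"),                  // single column family
                    Bytes.toBytes("BOOKING_" + bookingNumber),
                    Bytes.toBytes("seat=A12;user=u9"));   // made-up payload
            table.put(put);
            table.close();
        }
    }

Every booking for a show lands in the same row, which is exactly what raises
the hot-key concern in this thread.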
I think that your mileage will definitely vary on this point. Your
design may work very well. Or not. I would worry just a bit if your
data points are large enough to create a really massive row (greater
than about a megabyte).
On Sun, Apr 17, 2011 at 11:48 PM, Yves Langisch wrote:
> So I wonder...
Yes, you're right. They have a row for each 10-minute period. Inside a row they
work with offsets in seconds within this 10-minute period. This leads to a
maximum of 10*60 columns per row. Normally you have fewer columns, as you don't
have a datapoint for each second.
So I wonder if the query per...
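The 10-minute bucketing described above is simple arithmetic; a sketch
(variable names are mine):

    public class TsdbBucketing {
        public static void main(String[] args) {
            long ts = 1303074543L;                  // hypothetical UNIX time (s)
            long baseTime = ts - (ts % 600);        // row key part: 10-min bucket
            int offsetSecs = (int) (ts - baseTime); // column qualifier: 0..599
            System.out.println("row base=" + baseTime + " offset=" + offsetSecs);
        }
    }

With at most 10*60 qualifiers per row, one Get returns a whole bucket.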
OpenTSDB has more columns than it appears at first glance. They store all of
the observations for a relatively long time interval in a single row.
You may have spotted that right off (I didn't).
On Sat, Apr 16, 2011 at 1:27 AM, Yves Langisch wrote:
> As I'm about to plan a similar app I have studied...
As I'm about to plan a similar app I have studied the HBase schema of the
opentsdb project:
http://opentsdb.net/schema.html
The opentsdb approach seems to have many rows instead of many columns. What is
the better schema design in terms of query performance? My experience so far is
that a wide...
Ignore... found it in the API doc. Thanks!
-----Original Message-----
From: Sharma, Avani [mailto:agsha...@ebay.com]
Sent: Thursday, June 17, 2010 5:28 PM
To: user@hbase.apache.org
Subject: RE: Hbase schema design question for time based data
Is timeT a timestamp or a date?
I am guessing...
To: user@hbase.apache.org
Subject: RE: Hbase schema design question for time based data
I'm not terribly familiar with the shell API and it does not fully cover the
Java API (I don't think).
Let's say I want the 3 latest versions of rowX, columnY that occur before timeT.
With the Java API you can do:
  get.setTimeRange(0, timeT);
  get.setMaxVersions(3);
That means, I want versions in the range from 0 to timeT (before timeT), and I
only want the 3 latest versions.
JG
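Expanding JG's fragment into a self-contained sketch (the table, family, and
cutoff value are invented; this assumes the plain HBase Java client):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LatestVersionsBefore {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "mytable");
            long timeT = 1276700000000L;             // hypothetical cutoff (ms)
            Get get = new Get(Bytes.toBytes("rowX"));
            get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("columnY"));
            get.setTimeRange(0, timeT);              // upper bound is exclusive,
                                                     // so: strictly before timeT
            get.setMaxVersions(3);                   // only the 3 latest of those
            Result result = table.get(get);
            System.out.println(result);
            table.close();
        }
    }

The same two calls with setMaxVersions(1) answer the "latest version before a
certain date" question quoted below.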
> -----Original Message-----
> From: Sharma, Avani [mailto:agsha...@ebay.com]
> Sent: Wednesday, June 16, 2010 4:22 PM
> To: user@hbase.apache.org
> Subject: RE: Hbase schema design question for time based data
...after date X".
How can I do this on hbase shell as well as the API? Say I want the latest
version before a certain date?
-Avani
-----Original Message-----
From: Jonathan Gray [mailto:jg...@facebook.com]
Sent: Wednesday, June 16, 2010 11:40 AM
To: user@hbase.apache.org
Subject: RE: Hbase schema design question for time based data
...written using the new API for reference.
-----Original Message-----
From: Jonathan Gray [mailto:jg...@facebook.com]
Sent: Wednesday, June 16, 2010 11:40 AM
To: user@hbase.apache.org
Subject: RE: Hbase schema design question for time based data
> Hi,
>
> I am trying to design a schema for some data to be moved from HDFS into
> HBase for real-time access.
> Questions -
>
> 1. Is the use of the new API for bulk upload recommended over the old API? If
> yes, is the new API stable and is there sample executable code around?
Not sure if there is much s...
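For what it's worth, a rough sketch of how a bulk load is typically wired up
with the new org.apache.hadoop.hbase.mapreduce API; the mapper, input format,
paths, and table name are all placeholders, and details vary by HBase version:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class BulkLoadJob {
        // placeholder mapper: parses "rowkey,value" lines (format is invented)
        public static class MyMapper
                extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
            @Override
            protected void map(LongWritable key, Text line, Context ctx)
                    throws IOException, InterruptedException {
                String[] parts = line.toString().split(",", 2);
                byte[] row = Bytes.toBytes(parts[0]);
                Put put = new Put(row);
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                        Bytes.toBytes(parts[1]));
                ctx.write(new ImmutableBytesWritable(row), put);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = new Job(conf, "bulk-load");
            job.setJarByClass(BulkLoadJob.class);
            job.setMapperClass(MyMapper.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(Put.class);
            FileInputFormat.addInputPath(job, new Path("/input"));       // invented
            FileOutputFormat.setOutputPath(job, new Path("/tmp/bulk"));  // invented
            HTable table = new HTable(conf, "mytable");                  // invented
            // sets up total-order partitioning plus the HFile-writing reducer:
            HFileOutputFormat.configureIncrementalLoad(job, table);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
            // afterwards, load the HFiles with the completebulkload tool
        }
    }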
Hi,
I am trying to design a schema for some data to be moved from HDFS into HBase
for real-time access.
Questions -
1. Is the use of the new API for bulk upload recommended over the old API? If
yes, is the new API stable and is there sample executable code around?
2. The data is over time. I need to be able...