That is an interesting comment. How would you enforce this in practice?
Can you give more details?
On Wed, Dec 14, 2011 at 10:29 AM, Carson Hoffacker wrote:
> The timerange scan is able to leverage metadata in each of the HFiles. Each
> HFile should store information about the timerange associated
one or more than one table is the same work:
>>create the tables (one by one) with the list of split points.
>>
>>Lars
>>
>>On Dec 1, 2011, at 7:50 AM, Sam Seigal wrote:
>>
>>> Hi,
>>>
>>> I had a question about the relationship between regions and tables.
Hi,
I had a question about the relationship between regions and tables.
Is there a way to pre-create regions for multiple tables? Or does each
table have its own set of regions, managed independently?
I read on one of the threads that there is really no limit on the
number of tables, but that we need
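Following up on Lars's suggestion above, a minimal sketch of
pre-creating regions at table-creation time (the table name, family,
and hex split points are made up for illustration); since each table's
regions are its own, you would repeat this per table:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("events");
    desc.addFamily(new HColumnDescriptor("cf"));
    // One pre-created region per leading hex digit of the key.
    byte[][] splits = new byte[15][];
    for (int i = 1; i <= 15; i++) {
      splits[i - 1] = Bytes.toBytes(Integer.toHexString(i));
    }
    admin.createTable(desc, splits);
  }
}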
What about "partitioning" at the table level? For example, create 12
tables for the given year. Design the row keys however you like, let's
say using SHA/MD hashes. Place transactions in the appropriate table
and then do aggregations based on that table alone (this is assuming
you won't get transacti
Are there any concerns in applying the SNAPPY patch @
https://issues.apache.org/jira/browse/HBASE-3691 to 0.90.3 ?
2011/11/25 Gaojinchao:
> You can search maillist about topic "Snappy for 0.90.4".
>
>
> -----Original Message-----
> From: saurabh@gmail.com [mailto:saurabh@gmail
Hi,
The Compression.Algorithm enum does not have "SNAPPY" as an option in
HBase 0.90.3 (the version I am on). How can I create a table with
SNAPPY compression via code? Is this possible?
HColumnDescriptor.setCompressionType() takes the Algorithm enumeration
as a parameter.
Thanks,
Sam
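For reference, a minimal sketch of the creation code, assuming a build
whose Compression.Algorithm actually has the SNAPPY constant (stock
0.90.3 will not compile this; a build with the HBASE-3691 patch
applied, or a later release, should); the table and family names are
made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CreateSnappyTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("events");
    HColumnDescriptor family = new HColumnDescriptor("cf");
    // Only compiles where the SNAPPY constant exists; the native
    // snappy libraries must also be installed on the regionservers.
    family.setCompressionType(Compression.Algorithm.SNAPPY);
    desc.addFamily(family);
    admin.createTable(desc);
  }
}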
If you are prefixing your keys with predictable hashes, you can do
range scans - i.e. create a scanner for each prefix and then merge
results at the client. With unpredictable hashes and key reversals,
this might not be entirely possible.
I remember someone on the mailing list mentioning that Moz
The question really is whether the region server hosting the hot tail
end of the region during sequential *writes* can take the load. If
you find in the future that it cannot, manually splitting the regions
is not going to fix the problem IMHO, since the tail end is always the
one that is going
One of the concerns I see with this schema is if one of the shows
becomes hot. Since you are maintaining your bookings at the column
level,
a hot "row" cannot be partitioned across regions. Hbase is atomic at
the row level. Therefore, different clients updating to the same
SHOW_ID
will compete with
I have a table that I only use for generating indexes. It will rarely
have random reads, but will have M/R jobs running against it
constantly for generating indexes. Even in the index table, random
reads will be rare. It will mostly be used for scanning blocks of data.
According to HBase: The Definitive Guide
I think that is expected:
http://hbase.apache.org/metrics.html
On Wed, Nov 16, 2011 at 1:10 PM, Mark wrote:
> The only way I can get any metrics to work is if I append them to
> HADOOP_HOME/conf/hadoop-metrics.properties. Is this expected?
>
> On 11/16/11 11:37 AM, Mark wrote:
>>
>> I've enabled
If you are not too concerned with random access time, but want more
efficient scans, is increasing the block size a good idea then?
On Mon, Nov 14, 2011 at 11:24 AM, lars hofhansl wrote:
> Did it speed up your queries? As you can see from the followup discussions
> here, there is some general c
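For concreteness, a sketch of raising the block size on a scan-heavy
family at creation time (the names and the 256 KB figure are
illustrative; the HFile block size defaults to 64 KB):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ScanHeavyTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("index_data");
    HColumnDescriptor family = new HColumnDescriptor("cf");
    // Bigger blocks: fewer block-index entries and better sequential
    // scans, at the cost of reading more data per random Get.
    family.setBlocksize(256 * 1024);
    desc.addFamily(family);
    admin.createTable(desc);
  }
}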
OpenTSDB does it, I believe:
http://opentsdb.net/schema.html
I am curious, though, about the difference between having a string as a
row key (converting it to bytes and then storing the key), as opposed to
having numerical values stored as native bytes / bit masks.
On Fri, Oct 28, 2011
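As a quick illustration of the difference (the id is made up):

import org.apache.hadoop.hbase.util.Bytes;

public class KeyEncodingDemo {
  public static void main(String[] args) {
    // The same id as a string key vs. a fixed-width numeric key.
    byte[] asString = Bytes.toBytes("1234567890"); // 10 bytes; width varies with the value
    byte[] asLong = Bytes.toBytes(1234567890L);    // always 8 bytes, big-endian
    System.out.println(asString.length + " bytes vs " + asLong.length + " bytes");
  }
}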
On Tue, Oct 25, 2011 at 1:02 PM, Nicolas Spiegelberg
wrote:
>>According to my understanding, the way that HBase works is that on a
>>brand new system, all keys will start going to a single region, i.e. a
>>single region server. Once that region
>>reaches a max region size, it will split and then mo
up-front. If you are building indexes using MR, then you
> probably don't need range scan ability on your keys.
>
> Thanks
> Karthik
>
>
>
> On 10/24/11 4:48 PM, "Sam Seigal" wrote:
>
>>According to my understanding, the way that HBase works is that on
ions.
>
>
> On 10/24/11 9:07 AM, "Stack" wrote:
>
>>On Mon, Oct 24, 2011 at 1:27 AM, Sam Seigal wrote:
>>> According to the HBase book, pre-splitting tables and doing manual
>>> splits is a better long-term strategy than letting HBase handle it.
Hi Stack,
Inline.
>> According to the HBase book, pre-splitting tables and doing manual
>> splits is a better long-term strategy than letting HBase handle it.
>>
>
> It's good for getting a table off the ground, yes.
>
>
>> Since I do not know what the keys from the prod system are going to
>> lo
According to the HBase book, pre-splitting tables and doing manual
splits is a better long-term strategy than letting HBase handle it.
I have done a lot of offline testing with HBase and I am at a stage
now where I would like to hook my cluster into the production queue
feeding data into our syst
on RS.
>
> Arun
>
> On Oct 14, 2011, at 2:45 PM, Sam Seigal wrote:
>
>> Hi All,
>>
>> I have the Datanode, JobTracker and RegionServer daemons running on a
>> fleet of machines. Each of these machines has 8 GB of memory and is
>> dedicated hardware for running HBase.
Hi All,
I have the Datanode, JobTracker and RegionServer daemons running on a
fleet of machines. Each of these machines has 8 GB of memory and is
dedicated hardware for running HBase. How do you guys decide which %
of memory to assign to each? What should this number be dependent on?
Thank you
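For what it's worth, one illustrative starting split on an 8 GB box
(the numbers are assumptions to tune against your workload, not a
recommendation) gives the RegionServer the largest share and leaves
headroom for the OS and the other daemons:

# hbase-env.sh -- RegionServer heap, in MB
export HBASE_HEAPSIZE=4000

# hadoop-env.sh -- heap for each Hadoop daemon (DataNode, TaskTracker)
export HADOOP_HEAPSIZE=1000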
Is it possible to do incremental processing without putting the timestamp in
the leading part of the row key, in a more efficient manner, i.e. process
data that came in within the last hour / 2 hours, etc.? I can't seem to find
a good answer to this question myself.
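One option to sketch (the table name is made up; how well it performs
depends on how much the HFile timerange metadata lets the scan skip
whole files): keep the key as-is and push a time range into the scan:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class LastHourScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events");
    long now = System.currentTimeMillis();
    Scan scan = new Scan();
    // Only cells whose version timestamp falls in the last hour;
    // HFiles entirely outside the range can be skipped via their
    // timerange metadata.
    scan.setTimeRange(now - 60 * 60 * 1000L, now);
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process r
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}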
On Mon, Oct 10, 2011 at 12:09 AM, Stei
Start off with the HBase book, great resource for getting started:
http://ofps.oreilly.com/titles/9781449396107/
On Sun, Oct 9, 2011 at 10:25 PM, Syg raf wrote:
> Hello folks,
>
> I'm just starting with HBase and have a couple of rudimentary questions
> about how to use it:
>
> I have a simple
Scan object with start and stop row set to the region's
> start and end key).
> You probably want to group the regions by regionserver and have one thread
> per region server, or something.
>
>
> -- Lars
>
> From: Sam Seigal
> To: hbase-u..
Hi,
Is there a known way to do Scans in parallel (in different
threads even) and then sort/combine the output?
For a row key like:
prefix-event_type-event_id
prefix-event_type-event_id
I want to declare two scan objects (for, say, event_id_type foo):
Scan 1 => 0-foo
Scan 2 => 1-foo
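Something along these lines might work (a sketch with made-up table
and prefix names, error handling omitted): one scanner per bucket
prefix, run on a small thread pool, with the partitions merged and
re-sorted client-side:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ParallelPrefixScan {
  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    // Assumed: two bucket prefixes and a table named "events".
    String[] prefixes = { "0-foo-", "1-foo-" };
    ExecutorService pool = Executors.newFixedThreadPool(prefixes.length);
    List<Future<List<Result>>> futures = new ArrayList<Future<List<Result>>>();
    for (final String prefix : prefixes) {
      futures.add(pool.submit(new Callable<List<Result>>() {
        public List<Result> call() throws Exception {
          HTable table = new HTable(conf, "events");
          Scan scan = new Scan();
          scan.setStartRow(Bytes.toBytes(prefix));
          // Stop row: the prefix with its last byte bumped, so the
          // scan covers exactly the rows starting with the prefix.
          byte[] stop = Bytes.toBytes(prefix);
          stop[stop.length - 1]++;
          scan.setStopRow(stop);
          List<Result> rows = new ArrayList<Result>();
          ResultScanner scanner = table.getScanner(scan);
          try {
            for (Result r : scanner) rows.add(r);
          } finally {
            scanner.close();
            table.close();
          }
          return rows;
        }
      }));
    }
    // Merge: collect all partitions, then re-sort by the key with the
    // bucket prefix stripped off.
    List<Result> merged = new ArrayList<Result>();
    for (Future<List<Result>> f : futures) merged.addAll(f.get());
    Collections.sort(merged, new Comparator<Result>() {
      public int compare(Result a, Result b) {
        return Bytes.compareTo(stripBucket(a.getRow()), stripBucket(b.getRow()));
      }
    });
    pool.shutdown();
  }

  private static byte[] stripBucket(byte[] row) {
    // Drop the leading "N-" bucket bytes (assumed 2-byte prefix).
    return Bytes.tail(row, row.length - 2);
  }
}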
query you should consider populating a second table with
> event_type-eventid as key, and timestamp as value.
> Why is the timestamp part of the key?
>
>
> -- Lars
>
>
> - Original Message -
> From: Sam Seigal
> To: hbase-u...@hadoop.apache.org
> Cc:
>
id is to first do a GET or a Scan to get the value, determine
the exact timestamp for the record and then write the updated value.
Is there a better way to do this in one server call ?
Thanks !
Sam
On Thu, Sep 29, 2011 at 6:27 PM, Sam Seigal wrote:
> Hi,
>
> I am wondering what is the best way to query a record when only the
Hi,
I am wondering what is the best way to query a record when only the
leading and trailing letters of a row are known.
For example, if my row looks something like:
event_type-timestamp-eventid
If I know the event_type and eventid, but do not really care about the
timestamp, what is the most e
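One way to sketch it (event_type "login" and eventid "42" are made
up): bound the scan by the known leading part and push a row regex
filter for the trailing part; the filter is evaluated server-side, but
every row in the prefix range is still examined:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixSuffixScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events");
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("login-")); // known event_type
    scan.setStopRow(Bytes.toBytes("login."));  // '.' sorts right after '-'
    // Keep only rows whose key ends with the known eventid.
    scan.setFilter(new RowFilter(CompareFilter.CompareOp.EQUAL,
        new RegexStringComparator("^login-.*-42$")));
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow()));
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}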
Hi,
I am running some tests with HBase on some sample data. I keep
seeing this exception warning in the logs:
Fri Sep 23 00:35:31 2011 GMT regionserver 7193-0@star1:0 [WARN] (IPC
Server handler 2 on 60020) org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 2 on 60020 caught:
java.nio.channe
If an input split is too large and memory is a concern, we can surely
address this in TableInputFormat.getSplits() and limit the size ...
On Fri, Sep 16, 2011 at 6:39 PM, Sam Seigal wrote:
> Aren't there memory considerations with this approach ? I would assume
> the HashMap can get pret
writes would only happen once per
> map-task, and not on a per-row basis (which would be really
> expensive).
>
> A single region on a single RS could handle that no problem.
>
>
>
>
> On 9/16/11 9:00 PM, "Sam Seigal" wrote:
>
>>I see what you ar
ine that you would need to tune the
>>temp-table for the job and pre-create regions.
>>
>>Doug
>>
>>
>>
>>On 9/16/11 8:16 PM, "Sam Seigal" wrote:
>>
>>>I am trying to do something similar with HBase Map/Reduce.
>>>
>>>
I am trying to do something similar with HBase Map/Reduce.
I have event ids and amounts stored in HBase in the following format:
prefix-event_id_type-timestamp-event_id as the row key and amount as
the value
I want to be able to aggregate the amounts based on the event id type
and for this I am u
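For concreteness, a rough sketch of the kind of job I mean (assuming a
table "events" with the amount stored as an 8-byte long in cf:amount):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AggregateByEventType {

  static class AmountMapper extends TableMapper<Text, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      // Row key assumed to be: prefix-event_id_type-timestamp-event_id
      String key = Bytes.toString(row.get(), row.getOffset(), row.getLength());
      String eventIdType = key.split("-")[1];
      long amount = Bytes.toLong(
          value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("amount")));
      ctx.write(new Text(eventIdType), new LongWritable(amount));
    }
  }

  static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> vals, Context ctx)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : vals) sum += v.get();
      ctx.write(key, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "aggregate-by-event-id-type");
    job.setJarByClass(AggregateByEventType.class);
    Scan scan = new Scan();
    scan.setCaching(500);        // fetch rows in bigger batches for MR
    scan.setCacheBlocks(false);  // don't churn the block cache from a full scan
    TableMapReduceUtil.initTableMapperJob(
        "events", scan, AmountMapper.class, Text.class, LongWritable.class, job);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    job.waitForCompletion(true);
  }
}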
Hi All,
I would like to get your opinion on how to best optimize an HBase cluster
for map-reduce jobs. The main reason we would like to experiment with
HBase is to do near-real-time aggregations for the data we receive. There is
a service that writes a constant stream of data to HBase. I wou
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>
> >
> >From: Sam Seigal
> >To: user@hbase.apache.org; Andrew Purtell
> >Cc: "hbase-u...@hadoop.apache.org"
> >Sen
A question inline:
On Tue, Aug 30, 2011 at 2:47 AM, Andrew Purtell wrote:
> Hi Chris,
>
> Appreciate your answer on the post.
>
> Personally speaking however the endless Cassandra vs. HBase discussion is
> tiresome and rarely do blog posts or emails in this regard shed any light.
> Often, Cassan
Ah, thanks! I am using 0.90.1 right now, so that explains it.
On Tue, Aug 23, 2011 at 3:31 PM, Jean-Daniel Cryans wrote:
> It was fixed in either 0.90.3 or 0.90.4
>
> J-D
>
> On Mon, Aug 22, 2011 at 7:47 PM, Sam Seigal wrote:
> > Hi All,
> >
> > I had a r
Hi All,
I had a regionserver go down in my cluster. When I ran "status" in the
hbase shell, I got 4 live servers and 1 dead (which is correct).
However, when the machine came back up and I started the regionserver on
it and ran "status" in the hbase shell again, the output showed 5 live
serve
Hi All,
I had a question about the operational overhead of maintaining HBase in
production. Would someone care to share their experiences? We have a team
of 3 DBAs dedicated to maintaining our Oracle cluster. I am curious to know
if we would need the same for HBase.
I am talking of a small clust
reason to use the same version timestamp for all lines passed into the
mapper?
On Tue, Aug 2, 2011 at 5:13 PM, Sam Seigal wrote:
> Hi All,
>
> I am using the importtsv tool to load some data into an HBase cluster. Some
> of the row keys + cf:qualifier might occur more than once with
Hi All,
I am using the importtsv tool to load some data into an HBase cluster. Some
of the row keys + cf:qualifier might occur more than once with a different
value in the files I have generated. I would expect this to just create two
versions of the record with the different values. However, I am
Hi All,
A quick question on compression. I saw that HBase can use LZO compression
for storing data into the HFile.
Has anyone done experiments with using compression at the application level
instead of letting HBase handle it? Are there
advantages/disadvantages to this approach?
Is it
On Thu, Jun 30, 2011 at 11:33 PM, Stack wrote:
> On Mon, Jun 27, 2011 at 11:37 PM, Aditya Karanth A
> wrote:
> >> I have heard that the bigger the size of the regionserver, the more
> >> time it takes for region splitting and the slower the reads are. Is
> >> this true?
> > (I have not been able to experiment
Hi All,
I have a 14 node cluster setup for HBase. Someone else in my office needs to
use some of these machines, and I would like to scale my cluster down
from 14 to 6 machines.
Is there an efficient way to do this? Since there is data residing on the
machines I want to get rid of, are there utilitie
Hi,
I am loading data into my HBase cluster and running into two issues -
During my import, I received the following exception ->
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
53484 actions: servers with issues: spock7001:60020,
at
org.apache.hadoop.hbase.cl
Hi,
I had a question about how to check for the existence of a record in HBase. I
went through some threads discussing the various techniques, mainly row
locks and checkAndPut().
My schema looks like the following ->
---
The reason I am adding the prefix is to avoid hot spotting due to increasing
When using the write cache and setting setAutoFlush() to false, is there a
risk of data loss, even if the WAL is enabled?
On Mon, Jun 20, 2011 at 12:27 PM, Jeff Whiting wrote:
> There is the possibility that your keys have the same timestamp --
> especially if you are running multi-threaded. If th
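To make the question concrete, a sketch of the buffering in play (table
name made up): the WAL only protects an edit once it reaches the
server, so anything still sitting in the client-side write buffer is
lost if the client dies before a flush:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWrites {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events");
    table.setAutoFlush(false);                 // buffer puts client-side
    table.setWriteBufferSize(2 * 1024 * 1024); // flush roughly every 2 MB
    for (int i = 0; i < 10000; i++) {
      Put p = new Put(Bytes.toBytes("row-" + i));
      p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(i));
      table.put(p); // may still be in the client buffer, not on any server
    }
    table.flushCommits(); // now the puts reach the servers (and the WAL)
    table.close();
  }
}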
Hi All,
I am trying to load data from my OLTP system into HBase. I am using
checkAndPut() to do this.
The reason I am using checkAndPut() and not put() is because the system I am
writing to has idempotence requirements, i.e. a value will initially be written
with a start state, and then with an end s
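A minimal sketch of that pattern (table, family, and state values are
made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class IdempotentStateWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "transactions");
    byte[] row = Bytes.toBytes("some-row");
    byte[] cf = Bytes.toBytes("cf");
    byte[] q = Bytes.toBytes("state");

    // 1) Create the record in its start state only if no state exists
    //    yet (a null expected value means "cell must be absent").
    Put start = new Put(row);
    start.add(cf, q, Bytes.toBytes("START"));
    boolean created = table.checkAndPut(row, cf, q, null, start);

    // 2) Move to the end state only if the current state is START, so
    //    a replayed START can never clobber an END.
    Put end = new Put(row);
    end.add(cf, q, Bytes.toBytes("END"));
    boolean finished = table.checkAndPut(row, cf, q, Bytes.toBytes("START"), end);

    System.out.println("created=" + created + " finished=" + finished);
    table.close();
  }
}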
Hi All,
I had some questions about the hbase architecture that I am a little
confused about.
After doing some reading (the internet, the HBase book, etc.), my
understanding of regions is the following ->
When the cluster initially starts up (with no data), the regionservers come
online.
When the data
Hi All,
I had a question about a certain kind of query I would like to do in HBase.
I am storing records in HBase that transition from an initial state "A" to
an end state "B".
Initially, the record I will store will look like the following ->
t1 rowid:columnFamily:A
when I get a notificatio
sage
> From: Sam Seigal
> To: user@hbase.apache.org
> Cc: j...@cloudera.com; tsuna...@gmail.com
> Sent: Wed, June 8, 2011 4:54:24 PM
> Subject: Re: hbase hashing algorithm and schema design
>
> On Wed, Jun 8, 2011 at 12:40 AM, tsuna wrote:
>
> > On Tue, Jun 7,
multiple
scanners?
Thanks a lot for your help.
--
*From:* Joey Echeverria
*To:* Sam Seigal
*Sent:* Wed, June 8, 2011 5:08:32 PM
*Subject:* Re: hbase hashing algorithm and schema design
A better option than a uuid would be to take a hash of the
eventid-timestamp
On Wed, Jun 8, 2011 at 12:40 AM, tsuna wrote:
> On Tue, Jun 7, 2011 at 7:56 PM, Kjew Jned wrote:
> > I was studying the OpenTSDB example, where they also prefix the row keys
> with
> > event id.
> >
> > I further modified my row keys to have this ->
> >
> >
> >
> > The uuid is fairly unique
> prefix each key with a hash of the key. The downside is that sequential scans now
> have to be performed with multiple scanners and re-ordered client side.
>
> -Joey
>
> On Jun 3, 2011, at 3:35, Sam Seigal wrote:
>
> > Hi,
> >
> > I am not able to find information regard
Hi,
I am not able to find information regarding the algorithm that decides which
region a particular row belongs to in an HBase cluster. Does the algorithm
take into account the number of physical nodes ? Where can I find more
details about it ?
I went through the HBase book and the OpenTSDB schema