Hi all,
We have some PCs (a dozen of them) with 4-core CPUs and 4 GB of RAM
each; a Hadoop instance is already running on these machines.
Currently around 1.2 GB of RAM is left on every node.
We want to set up an HBase instance on these machines, with 3 quorum
servers.
I would consider doing the following things:
- disable the block cache - you don't have RAM to spare. 4 GB is really
the minimum (my laptop has 4 GB of RAM and I decry how sucky I am occasionally)
- run the heap between 1000-2000 MB; with the cache disabled you are in better shape
- consider adjusting the config values
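Ryan's suggestions map onto two config files. A hedged sketch, assuming the 0.20-era property name: `hfile.block.cache.size` in hbase-site.xml is the fraction of the regionserver heap given to the block cache, and I believe setting it to 0 disables the cache on that version (verify against your release's docs):

```xml
<!-- hbase-site.xml: illustrative fragment; property name per HBase 0.20 -->
<property>
  <name>hfile.block.cache.size</name>
  <!-- fraction of the regionserver heap used for the block cache; 0 disables it -->
  <value>0</value>
</property>
```

The heap itself is set via `export HBASE_HEAPSIZE=1500` (in MB) in conf/hbase-env.sh, which would land in Ryan's 1000-2000 MB range.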
Thanks, Ryan.
I think I can enlarge the heap size a little, and shrinking
the write buffer sounds good too, but I am not sure we can disable
the block cache, since the table is already built. Is there any way I can do that?
I still have the question that if I add more regionservers
Yes/no. During the read process we load one block of the hfile at a
time. We only retain the cells which are picked by your scan/get
query specification, so columns you are not interested in are not
retained. But if you have huge values intermixed with tiny values,
yeah, we will scan and read a
The block cache is the memory limit used to keep blocks in the
regionserver memory to allow you to have faster operations. It's
literally a ram cache.
And yes, adding more regionservers helps :-) Hooray for near-linear
scalability.
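To make "literally a RAM cache" concrete, here is a toy LRU block cache in plain Java. Everything here (class name, block keys, capacity) is illustrative; it is not HBase's actual LruBlockCache implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy sketch of a block cache: an LRU map from block key to block bytes,
// bounded by a fixed number of blocks. Real caches bound by byte size.
public class ToyBlockCache extends LinkedHashMap<String, byte[]> {
    private final int maxBlocks;

    public ToyBlockCache(int maxBlocks) {
        super(16, 0.75f, true); // access-order = true gives LRU behaviour
        this.maxBlocks = maxBlocks;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        return size() > maxBlocks; // evict the least-recently-used block
    }

    public static void main(String[] args) {
        ToyBlockCache cache = new ToyBlockCache(2);
        cache.put("hfile1/block0", new byte[64]);
        cache.put("hfile1/block1", new byte[64]);
        cache.get("hfile1/block0");               // touch block0, making block1 the LRU
        cache.put("hfile1/block2", new byte[64]); // evicts block1
        System.out.println(cache.containsKey("hfile1/block0")); // true
        System.out.println(cache.containsKey("hfile1/block1")); // false
    }
}
```

Reads served out of such a map are pure memory hits, which is why a hot .META. table (discussed below in this thread) costs almost nothing to query.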
On Mon, Mar 1, 2010 at 12:26 AM, steven zhuang wrote:
> thanks,
Hi,
Will a small table like the META table be a bottleneck?
Thanks
Fleming Chiu(邱宏明)
707-6128
y_823...@tsmc.com
Go meat-free on Mondays to save the Earth (Meat Free Monday Taiwan)
---
TSMC PROPERTY
On 01/03/10 01:20, Dan Washusen wrote:
My (very rough) calculation of the data size came up with around 50MB. That
was assuming 400 bytes * 100,000 for the values, 32 + 8 * 13 * 100,000 for
the keys and an extra meg or two for extra key stuff. I didn't understand
how that resulted in the a
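Dan's arithmetic can be checked mechanically. Reading his key formula "32 + 8 * 13 * 100,000" literally (the grouping is my assumption; his wording is ambiguous), the total does land right around 50 MB:

```java
// Re-running Dan's back-of-envelope numbers: 100,000 rows with 400-byte values,
// plus per-key overhead read literally from "32 + 8 * 13 * 100,000".
public class SizeEstimate {
    static double totalMb(long rows) {
        long valueBytes = 400L * rows;       // 400 B * 100,000 = 40,000,000
        long keyBytes = 32 + 8L * 13 * rows; // 104 B/key * 100,000 + 32 = 10,400,032
        return (valueBytes + keyBytes) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // ~50.4 MB before the "extra meg or two" of key overhead
        System.out.printf("~%.1f MB%n", totalMb(100_000));
    }
}
```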
This is my first post to the group, so I'm not sure I have a lot to add to the
conversation yet. But I've been lurking/searching for a week now, and wanted to
add a "me too" to Ravi's comments.
The quick-start would be fantastic if it actually worked cross-platform and did
something meaningful.
On Mon, Mar 1, 2010 at 11:47 AM, wrote:
> This is my first post to the group, so I'm not sure I have a lot to add to
> the conversation yet. But I've been lurking/searching for a week now, and
> wanted to add a "me too" to Ravi's comments.
>
> The quick-start would be fantastic if it actually w
> I ended up creating a pseudo-distributed installation on Ubuntu in a
> Virtual Box. It all works fine from localhost, and I can run the shell. But I
> don't see how that's useful to anyone who actually wants to build a real
> application. I'm struggling to
> figure out how to "connect" to it fr
Hi,
We use hbase 0.20.1
On http://snv-it-lin-006:60010/master.jsp, I see two rows for the same
region server:
snv-it-lin-010.projectrialto.com:60030  startcode=1267038448430  requests=0, regions=25, usedHeap=1280, maxHeap=6127
snv-it-lin-010.projectrialto.com:60030  startcode=1267466540070  requests=0, regions=2, usedHeap=12
> You are right, there is no region split when I use no compression.
> Nevertheless, as you say, if everything is in the memstore, how can it be
> that I see such a big difference between my tests?
Well, did you run your test more than once? Do you see the exact same
results every time? IMO at that
Usually .META. doesn't get much variance in a production cluster that
does normal website serving. That means the .META. is able to stay in
the block cache so all reads are served from memory and at the same
time the hbase clients have warmed up cache so they don't even need to
talk to .META.
J-D
In this particular case a lot of things come into play:
- Creating a table is a long process because the client sleeps a lot:
6 seconds before 0.20.3, 2 seconds in 0.20.3, and even less than that
in the current head of the branch.
- In 0.20, without the HDFS-200 patch, HDFS doesn't support fs syncs,
so
> Date: Mon, 1 Mar 2010 10:46:59 -0800
> Subject: duplicate regionserver entries
> From: yuzhih...@gmail.com
> To: hbase-user@hadoop.apache.org
>
> Hi,
> We use hbase 0.20.1
> On http://snv-it-lin-006:60010/master.jsp, I see two rows for the same
> region server:
> snv-it-lin-010.projectrialto.
Hi all
I'm learning HBase and I ran into the following problem. I'm running HBase
0.20.3 on Windows+Cygwin (just bin/start-hbase.sh). I'm creating a
simple table in the shell:
hbase(main):033:0> create 't1', 'f1'
0 row(s) in 2.0630 seconds
Then I'm trying to execute the following JUnit test - and it fails on
I ran into these kinds of issues testing a pre-alpha hbase back in
"the day"... If you have multiple regions that are weirdly
overlapping, maybe as such:
A -> B
B -> C
B -> D
C -> F
F -> G
G -> H
Here we see there are 3 "weird" regions, the [B,C) [B,D) and [C,F)
regions... taken together they see
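A sketch of how one might flag those "weird" regions mechanically: treat each region as a half-open key interval [start, end) and mark any region whose range intersects another's. This is purely an illustration of the reasoning above, not a real hbck routine:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Flag every region whose key range overlaps some other region's range.
public class RegionOverlapCheck {
    record Region(String start, String end) {}

    static Set<Region> weirdRegions(List<Region> regions) {
        Set<Region> weird = new HashSet<>();
        for (int i = 0; i < regions.size(); i++) {
            for (int j = i + 1; j < regions.size(); j++) {
                Region a = regions.get(i), b = regions.get(j);
                // [s1,e1) and [s2,e2) overlap iff each starts before the other ends
                if (a.start().compareTo(b.end()) < 0 && b.start().compareTo(a.end()) < 0) {
                    weird.add(a);
                    weird.add(b);
                }
            }
        }
        return weird;
    }

    public static void main(String[] args) {
        // The chain from the post: A->B, B->C, B->D, C->F, F->G, G->H
        List<Region> regions = List.of(
                new Region("A", "B"), new Region("B", "C"), new Region("B", "D"),
                new Region("C", "F"), new Region("F", "G"), new Region("G", "H"));
        System.out.println(weirdRegions(regions).size()); // 3: [B,C), [B,D), [C,F)
    }
}
```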
[apologies for the earlier rich text email...fixed]
Thanks for the replies. You've reaffirmed that I can figure it out, and I'm
still working toward it.
The farther I get, the more I'm realizing it isn't really all that
*complicated*...just really *different*. If all goes well, maybe I can
he
What are the issues with developing w/ HBase on Windows 7 x64? I'm doing
that right now and nothing was any different from doing it on Windows XP
x86.
I haven't run it to the point of actually doing a start-hbase.sh, but rather
running things like HBaseClusterTestCase w/o a problem.
JG
-Ori
Hi guys,
I am new to HBase and have several questions. Would anybody kindly answer
some of them?
1. Why can HBase provide low-latency random access to files, compared to
HDFS?
2. By default, only a single row at a time may be locked. Is it a single
client who can only lock one, or is it globall
Hi Yura,
Having a quick look at your code I can see the following issues;
1. Deletes are actually just flags to tell HBase that timestamps/versions
of a row are to be deleted eventually. In your test you are putting the
same version twice with a delete in between. Instead of calling
Hi,
I saw this in our HBase 0.20.1 master log:
2010-03-01 12:38:42,451 INFO [HMaster] master.ProcessRegionOpen(80):
Updated row domaincrawltable,,1267475905927 in region .META.,,1 with
startcode=1267475746189, server=10.10.31.135:60020
2010-03-01 12:39:06,088 INFO [Thread-10]
master.ServerManage
Thanks, Dan
2. flush/autoCommit didn't help.
1. Do I understand correctly that deleting a row will ensure that I
won't be able to insert data into that row again? IMO it's weird.
Anyway, this wouldn't affect me, but I wanted to write unit tests which
use HBase and delete data between runs. Looks li
Thanks! That completely answers my question and really helps my schema plans.
-chris
On Mar 1, 2010, at 12:27 AM, Ryan Rawson wrote:
> Yes/no. During the read process we load a block of the hfile in at a
> time. We only retain the cells which are picked by your scan/get
> query specification.
Comments inline...
On 2 March 2010 08:05, Yura Taras wrote:
> Thanks, Dan
>
> 2. flush/autoCommit didn't help.
>
> 1. Do I understand correctly that deleting a row will ensure that I
> won't be able to insert data to given row again? IMO it's weird.
>
Deleted rows are removed when a compaction
Inline...
- Original Message
> From: Jonathan Gray
> To: hbase-user@hadoop.apache.org
> Sent: Mon, March 1, 2010 3:17:50 PM
> Subject: RE: Why windows support is critical
>
> What are the issues with developing w/ HBase on Windows 7 x64? I'm doing
> that right now and nothing was any
HBase is not like your typical database. It doesn't overwrite data in
situ, and it doesn't delete data from disk right away; it uses delete
markers (aka tombstones). What it does do is keep multiple versions,
using timestamps with millisecond accuracy to discern new from old
data. When you do rapid
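The tombstone behaviour described above, including the rapid put/delete pitfall where a re-put at the same millisecond stays masked until compaction, can be modelled in a few lines. This is a toy model of the semantics, not HBase's actual read path:

```java
import java.util.TreeMap;

// Toy model of delete-marker (tombstone) semantics for a single cell:
// a Put stores a value at a timestamp, a Delete places a marker, and a
// read returns the newest put NOT masked by the tombstone.
public class ToyCell {
    private final TreeMap<Long, String> puts = new TreeMap<>();
    private long tombstoneTs = Long.MIN_VALUE;

    void put(long ts, String value) { puts.put(ts, value); }

    // A delete masks every version at or below its timestamp until compaction.
    void delete(long ts) { tombstoneTs = Math.max(tombstoneTs, ts); }

    String get() {
        var e = puts.lastEntry();
        // skip versions hidden by the tombstone
        while (e != null && e.getKey() <= tombstoneTs) e = puts.lowerEntry(e.getKey());
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        ToyCell cell = new ToyCell();
        cell.put(100, "v1");
        cell.delete(100);    // tombstone at the same millisecond
        cell.put(100, "v2"); // re-put at ts=100 is still masked!
        System.out.println(cell.get()); // null
        cell.put(101, "v3"); // a strictly newer timestamp escapes the tombstone
        System.out.println(cell.get()); // v3
    }
}
```

This is why test code that puts, deletes, and re-puts within the same millisecond (or with explicit identical timestamps) sees its new data "disappear".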
Hey William!
On Mon, Mar 1, 2010 at 12:36 PM, William Kang wrote:
> Hi guys,
> I am new to HBase and have several questions. Would anybody kindly answer
> some of them?
>
> 1. Why HBase could provide a low-latency random access to files compared to
> HDFS?
>
Have a look at http://wiki.apache.org/
Hi,
1. We use in-memory indexes to get fast random reads. Our index
tells us to read block X of a file, only retrieving a small amount of
the file to satisfy the user's read.
2. The row locking is not global - for each row there can only be 1
thread doing a put at a time. This serializes all p
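Point 1 can be sketched with a sorted map from each block's first key to its file offset; `floorEntry` is the binary search that picks "block X", so only that one block is read from the file. The class and key names are made up for illustration:

```java
import java.util.TreeMap;

// Sketch of an in-memory block index: maps a block's first key to its
// offset in the file. A point read searches the index, then reads one block.
public class ToyBlockIndex {
    private final TreeMap<String, Long> firstKeyToOffset = new TreeMap<>();

    void addBlock(String firstKey, long offset) {
        firstKeyToOffset.put(firstKey, offset);
    }

    // Offset of the block that may contain 'key', or -1 if key sorts
    // before the first block.
    long blockFor(String key) {
        var e = firstKeyToOffset.floorEntry(key);
        return e == null ? -1 : e.getValue();
    }

    public static void main(String[] args) {
        ToyBlockIndex idx = new ToyBlockIndex();
        idx.addBlock("apple", 0);      // block 0 covers [apple, melon)
        idx.addBlock("melon", 65536);  // block 1 covers [melon, ...)
        System.out.println(idx.blockFor("banana")); // 0
        System.out.println(idx.blockFor("peach"));  // 65536
    }
}
```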
On Mon, Mar 1, 2010 at 2:16 PM, Ryan Rawson wrote:
> Hi,
>
> 1. We use in-memory indexes to get fast random reads. Our index
> tells us to read block X of a file only retrieving a small amount of
> the file to satisfy the user's read.
>
> 2. The row locking is not global - for each row there c
Just wanted to thank Lars for the clues to the final bits in my puzzle. I was
misunderstanding the ZooKeeper's role slightly, and didn't have its
configuration elements set properly on the client.
I'm now up and running with a single node Ubuntu server (running in Virtual
Box) with independent
Ted,
Most probably a double assignment; have a look at the region server logs
and you will see that at some point the META was compacted while
another region server was hosting it. This usually happens under
stress and is mostly fixed in 0.20.3; then there's
http://issues.apache.org/jira/browse/HB
Hi,
We treat HBase as a data grid.
There are a lot of HBase Java clients in our compute grid (GridGain) fetching
data from HBase concurrently.
Our data is normalized data from Oracle; this computing code does joins
and some aggregations.
So our POC job is to load tables' data from HBase -> Co
Ah I understand now, thanks for the context. So I interpreted your
first test wrong, you are just basically hitting .META. with a lot of
random reads with lots of clients that have completely empty caches
when the test begins.
So here you hit some pain points we have currently WRT random reads
but
Hi Erik and Ryan,
Thanks a lot for your replies.
I think I probably misunderstood something, since I saw that the HBase
documentation says "MapFiles cannot currently be mapped into memory". But
the index of it can still be mapped into memory, right? Is it just like
Bigtable, where the index is at the end
If I just start a client to fetch the META information (string) and then
inject it into another client, will that be possible?
Thanks
Fleming Chiu(邱宏明)
707-6128
y_823...@tsmc.com
Go meat-free on Mondays to save the Earth (Meat Free Monday Taiwan)
@J-D
I like the idea of 'warm up'.
I wondered whether it is possible to clone client caches across JVMs.
(A cache of hot regions or a cache of a running job)
--
Alvin C.-L., Huang / 黃俊龍
ATC, ICL, ITRI, Taiwan
T: 886-3-59-14625
This email may contain confidential ITRI information. If you are not the
intended recipient, please do not use or disclose its contents, and please
destroy this email.
Alvin,
That feature doesn't exist currently, and I don't see a nice way of
doing it, as those regions will change location over time (though on a
normal production system it shouldn't vary that much). But someone
motivated could do the following:
- Have a new method in HConnectionManager.TableServers
And another question: why are the ROOT and META tables not kept in the
HBaseMaster's memory? Would this greatly degrade the performance?
Thanks.
William
On Mon, Mar 1, 2010 at 5:33 PM, Erik Holstad wrote:
> On Mon, Mar 1, 2010 at 2:16 PM, Ryan Rawson wrote:
>
> > Hi,
> >
> > 1. We use in-m
Hey Nenshad --
I think Jonathan Gray began working on something similar to this a few
months ago for Streamy.
As JD said, Coprocessors are very interesting, and I think they're
worth looking at (or contributing a patch for!) if you basically need
to use HBase as a "Giant Spreadsheet". Such as:
(Ro
That's what I still want to know. I also use a Windows system just about on a
daily basis to develop and test HBase.
I think the objection really is about needing Cygwin (or $deity forbid MinGW)
to run the scripts to launch a multiprocess install. Windows guys just don't
want to deal with UNI
> I think Jonathan Gray began working on something similar to this a few
> months ago for Streamy.
Regrettably that was proprietary and remains so to the best of my knowledge.
> As JD said, Coprocessors are very interesting, and I think they're
> worth looking at (or contributing a patch fo!)
You may have made the mental substitution, but just in case not:
> Also the server side implementation holds all intermediate values in the
> heap.
> What we have now is a sketch that needs some work. It really should spill
> intermediates to local disk (as HFiles) as necessary and then read/m
The ROOT/META tables are hosted on a regionserver, so the system is
self-referential and self-hosting; furthermore, this reduces the
master's dependency to the single duty of reassigning regions
as necessary.
-ryan
On Mon, Mar 1, 2010 at 7:36 PM, William Kang wrote:
> And another question, why the
I guess I interpret 'global locking' as "affects all threads,
regardless of whether they are interested in a particular row or not". Row
locks only affect threads that are interested in said row; unrelated
threads are unaffected. Sorry if there is a little duplication of
words here, it's a fairly s
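A minimal sketch of that per-row (non-global) locking behaviour: one lock object per row key, so writers to different rows never contend. This mimics the semantics described above, not HBase's internal row-lock implementation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Per-row locking: each row key lazily gets its own lock, so puts to the
// same row serialize while puts to different rows proceed independently.
public class RowLocks {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    void withRowLock(String row, Runnable put) {
        ReentrantLock lock = locks.computeIfAbsent(row, r -> new ReentrantLock());
        lock.lock();
        try {
            put.run(); // only puts to the SAME row wait here
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        RowLocks rl = new RowLocks();
        int[] x = new int[1];
        rl.withRowLock("row1", () -> x[0] = 7); // lock "row1", run, unlock
        System.out.println(x[0]);
    }
}
```

A thread holding the lock for "row1" never blocks a thread working on "row2"; that is the difference from a global lock.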