setting of HBase, how much memory should be reserved for a Hbase regionserver.

2010-03-01 Thread steven zhuang
hi, all, We have some PCs (a dozen of them) with a 4-core CPU and 4GB RAM each; a hadoop instance is already running on these machines. Currently there is around 1.2GB RAM left on every node. We want to set up an HBase instance on these machines, with 3 quorum servers

Re: setting of HBase, how much memory should be reserved for a Hbase regionserver.

2010-03-01 Thread Ryan Rawson
I would consider doing the following things: - disable the block cache - you don't have ram to spare. 4gb is really minimum (my laptop is 4gb ram and i decry how sucky I am occasionally) - run between 1000-2000m, with the cache disabled you are in better shape - consider adjusting the config values

Re: setting of HBase, how much memory should be reserved for a Hbase regionserver.

2010-03-01 Thread steven zhuang
thanks, Ryan, I think I can enlarge the heap size a little, and meanwhile shrinking the write buffer sounds good too, but I am not sure we can disable the block cache since the table is already built; is there any way I can do that? I still have the question that if I add more regionse

Re: OOME Java heap space

2010-03-01 Thread Ryan Rawson
Yes/no. During the read process we load a block of the hfile in at a time. We only retain the cells which are picked by your scan/get query specification. So columns you are not interested in are not retained. But if you have huge values intermixed with tiny values, yeah we will scan and read a

Re: setting of HBase, how much memory should be reserved for a Hbase regionserver.

2010-03-01 Thread Ryan Rawson
The block cache is the memory limit used to keep blocks in the regionserver memory to allow you to have faster operations. It's literally a ram cache. And yes, adding more regionservers helps :-) Hooray for near-linear scalability. On Mon, Mar 1, 2010 at 12:26 AM, steven zhuang wrote: > thanks,

A small table is a bottleneck ?

2010-03-01 Thread y_823910
Hi, Will a small table like the META table be a bottleneck? Thanks Fleming Chiu(邱宏明) 707-6128 y_823...@tsmc.com

Re: LZO vs GZIP vs NO COMPREESSION: why is GZIP the winner ???

2010-03-01 Thread Vincent Barat
On 01/03/10 01:20, Dan Washusen wrote: My (very rough) calculation of the data size came up with around 50MB. That was assuming 400 bytes * 100,000 for the values, 32 + 8 * 13 * 100,000 for the keys and an extra meg or two for extra key stuff. I didn't understand how that resulted in the a
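
A quick check of the arithmetic in this estimate, as a minimal Python sketch. It assumes the key expression is evaluated exactly as written (normal operator precedence); the variable names are made up for illustration and the "extra meg or two" is left out.

```python
# Rough data-size estimate using only the figures quoted in the message.
ROWS = 100_000
VALUE_BYTES = 400

value_total = VALUE_BYTES * ROWS   # 400 bytes * 100,000 rows
key_total = 32 + 8 * 13 * ROWS     # key expression, evaluated as written
total = value_total + key_total

print(f"values: {value_total / 1e6:.1f} MB")
print(f"keys:   {key_total / 1e6:.1f} MB")
print(f"total:  {total / 1e6:.1f} MB")   # about 50 MB before the extra key overhead
```

Read this way the values account for 40 MB and the keys for roughly 10.4 MB, which is consistent with the "around 50MB" figure.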

Re: Why windows support is critical

2010-03-01 Thread nocones77-groups
This is my first post to the group, so I'm not sure I have a lot to add to the conversation yet. But I've been lurking/searching for a week now, and wanted to add a "me too" to Ravi's comments. The quick-start would be fantastic if it actually worked cross-platform and did something meaningful.

Re: Why windows support is critical

2010-03-01 Thread Edward Capriolo
On Mon, Mar 1, 2010 at 11:47 AM, wrote: > This is my first post to the group, so I'm not sure I have a lot to add to > the conversation yet. But I've been lurking/searching for a week now, and > wanted to add a "me too" to Ravi's comments. > > The quick-start would be fantastic if it actually w

Re: Why windows support is critical

2010-03-01 Thread Lars Francke
> I ended up creating a pseudo-distributed installation on Ubuntu in a > Virtual Box. It all works fine from localhost, and I can run the shell. But I > don't see how that's useful to anyone who actually wants to build a real > application. I'm struggling to > figure out how to "connect" to it fr

duplicate regionserver entries

2010-03-01 Thread Ted Yu
Hi, We use hbase 0.20.1. On http://snv-it-lin-006:60010/master.jsp, I see two rows for the same region server: snv-it-lin-010.projectrialto.com:60030 1267038448430 requests=0, regions=25, usedHeap=1280, maxHeap=6127 snv-it-lin-010.projectrialto.com:60030 1267466540070 requests=0, regions=2, usedHeap=12

Re: LZO vs GZIP vs NO COMPREESSION: why is GZIP the winner ???

2010-03-01 Thread Jean-Daniel Cryans
> You are right, there is no region split when I use no compression. > Nevertheless, as you say, if everything is in the memstore, how can it be > that I see a so big difference between my tests ? Well did you run your test more than once? Do you see the exact same results every time? IMO at that

Re: A small table is a bottleneck ?

2010-03-01 Thread Jean-Daniel Cryans
Usually .META. doesn't get much variance in a production cluster that does normal website serving. That means the .META. is able to stay in the block cache so all reads are served from memory and at the same time the hbase clients have warmed up cache so they don't even need to talk to .META. J-D

Re: HBase reading performance

2010-03-01 Thread Jean-Daniel Cryans
In this particular case a lot of things come into play: - Creating a table is a long process because the client sleeps a lot, 6 seconds before 0.20.3, 2 seconds in 0.20.3 and even less than that in the current head of branch. - in 0.20, without the HDFS-200 patch, HDFS doesn't support fs syncs so

RE: duplicate regionserver entries

2010-03-01 Thread Michael Segel
> Date: Mon, 1 Mar 2010 10:46:59 -0800 > Subject: duplicate regionserver entries > From: yuzhih...@gmail.com > To: hbase-user@hadoop.apache.org > > Hi, > We use hbase 0.20.1 > On http://snv-it-lin-006:60010/master.jsp, I see two rows for the same > region server: > snv-it-lin-010.projectrialto.

Sequental put/get/delete don't work

2010-03-01 Thread Yura Taras
Hi all, I'm learning HBase and I ran into the following problem. I'm running HBase 0.20.3 on Windows+Cygwin (just bin/start-hbase.sh). I create a simple table in the shell: hbase(main):033:0> create 't1', 'f1' 0 row(s) in 2.0630 seconds Then I try to execute the following JUnit test - and it fails on

Re: Two regions with same start_key

2010-03-01 Thread Ryan Rawson
I ran into these kinds of issues testing a pre-alpha hbase back in "the day"... If you have multiple regions that are weirdly overlapping, maybe as such: A -> B B -> C B -> D C -> F F -> G G -> H Here we see there are 3 "weird" regions, the [B,C) [B,D) and [C,F) regions... taken together they see
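
The overlap described here can be checked mechanically. Below is a minimal Python sketch, not HBase code: it treats each region as a half-open [start, end) pair of keys, exactly as in the message's example, and reports every region that overlaps another one. The function name `overlapping_regions` is hypothetical.

```python
def overlapping_regions(regions):
    """Return the regions whose half-open [start, end) range overlaps another region."""
    weird = set()
    for i, (s1, e1) in enumerate(regions):
        for s2, e2 in regions[i + 1:]:
            # Two half-open ranges overlap when each starts before the other ends
            if s1 < e2 and s2 < e1:
                weird.add((s1, e1))
                weird.add((s2, e2))
    return sorted(weird)


# The region list from the message: A -> B, B -> C, B -> D, C -> F, F -> G, G -> H
regions = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "F"), ("F", "G"), ("G", "H")]
weird = overlapping_regions(regions)
print(weird)
```

On the example list this flags [B,C), [B,D) and [C,F), the same three regions the message calls "weird". Adjacent regions that merely share a boundary key, such as [A,B) and [B,C), are not flagged.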

Re: Why windows support is critical

2010-03-01 Thread nocones77-groups
[apologies for the earlier rich text email...fixed] Thanks for the replies. You've reaffirmed that I can figure it out, and I'm still working toward it. The farther I get, the more I'm realizing it isn't really all that *complicated*...just really *different*. If all goes well, maybe I can he

RE: Why windows support is critical

2010-03-01 Thread Jonathan Gray
What are the issues with developing w/ HBase on Windows 7 x64? I'm doing that right now and nothing was any different from doing it on Windows XP x86. I haven't run it to the point of actually doing a start-hbase.sh, but rather running things like HBaseClusterTestCase w/o a problem. JG -Ori

Questions about HBase

2010-03-01 Thread William Kang
Hi guys, I am new to HBase and have several questions. Would anybody kindly answer some of them? 1. Why HBase could provide a low-latency random access to files compared to HDFS? 2. By default, Only a single row at a time may be locked. Is it a single client who can only lock one or is it globall

Re: Sequental put/get/delete don't work

2010-03-01 Thread Dan Washusen
Hi Yura, Having a quick look at your code I can see the following issues; 1. Deletes are actually just flags to tell HBase that timestamps/versions of a row are to be deleted eventually. In your test you are putting the same version twice with a delete in between. Instead of calling

Cannot open filename error

2010-03-01 Thread Ted Yu
Hi, I saw this in our HBase 0.20.1 master log: 2010-03-01 12:38:42,451 INFO [HMaster] master.ProcessRegionOpen(80): Updated row domaincrawltable,,1267475905927 in region .META.,,1 with startcode=1267475746189, server=10.10.31.135:60020 2010-03-01 12:39:06,088 INFO [Thread-10] master.ServerManage

Re: Sequental put/get/delete don't work

2010-03-01 Thread Yura Taras
Thanks, Dan 2. flush/autoCommit didn't help. 1. Do I understand correctly that deleting a row will ensure that I won't be able to insert data to given row again? IMO it's weird. Anyway, this wouldn't touch me, but I wanted to write unit tests which use HBase and delete data between runs. Looks li

Re: OOME Java heap space

2010-03-01 Thread Chris Tarnas
Thanks! That completely answers my question and really helps my schema plans. -chris On Mar 1, 2010, at 12:27 AM, Ryan Rawson wrote: > Yes/no. During the read process we load a block of the hfile in at a > time. We only retain the cells which are picked by your scan/get > query specification.

Re: Sequental put/get/delete don't work

2010-03-01 Thread Dan Washusen
Comments inline... On 2 March 2010 08:05, Yura Taras wrote: > Thanks, Dan > > 2. flush/autoCommit didn't help. > > 1. Do I understand correctly that deleting a row will ensure that I > won't be able to insert data to given row again? IMO it's weird. > Deleted rows are removed when a compaction

Re: Why windows support is critical

2010-03-01 Thread nocones77-groups
Inline... - Original Message > From: Jonathan Gray > To: hbase-user@hadoop.apache.org > Sent: Mon, March 1, 2010 3:17:50 PM > Subject: RE: Why windows support is critical > > What are the issues with developing w/ HBase on Windows 7 x64? I'm doing > that right now and nothing was any

Re: Sequental put/get/delete don't work

2010-03-01 Thread Ryan Rawson
HBase is not like your typical database. It doesn't overwrite data in situ, it doesn't delete data from disk right away, it uses delete markers (aka tombstones). What it does do is keep multiple versions, and uses timestamps with millisecond accuracy to discern new and old data. When you do rapid
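
A toy model of the mechanism named here (delete markers plus millisecond timestamps), written in Python. This is not HBase code: `ToyCell` and its methods are invented for illustration, and it assumes a delete marker hides every version whose timestamp is less than or equal to the marker's timestamp.

```python
class ToyCell:
    """Toy model of one cell: versions keyed by millisecond timestamp, plus delete markers."""

    def __init__(self):
        self.versions = {}     # timestamp_ms -> value
        self.tombstones = []   # timestamps of delete markers

    def put(self, ts_ms, value):
        self.versions[ts_ms] = value

    def delete(self, ts_ms):
        # A delete only records a marker; nothing is removed right away
        self.tombstones.append(ts_ms)

    def get(self):
        # A version is visible only if it is newer than every delete marker
        cutoff = max(self.tombstones, default=-1)
        visible = {ts: v for ts, v in self.versions.items() if ts > cutoff}
        return visible[max(visible)] if visible else None


cell = ToyCell()
cell.put(1000, "first")
cell.delete(1000)
cell.put(1000, "second")        # same millisecond as the delete marker
same_ms_result = cell.get()     # still hidden by the marker

cell.put(1001, "third")         # one millisecond later
later_result = cell.get()       # visible again
print(same_ms_result, later_result)
```

In this model a put that lands in the same millisecond as a delete stays hidden, while a put with a later timestamp is visible, which is one way a rapid put/delete/put test can appear to lose data.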

Re: Questions about HBase

2010-03-01 Thread Erik Holstad
Hey William! On Mon, Mar 1, 2010 at 12:36 PM, William Kang wrote: > Hi guys, > I am new to HBase and have several questions. Would anybody kindly answer > some of them? > > 1. Why HBase could provide a low-latency random access to files compared to > HDFS? > Have a look at http://wiki.apache.org/

Re: Questions about HBase

2010-03-01 Thread Ryan Rawson
Hi, 1. We use in-memory indexes to get fast random reads. Our index tells us to read block X of a file only retrieving a small amount of the file to satisfy the user's read. 2. The row locking is not global - for each row there can only be 1 thread doing a put at a time. This serializes all p
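
Point 1 can be pictured with a small Python sketch. It is an illustration only, not the HFile format: `ToyBlockIndex`, the sample keys, and the byte offsets are all made up. The idea is that a sorted in-memory list of each block's first key lets a reader pick the one block that could contain a row, so only that block has to be read from the file.

```python
import bisect


class ToyBlockIndex:
    """In-memory index: the first row key of each block, plus that block's file offset."""

    def __init__(self, first_keys, offsets):
        self.first_keys = first_keys   # sorted; first_keys[i] is the first key in block i
        self.offsets = offsets         # offsets[i] is where block i starts in the file

    def block_for(self, row_key):
        # Find the last block whose first key is <= row_key
        i = bisect.bisect_right(self.first_keys, row_key) - 1
        if i < 0:
            return None                # key sorts before the first block
        return i, self.offsets[i]


index = ToyBlockIndex(["apple", "kiwi", "pear"], [0, 65536, 131072])
hit = index.block_for("mango")      # falls in the block that starts at "kiwi"
miss = index.block_for("aardvark")  # sorts before every block
print(hit, miss)
```

The lookup is a binary search over keys held in memory, so the only disk read needed is for the single block it selects.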

Re: Questions about HBase

2010-03-01 Thread Erik Holstad
On Mon, Mar 1, 2010 at 2:16 PM, Ryan Rawson wrote: > Hi, > > 1. We use in-memory indexes to get fast random reads. Our index > tells us to read block X of a file only retrieving a small amount of > the file to satisfy the user's read. > > 2. The row locking is not global - for each row there c

Re: Why windows support is critical

2010-03-01 Thread nocones77-groups
Just wanted to thank Lars for the clues to the final bits in my puzzle. I was misunderstanding the ZooKeeper's role slightly, and didn't have its configuration elements set properly on the client. I'm now up and running with a single node Ubuntu server (running in Virtual Box) with independent

Re: Cannot open filename error

2010-03-01 Thread Jean-Daniel Cryans
Ted, Most probably double assignment, have a look at the region server logs and you will see that at some point the META was compacted while another region server was hosting it. This usually happens under stress and is mostly fixed in 0.20.3 then there's http://issues.apache.org/jira/browse/HB

Re: HBase reading performance

2010-03-01 Thread y_823910
Hi, We treat HBase as a DataGrid. There are a lot of HBase Java clients in our Compute Grid (GridGain) fetching data from HBase concurrently. Our data is normalized data from Oracle; the computing code does joins and some aggregations. So our POC job is to load tables' data from HBase -> Co

Re: HBase reading performance

2010-03-01 Thread Jean-Daniel Cryans
Ah I understand now, thanks for the context. So I interpreted your first test wrong, you are just basically hitting .META. with a lot of random reads with lots of clients that have completely empty caches when the test begins. So here you hit some pain points we have currently WRT random reads but

Re: Questions about HBase

2010-03-01 Thread William Kang
Hi Erik and Ryan, Thanks a lot for your replies. I think I probably misunderstood something since I saw in HBase documentation it says "MapFiles cannot currently be mapped into memory". But the index of it can still be mapped into memory, right? Is it just like bigtable that the index is at the end

Re: HBase reading performance

2010-03-01 Thread y_823910
If I just start a client to fetch the META information (string) and then inject it into other clients, will that be possible? Thanks Fleming Chiu(邱宏明) 707-6128 y_823...@tsmc.com

Re: HBase reading performance

2010-03-01 Thread Alvin C.L Huang
@J-D I like the idea of 'warm up'. I wondered whether it is possible to clone client caches across JVMs. (A cache of hot regions or a cache of a running job) -- Alvin C.-L., Huang / 黃俊龍 ATC, ICL, ITRI, Taiwan T: 886-3-59-14625

Re: HBase reading performance

2010-03-01 Thread Jean-Daniel Cryans
Alvin, That feature doesn't exist currently and I don't see a nice way of doing it as those regions will change location over time (tho on a normal production system it shouldn't vary that much). But, someone motivated could do the following: - Have a new method in HConnectionManager.TableServers

Re: Questions about HBase

2010-03-01 Thread William Kang
And another question, why the ROOT table and META tables are not in the HBaseMaster's memory? Would this greatly degrade the performance? Thanks. William On Mon, Mar 1, 2010 at 5:33 PM, Erik Holstad wrote: > On Mon, Mar 1, 2010 at 2:16 PM, Ryan Rawson wrote: > > > Hi, > > > > 1. We use in-m

Re: Handling Interactive versus Batch Calculations

2010-03-01 Thread Bradford Stephens
Hey Nenshad -- I think Jonathan Gray began working on something similar to this a few months ago for Streamy. As JD said, Coprocessors are very interesting, and I think they're worth looking at (or contributing a patch for!) if you basically need to use HBase as a "Giant Spreadsheet". Such as: (Ro

Re: Why windows support is critical

2010-03-01 Thread Andrew Purtell
That's what I still want to know. I also use a Windows system just about on a daily basis to develop and test HBase. I think the objection really is about needing Cygwin (or $deity forbid Mingwin) to run the scripts to launch a multiprocess install. Windows guys just don't want to deal with UNI

Re: Handling Interactive versus Batch Calculations

2010-03-01 Thread Andrew Purtell
> I think Jonathan Gray began working on something similar to this a few > months ago for Streamy. Regrettably that was proprietary and remains so to the best of my knowledge. > As JD said, Coprocessors are very interesting, and I think they're > worth looking at (or contributing a patch for!)

Re: Handling Interactive versus Batch Calculations

2010-03-01 Thread Andrew Purtell
You may have made the mental substitution, but just in case not: > Also the server side implementation holds all intermediate values in the > heap. > What we have now is a sketch that needs some work. It really should spill > intermediates to local disk (as HFiles) as necessary and then read/m

Re: Questions about HBase

2010-03-01 Thread Ryan Rawson
The ROOT/META are hosted on a regionserver so the system is self-referential and self-hosting, furthermore it reduces the dependency on the master to just a single duty of reassigning regions as necessary. -ryan On Mon, Mar 1, 2010 at 7:36 PM, William Kang wrote: > And another question, why the

Re: Questions about HBase

2010-03-01 Thread Ryan Rawson
I guess I interpret 'global locking' as in "affects all threads regardless of if they are interested in a particular row or not". Row locks only affect threads that are interested in said row. Unrelated threads are unaffected. Sorry if there is a little duplication of words here, it's a fairly s
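
The distinction drawn here, per-row locks versus a global lock, can be sketched in a few lines of Python. This is an illustration of the idea, not the region server's implementation; `ToyRowLocks` and `lock_for` are hypothetical names.

```python
import threading
from collections import defaultdict


class ToyRowLocks:
    """One lock per row key, so writers to different rows never wait on each other."""

    def __init__(self):
        self._guard = threading.Lock()              # protects the lock table itself
        self._locks = defaultdict(threading.Lock)   # row key -> that row's lock

    def lock_for(self, row):
        with self._guard:
            return self._locks[row]


locks = ToyRowLocks()
row_a = locks.lock_for("row-a")
row_a.acquire()                                             # someone is writing row-a

other_row_free = locks.lock_for("row-b").acquire(blocking=False)  # unrelated row
same_row_free = locks.lock_for("row-a").acquire(blocking=False)   # contended row
print(other_row_free, same_row_free)

# Clean up the locks taken above
locks.lock_for("row-b").release()
row_a.release()
```

While row-a is held, a writer to row-b proceeds immediately and only a second writer to row-a has to wait, which matches the "unrelated threads are unaffected" description.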