structured data knowledge store in HBase

2010-11-03 Thread Phillip Nelson
Hi guys, thank you ahead of time for reading through this and for any feedback you can give. I'm relatively new to HBase, but I'm really enjoying working with it. I'm working on a project to store a large amount of simple structured data in HBase. Basically, it's a subset of OWL+RDF

Re: split not working

2010-11-03 Thread Stack
Try it from the shell. Try splitting individual regions (scan .META. first to see the list, or check your UI where it lists the regions in a table). St.Ack On Wed, Nov 3, 2010 at 6:22 PM, Buttler, David wrote: > Hi all, > I have a small table (3M rows) that is using 4 regions. I want to split the
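For reference, a minimal sketch of forcing a split programmatically with the 0.20-era client API; the table name here is hypothetical, and HBaseAdmin.split() also accepts an individual region name as listed in .META.:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class ForceSplit {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
        // Asks the master to split the table's regions; pass a single
        // region name instead to split one region at a time.
        admin.split("mytable");
      }
    }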

Re: slow random access when writing new data to HTable, how to optimize?

2010-11-03 Thread Ted Yu
You can add this to line 3: table.setAutoFlush(false); On Wed, Nov 3, 2010 at 8:01 PM, 刘京磊 wrote: > hadoop version: 0.20.2 > hbase version: 0.20.6 > > 1 master, 7 slaves/regionservers: 4 CPUs, 34G memory; hbase heap size: > 15G > 3 zookeepers: 2 CPUs, 8G memory > > now, there are 157
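As a sketch of what that looks like in context with the 0.20 client API (the table, family, and qualifier names below are hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedWrites {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        table.setAutoFlush(false);                  // buffer puts client-side
        table.setWriteBufferSize(12 * 1024 * 1024); // e.g. a 12MB write buffer
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
        table.put(put);        // queued in the client-side write buffer
        table.flushCommits();  // pushes buffered puts to the regionservers
      }
    }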

slow random access when writing new data to HTable, how to optimize?

2010-11-03 Thread 刘京磊
hadoop version: 0.20.2 hbase version: 0.20.6 1 master, 7 slaves/regionservers: 4 CPUs, 34G memory; hbase heap size: 15G 3 zookeepers: 2 CPUs, 8G memory now, there are 1573 regions (about 1T). Random access takes 10ms-200ms when we are not writing. We need to put 200G of data (about 0.4 billion

split not working

2010-11-03 Thread Buttler, David
Hi all, I have a small table (3M rows) that is using 4 regions. I want to split the table so that I can take advantage of more nodes in my cluster with map/reduce tasks and the TableInputFormat. The split button on the web page sends split messages to the master/region server, but nothing seems

Re: Where do you get your hardware?

2010-11-03 Thread Jack Levin
We are doing it with a system integrator called Racklogic in San Jose. We tell them what to build and they do it per our instructions. We are running 3 datacenters with 500 servers, however, and 20 Gbps of traffic to the world... so a lot of our stuff is custom made. -Jack On Wed, Nov 3, 2010 at 11

Re: Where do you get your hardware?

2010-11-03 Thread Jason Lotz
Thanks for the replies. My takeaway is that most organizations are buying from vendors (Dell, HP, SuperMicro, etc.). While "build it yourself" is an approach, I'm not hearing of a lot of companies that are doing it. Thanks again, Jason On Wed, Nov 3, 2010 at 10:04 AM, Michael Segel wrote: > >

Re: about hbase security

2010-11-03 Thread Gary Helmling
HBase access control features are in active development at the moment. Currently we're building on top of secure Hadoop and using Kerberos for client authentication, with HBase providing additional tools for managing access to individual tables or column families. See the following issues in JIRA:

Re: HBase as a versioned key/value store

2010-11-03 Thread Stack
On Wed, Nov 3, 2010 at 7:15 AM, Wojciech Langiewicz wrote: > > I'm running the latest version from Cloudera Try a later version of the 0.89 series. See the downloads page on our site. It has perf. improvements. >> Each KV is a distinct Put operation? Normally people get high throughput >> by b
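The truncated question above appears to be heading toward batching; as a sketch under that assumption, the 0.20 client also accepts a list of Puts in a single call (the table and column names below are hypothetical):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedPuts {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        table.setAutoFlush(false);
        List<Put> batch = new ArrayList<Put>();
        for (int i = 0; i < 1000; i++) {
          Put p = new Put(Bytes.toBytes("row-" + i));
          p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v-" + i));
          batch.add(p);
        }
        table.put(batch);      // submit the whole batch in one call
        table.flushCommits();  // flush anything still buffered
      }
    }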

RE: HBase as a versioned key/value store

2010-11-03 Thread Jonathan Gray
Ah, reads. Totally different story. Are you using the block cache? How much heap do you have configured for your RSs? Some of that debug output should be displaying the stats of your block cache... Want to paste a few lines of that? > -Original Message- > From: Wojciech Langiewicz [mailto:w
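For reference, the block cache's share of regionserver heap is set via hfile.block.cache.size in hbase-site.xml on the regionservers; a minimal sketch (0.2, i.e. 20% of the RS heap, was the default of that era, so the value below is illustrative rather than a recommendation):

    <property>
      <name>hfile.block.cache.size</name>
      <value>0.2</value>
    </property>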

Re: Node failure causes weird META data?

2010-11-03 Thread Erdem Agaoglu
We suspect some misconfiguration too, but are unsure what it might be. Almost all of the configuration parameters are at their defaults, including dfs.support.append, which is true in the append branch. We checked the RS logs and couldn't find any exceptions; everything seems normal, so it is unlikely to be a bug.

Heapsize you are running on?

2010-11-03 Thread Sean Bigdatafun
I understand that too much heapsize will cause GC pauses, which is definitely a bad thing nobody wants to see. But really, does anyone run HBase with a heapsize higher than 8GB? We know almost any DB-like system is normally memory hungry (or to put it another way, the more memory the better). This

Re: HBase as a versioned key/value store

2010-11-03 Thread Wojciech Langiewicz
Hello, 2010/11/3 Jonathan Gray > Hi Wojciech, > > HBase can easily be used as a versioned key/value store. I'd say that's > one of the easiest ways to use it. > > To help you get more throughput, you'll have to provide more details. > > What version are you running, what kind of hardware / conf

RE: Where do you get your hardware?

2010-11-03 Thread Michael Segel
Well, I usually go to Home Depot, even though there's an ACE a block away... :-) (Just kidding.) If you're keen on Dell, I don't know if they are still making R410s. They're 1U, so you can put in 4 hot-swap drives, giving you roughly 7TB per node. They have multiple 1GbE ports, so you can bond them i

RE: HBase as a versioned key/value store

2010-11-03 Thread Jonathan Gray
Hi Wojciech, HBase can easily be used as a versioned key/value store. I'd say that's one of the easiest ways to use it. To help you get more throughput, you'll have to provide more details. What version are you running, what kind of hardware / configuration, and what does your client look lik

Re: Where do you get your hardware?

2010-11-03 Thread Jeremy Carroll
We have used the Dell Cloud Servers (C2100, to be exact). Turnaround time is a little slower, but it's worth it IMHO. On 11/3/10 8:36 AM, "Patrick Angeles" wrote: >Jason, > >Unless you're operating at Google scale, it doesn't make economic sense to >build your own unless you're *really into that

Re: Where do you get your hardware?

2010-11-03 Thread Patrick Angeles
Jason, Unless you're operating at Google scale, it doesn't make economic sense to build your own unless you're *really into that*. Most major vendors (HP, Dell, SuperMicro) will offer a configuration that is very suitable for Hadoop. Regards, - P On Wed, Nov 3, 2010 at 9:21 AM, Jason Lotz wro

Re: Where do you get your hardware?

2010-11-03 Thread Tim Robertson
We just set up a cluster with Dells and have a pretty good relationship with a local Dell supplier. Tim On Wed, Nov 3, 2010 at 2:21 PM, Jason Lotz wrote: > We are in the process of analyzing our options for future purchases of > our Hadoop/HBase DN/RS servers. Currently, we purchase Dell

Re: about hbase security

2010-11-03 Thread Sean Bigdatafun
CDH3beta3 seems to provide what you want (ACLs). On Wed, Nov 3, 2010 at 1:56 AM, 梁景明 wrote: > hi, is there any feature for me to control client access to my > hbase? like some authority, some user, or some password? > > right now one way to control it: > my servers use iptables to control the ac

Where do you get your hardware?

2010-11-03 Thread Jason Lotz
We are in the process of analyzing our options for future purchases of our Hadoop/HBase DN/RS servers. Currently, we purchase Dell PowerEdge R710s, which work well for us. However, we know there are other options that may give us more bang for our buck. I'm not as interested in knowing

HBase as a versioned key/value store

2010-11-03 Thread Wojciech Langiewicz
Hello, I would like to know if any of you are using HBase as a versioned key/value store. What I mean by versioned is that keys map to multiple values with timestamps. So the whole table would have many rows and only one column family with one column. I'm trying to work out what performance could
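To make the schema concrete, a minimal sketch with the 0.20 client API of writing timestamped versions under one key and reading several back (the names, timestamps, and version count below are hypothetical; note the column family must be created with enough VERSIONS, since the default keeps only 3):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class VersionedKV {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "kvstore");
        // Two versions of the same key, distinguished only by timestamp.
        Put p = new Put(Bytes.toBytes("key1"));
        p.add(Bytes.toBytes("v"), Bytes.toBytes("v"), 1000L, Bytes.toBytes("old"));
        p.add(Bytes.toBytes("v"), Bytes.toBytes("v"), 2000L, Bytes.toBytes("new"));
        table.put(p);
        table.flushCommits();
        // Read back up to 10 versions, newest first.
        Get g = new Get(Bytes.toBytes("key1"));
        g.setMaxVersions(10);
        Result r = table.get(g);
        for (KeyValue kv : r.raw()) {
          System.out.println(kv.getTimestamp() + " => " + Bytes.toString(kv.getValue()));
        }
      }
    }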

about hbase security

2010-11-03 Thread 梁景明
hi, is there any feature for me to control client access to my hbase? like some authority, some user, or some password? right now one way to control it: my servers use iptables to control the access. is there any better way? thanks.