some questions about hbase in production environment

2012-03-05 Thread Qian Ye
Hi all: I'm a newbie to HBase. Here are two questions about HBase in a production environment. I would very much appreciate any help. 1. Which HBase configuration settings are recommended to be set explicitly, rather than left at the defaults, in a production environment? So far, I know that these par

Re: Install hbase on Ubuntu 11.10

2012-03-05 Thread shashwat shriparv
Just download HBase from Apache, extract it, and from the HBase folder run the command bin/start-hbase.sh; it will start. For standalone HBase you don't need to make any changes to the configuration file. On Tue, Mar 6, 2012 at 9:44 AM, Gopal wrote: > On 3/5/2012 4:14 AM, Mahdi Negahi wrote: >
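The standalone quick-start described above can be sketched as shell commands; the mirror URL, version number, and directory names below are illustrative assumptions, not from the thread:

```shell
# Download and unpack an HBase release tarball (0.90.5 was current at the
# time; the archive URL and version are illustrative).
wget http://archive.apache.org/dist/hbase/hbase-0.90.5/hbase-0.90.5.tar.gz
tar xzf hbase-0.90.5.tar.gz
cd hbase-0.90.5

# Standalone mode needs no configuration changes; this starts a single JVM
# running the master, a region server, and ZooKeeper together.
bin/start-hbase.sh

# Verify with the shell, then stop when done.
bin/hbase shell
bin/stop-hbase.sh
```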

Re: hbase 0.90.5 and hadoop 0.20.203

2012-03-05 Thread Gopal
On 3/6/2012 12:44 AM, Gopal wrote: On 3/5/2012 11:21 PM, Harsh J wrote: directory. The bundled jar is ONLY for use in standalone mode. In Now getting this :- With 0.20.205.0 java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMasterCommandLine$Loca

Re: hbase 0.90.5 and hadoop 0.20.203

2012-03-05 Thread Gopal
On 3/5/2012 11:21 PM, Harsh J wrote: directory. The bundled jar is ONLY for use in standalone mode. In Now getting this :- With 0.20.205.0 java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMasterorg.apache.commons.configura

Re: hbase 0.90.5 and hadoop 0.20.203

2012-03-05 Thread Harsh J
Gopal, No, just follow the http://hbase.apache.org/book.html#hadoop document to have it working. It is important to read that whole section (as the section itself states). To quote the specific, last paragraph: "Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its l
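The jar swap that the quoted book section describes might look like the following; all paths and version strings are illustrative assumptions, and the commons-configuration copy is an extra step often needed with 0.20.205 (it matches the org.apache.commons.configuration error reported elsewhere in this thread):

```shell
# HBase bundles a Hadoop jar under lib/ that is suitable only for standalone
# mode. For a distributed setup, replace it with the jar from the cluster's
# Hadoop install so the RPC versions match (filenames are illustrative).
cd /usr/local/hbase-0.90.5/lib
mv hadoop-core-0.20-append-r1056497.jar hadoop-core-0.20-append-r1056497.jar.bak
cp /usr/local/hadoop-0.20.205.0/hadoop-core-0.20.205.0.jar .

# 0.20.205's core jar also depends on commons-configuration, which must be
# on HBase's classpath or the master fails to construct.
cp /usr/local/hadoop-0.20.205.0/lib/commons-configuration-1.6.jar .
```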

Re: Install hbase on Ubuntu 11.10

2012-03-05 Thread Gopal
On 3/5/2012 4:14 AM, Mahdi Negahi wrote: Dear All Friends I'm new at Linux and Hbase. At first time, I install hbase on windows by Cygwin successfully but after install Thrift everything change. so I decided to change my OS and try to install Hbase on Ubuntu 11.10. I have tried for 2 weeks wit

Re: hbase 0.90.5 and hadoop 0.20.203

2012-03-05 Thread Gopal
On 3/5/2012 11:07 PM, Harsh J wrote: 1. What exactly is incompatible with a 0.20.203 mix? What error made java.io.IOException: Call to master/192.168.1.76:56310 failed on local exception: java.io.EOFException at org.apache.hadoop.ipc.Client.wrapException(Client.java:775) at org.

Re: hbase 0.90.5 and hadoop 0.20.203

2012-03-05 Thread Harsh J
Gopal, HBase+Hadoop version picking is already documented in good depth at http://hbase.apache.org/book.html#hadoop Two points about your blog post: 1. What exactly is incompatible with a 0.20.203 mix? What error made you think so? 2. Both 0.20.2 and 0.20.203 do not have HDFS append/sync features

hbase 0.90.5 and hadoop 0.20.203

2012-03-05 Thread Gopal
http://myexadata.blogspot.com/2012/03/hbase-0905-and-hadoop-020203.html After playing with it for a while, I concluded that hbase 0.90.5 and hadoop 0.20.203 are incompatible. Please advise if it is otherwise. Thanks

Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?

2012-03-05 Thread Doug Meil
To stress what Andrew said, the HBase homepage says: "HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leve

Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?

2012-03-05 Thread Andrew Purtell
> On Mar 5, 2012, at 6:28 PM, D S wrote: > Simple, I want to see what is meant by the claim that HBase = Big Table. > How far does this claim go? Who is making this claim? I think we say that HBase is a BigTable clone, because it attempts to be faithful to the BigTable architecture as described

Re: Bulk loading a CSV file into HBase

2012-03-05 Thread Shrijeet Paliwal
Anil, Stack meant adding debug statements yourself in the tool. -Shrijeet On Mon, Mar 5, 2012 at 4:54 PM, anil gupta wrote: > Hi St.Ack, > > Thanks for the response. Both the tsv and csv are UTF-8 file. Could you > please let me know how to run bulk loading in Debug mode? I dont know of > any hadoo

Re: Bulk loading a CSV file into HBase

2012-03-05 Thread anil gupta
Hi St.Ack, Thanks for the response. Both the TSV and CSV are UTF-8 files. Could you please let me know how to run bulk loading in debug mode? I don't know of any Hadoop option which can run a job in debug mode. Thanks, Anil On Mon, Mar 5, 2012 at 2:58 PM, Stack wrote: > On Mon, Mar 5, 2012 at 11
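One hedged way to get more verbose output, assuming the stock log4j setup that ships with HBase, is to raise the log level for the HBase packages in the conf directory the job reads; the ./conf path below is illustrative:

```shell
# Point HBASE_CONF at your real conf directory; ./conf here is illustrative.
HBASE_CONF=${HBASE_CONF:-./conf}
mkdir -p "$HBASE_CONF"

# Raise the HBase package log level to DEBUG for subsequent job runs
# (the logger name assumes the stock log4j configuration).
echo 'log4j.logger.org.apache.hadoop.hbase=DEBUG' >> "$HBASE_CONF/log4j.properties"

# Confirm the entry landed.
grep 'org.apache.hadoop.hbase=DEBUG' "$HBASE_CONF/log4j.properties"
```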

Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?

2012-03-05 Thread Ian Varley
DS, HBase is an open source project, so you can read the source code and make that determination for yourself. It was first created based on the same ideas in the Bigtable paper (published by Google) but is only related based on the design goals and philosophy, not the actual implementation. B

Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?

2012-03-05 Thread D S
Simple, I want to see what is meant by the claim that HBase = Big Table. How far does this claim go? How identical are the two products? Does it stop at the frontend specifications? Does it go into the internals? I just want to know how identical these two products are and how different are the

Re: Bulk loading a CSV file into HBase

2012-03-05 Thread Stack
On Mon, Mar 5, 2012 at 11:48 AM, anil gupta wrote: > I am getting a "Bad line at offset" error in Stderr log of tasks while > testing bulk loading a CSV file into HBase. I am using cdh3u2. Import of a > TSV works fine. > Its your encoding of the tsv and csv or its a problem w/ the parsing code in
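ImportTsv defaults to a tab separator, so a comma-separated file can trip the line parser; a sketch of a CSV run follows, assuming your build supports the importtsv.separator property, and with illustrative jar, table, column, and path names:

```shell
# Pass the comma separator explicitly; otherwise each CSV line contains no
# tab delimiter and the parser reports bad lines. Jar name, table name,
# column mapping, and input path are illustrative.
hadoop jar hbase-0.90.5.jar importtsv \
  '-Dimporttsv.separator=,' \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
  mytable /user/anil/input.csv
```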

Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?

2012-03-05 Thread lars hofhansl
This is a hypothetical question. Why do you care? Can you run current Windows on '03 machines? Or Linux (with KDE/Gnome)? HBase is designed for modern machines. From: D S To: user@hbase.apache.org Sent: Monday, March 5, 2012 11:39 AM Subject: Re: HBase & BigT

RE: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?

2012-03-05 Thread Sandy Pratt
I have HBase instances with 2GB heap that perform ok. I'm sure they would perform better with more RAM, but they are definitely good enough to test queries and so forth. I bet you could probably get down to 1.5 or 1 GB and be stable if you wanted to. > -Original Message- > From: D S [
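For reference, per-process heap is set in conf/hbase-env.sh; a 2 GB instance like those described might be configured as follows (the value is in MB and illustrative):

```shell
# conf/hbase-env.sh: maximum heap, in MB, for HBase daemons started by the
# bin/ scripts. 2000 roughly matches the 2 GB instances mentioned above.
export HBASE_HEAPSIZE=2000
```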

Re: gc pause killing regionserver

2012-03-05 Thread Mikael Sitruk
Try setting the initiating occupancy to a lower value and allocating more space to the new generation. It seems that you don't have enough room in the survivor space, and therefore you get a promotion failure. You also mention that the same server frequently has this problem; is it b
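The two suggestions above (earlier CMS initiation, a bigger young generation) map onto JVM flags along these lines in conf/hbase-env.sh; every size here is an illustrative starting point to tune against your own GC logs, not a recommendation from the thread:

```shell
# conf/hbase-env.sh: start CMS collections earlier and enlarge the young
# generation and survivor spaces so fewer objects are promoted under
# pressure, making promotion failures (and the stop-the-world full GCs
# they force) less likely. All sizes are illustrative.
export HBASE_OPTS="$HBASE_OPTS \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=60 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:NewSize=192m -XX:MaxNewSize=192m \
  -XX:SurvivorRatio=6 \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/hbase/gc-hbase.log"
```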

RE: gc pause killing regionserver

2012-03-05 Thread Sandy Pratt
What was the actual process size of the JVM as reported by top? Why use the following in your config? -XX:NewRatio=16 -XX:MaxGCPauseMillis=100 Do you really have a stringent latency target, or are you just being aggressive? If I'm reading your log correctly, you have about 2.5 GB of heap, right

Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?

2012-03-05 Thread Alan Chaney
On 3/5/2012 11:39 AM, D S wrote: On 3/5/12, Michael Drzal wrote: Are HBase's configuration options robust enough that it could go back and run well on those 2003 specs with a bit of tweaking, if that were what was desired? What do you mean "run well"? Run as well as Big Table would have done on th

Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?

2012-03-05 Thread D S
On 3/5/12, Michael Drzal wrote: > You really need to consider the entire historical context here. A lot of > the memory used in hbase is buffering writes to disk and for the block > cache. These days, it isn't unreasonable to get 12 2-3TB disks in a > commodity server. Back in 2003, you would n

Re: HBase with EMR

2012-03-05 Thread Amandeep Khurana
Correct - you can access any external service by using a custom jar. On Sun, Mar 4, 2012 at 10:55 PM, Mohit Gupta wrote: > HI All, > > Thank you so much. It has been a great help. > As of now, I am exploring the idea of running an HBase cluster on EC2 ( EBS > backed) and using EMR to run the heav

Re: HBase Region move() and Data Locality

2012-03-05 Thread Bryan Beaudreault
Thanks for the response! We are currently migrating analytics data from an old mysql setup to a new hbase-backed architecture. We have a bunch of versions of the data running at once, for testing, beta, live, etc, so we have 63 tables right now and 6451 regions hosted on 12 EC2 m1.xlarge servers.

Re: HBase Region move() and Data Locality

2012-03-05 Thread Jean-Daniel Cryans
So what's going on on that cluster exactly? You have a lot of tables of various sizes and they tend to grow on only one machine? One simple trick to get good balancing for a table is to disable it, balance the cluster, then re-enable it. It will be distributed properly at that point. J-D On Mon,
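The disable/balance/enable trick described above can be sketched from the hbase shell; the table name is illustrative, and whether you can trigger a balancer run directly from the shell depends on your version (0.90 runs the balancer periodically on its own):

```shell
# Take the table offline, let the balancer even out the remaining regions,
# then re-enable; on enable the table's regions are reassigned across the
# cluster's region servers. Table name is illustrative.
hbase shell <<'EOF'
disable 'mytable'
# ...wait for (or trigger, if your shell supports it) a balancer run...
enable 'mytable'
EOF
```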

Re: What about storing binary data(e.g. images) in HBase?

2012-03-05 Thread Jacques
The Namenode is limited in the number of blocks it can track. Whether you changed the block size or not would not have much impact on the problem. I think the limit is something like 150 million blocks. (Someone else can feel free to correct this.) (It isn't exactly that simple because it also has to do wit

Re: Install hbase on Ubuntu 11.10

2012-03-05 Thread Peter Vandenabeele
On Mon, Mar 5, 2012 at 10:14 AM, Mahdi Negahi wrote: > > Dear All Friends > I'm new at Linux and Hbase. At first time, I install hbase on windows by > Cygwin successfully but after install Thrift everything change. so I decided > to change my OS and try to install Hbase on Ubuntu 11.10. I have t

Re: HBase Region move() and Data Locality

2012-03-05 Thread Doug Meil
This doesn't address your question on move(), but regarding locality, see 8.7.3 in here... http://hbase.apache.org/book.html#regions.arch .. it's not just major compactions, but any write of a storefile that affects locality (flush, minor, major). On 3/5/12 11:02 AM, "Bryan Beaudreault" wr

HBase Region move() and Data Locality

2012-03-05 Thread Bryan Beaudreault
Hey all, We are running on cdh3u2 (soon to upgrade to 3u3), and we notice that regions are balanced solely based on the number of regions per region server, with no regard for horizontal scaling of tables. This was mostly fine with a small number of regions, but as our cluster reaches thousands o

Install hbase on Ubuntu 11.10

2012-03-05 Thread Mahdi Negahi
Dear All Friends, I'm new to Linux and HBase. At first I installed HBase on Windows via Cygwin successfully, but after installing Thrift everything changed, so I decided to change my OS and try to install HBase on Ubuntu 11.10. I have tried for 2 weeks without any progress. Please please somebody

Re: What about storing binary data(e.g. images) in HBase?

2012-03-05 Thread Michael Segel
Just a couple of things... MapR doesn't have the NN limitations. So if your design requires lots of small files, look at MapR... You could store your large blobs in a sequence file or series of sequence files using HBase to store the index. Sort of a hybrid approach. Sent from my iPhone On Mar

Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?

2012-03-05 Thread Michael Drzal
You really need to consider the entire historical context here. A lot of the memory used in hbase is buffering writes to disk and for the block cache. These days, it isn't unreasonable to get 12 2-3TB disks in a commodity server. Back in 2003, you would not get as many disks, and they would be m

Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?

2012-03-05 Thread Doug Meil
re: "Almost every blog I read about HBase tells me it's a clone of BigTable." The HBase website says that too http://hbase.apache.org/ re: "Almost every blog I've read about HBase also tells me to use a lot of RAM" So does the HBase Reference Guide... http://hbase.apache.org/book.html#pe

Re: a strange situation of HBase when I issue scan '.META.' command

2012-03-05 Thread yonghu
Thanks for your reply. I didn't read it carefully. Yong On Mon, Mar 5, 2012 at 2:14 PM, Doug Meil wrote: > > Hi there- > > You might want to see this in the Ref Guide. > > http://hbase.apache.org/book.html#arch.catalog > > "A region with an empty start key is the first region in a table.

Re: a strange situation of HBase when I issue scan '.META.' command

2012-03-05 Thread Doug Meil
Hi there- You might want to see this in the Ref Guide. http://hbase.apache.org/book.html#arch.catalog "A region with an empty start key is the first region in a table. If a region has both an empty start and an empty end key, it's the only region in the table" On 3/5/12 7:27 AM, "yongh

a strange situation of HBase when I issue scan '.META.' command

2012-03-05 Thread yonghu
Hello, My HBase version is 0.90.2, installed in pseudo-distributed mode. I have successfully inserted two tuples into the 'test' table. hbase(main):005:0> scan 'test' ROW COLUMN+CELL jim column=course:english, timestamp=1330949116240, value=1.3 tom

Wrong path lookup in HBase bulk upload

2012-03-05 Thread Garg, Rinku
Hi All, I am trying to upload a CSV file using the HBase bulk upload feature. Below is the URL I referred to: http://hbase.apache.org/bulk-loads.html I have the following problem that maybe you or someone can help me out with. I am new to Hadoop and the MapReduce feature. I tried to run my map
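As background, the bulk-loads page referenced above describes a two-step flow: a MapReduce job (e.g. importtsv with importtsv.bulk.output) writes HFiles, and completebulkload then moves them into the live table. A sketch follows, with illustrative jar name, paths, and table name:

```shell
# Step 1: generate HFiles instead of writing directly to the table.
hadoop jar hbase-0.90.5.jar importtsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1 \
  -Dimporttsv.bulk.output=/user/rinku/hfiles \
  mytable /user/rinku/input.csv

# Step 2: move the generated HFiles into the running table.
hadoop jar hbase-0.90.5.jar completebulkload /user/rinku/hfiles mytable
```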

HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?

2012-03-05 Thread D S
Hi, I'm learning more about HBase and I'm curious how much of HBase is actually based on Google's original DB. In Google's origin stories, they are well known for using low-cost commodity hardware at scale in order to store their web database. Almost every blog I read about HBase tells me it's