Re: How to improve HBase throughput with YCSB?

2011-05-31 Thread Ted Dunning
Woof. Of course. Harold, you appear to be running on about 10 disks total. Each disk should be capable of about 100 ops per second, but yours appear to be doing about 70. That is plausible overhead. Try attaching 5 or 10 small EBS partitions to each of your nodes and using them in HDFS. That may
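A quick back-of-the-envelope check of those numbers (assuming roughly 100 random ops per second per spindle):

  10 disks x ~100 ops/sec/disk ~= 1,000 ops/sec theoretical ceiling
  observed ~700 ops/sec / 10 disks ~= 70 ops/sec/disk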

Re: How to improve HBase throughput with YCSB?

2011-05-31 Thread Harold Lim
Hi Andrew, I tried running on c1.xlarge instances and the performance improved a little, but the throughput is still low. I can now get 700+ read operations per second (up from 400-500+). I was hoping for throughput on the order of thousands. I was wondering if there is som

Re: Region count is not consistent between the WebUI and LoadBalancer

2011-05-31 Thread bijieshan
Sorry for the long break in the discussion of this problem. So far, I have found one possible cause: the split region could come online again. The following is my analysis (the cluster has two HMasters, one active and one standby): 1.Whil

re: about disposing Hbase process

2011-05-31 Thread Gaojinchao
As far as I know: 1. ZooKeeper is sensitive to resources (memory, disk, CPU, network). If a server is underprovisioned, then a) the server may not respond to client requests in time; b) the client assumes the server is down, closes the socket, and connects to another server. 2. HBase is sensitive to RA
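For reference, the session timeout that governs point 1b is set in hbase-site.xml; the value below is the 0.90-era default and is shown for illustration only:

<!-- hbase-site.xml -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>180000</value> <!-- ms before a silent region server's session expires -->
</property>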

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Matthew Ward
Good catch! Thanks. On May 31, 2011, at 5:55 PM, Ted Dunning wrote: >0.5.0 > > > On Tue, May 31, 2011 at 5:54 PM, Matthew Ward wrote: > >> $ thrift -version >> Thrift version 0.6.0 >> >> Not sure about the Hbase Dependency. >> >> On May 31, 2011, at 5:45 PM, Ted Dunning wrote: >> >>> W

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Ted Dunning
0.5.0 On Tue, May 31, 2011 at 5:54 PM, Matthew Ward wrote: > $ thrift -version > Thrift version 0.6.0 > > Not sure about the Hbase Dependency. > > On May 31, 2011, at 5:45 PM, Ted Dunning wrote: > > > Which versions of thrift are involved here? This sounds like a Thrift > > version mismatc

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Ted Dunning
Yes. You have a version problem with Thrift. From the 0.6.0 release notes for Thrift: THRIFT-830 (Java): Switch binary field implementation from byte[] to ByteBuffer (Bryan Duxbury). If you look at THRIFT-830 you will see the trenchant
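As a sketch of what that change means for calling code (the setter shown is hypothetical, not an actual generated Hbase.thrift method): 0.5.x-generated code exposed binary fields as byte[], while 0.6.x-generated code expects ByteBuffer, which ByteBuffer.wrap() adapts to without copying:

import java.nio.ByteBuffer;

public class BinaryFieldExample {
  public static void main(String[] args) {
    byte[] row = "row1".getBytes();
    // thrift 0.5.x generated setter (binary fields as byte[]):
    //   mutation.setRow(row);
    // thrift 0.6.x generated setter (binary fields as ByteBuffer):
    ByteBuffer wrapped = ByteBuffer.wrap(row); // zero-copy view of the array
    System.out.println(wrapped.remaining());   // prints 4
  }
}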

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Matthew Ward
$ thrift -version Thrift version 0.6.0 Not sure about the Hbase Dependency. On May 31, 2011, at 5:45 PM, Ted Dunning wrote: > Which versions of thrift are involved here? This sounds like a Thrift > version mismatch. > > What does [thrift -version] say? What is the hbase dependency? > > On Tu

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Ted Dunning
Which versions of thrift are involved here? This sounds like a Thrift version mismatch. What does [thrift -version] say? What is the hbase dependency? On Tue, May 31, 2011 at 5:32 PM, Matthew Ward wrote: > The issue I am encountering is that the code generated doing 'thrift --gen > java Hbase

Re: How to efficiently join HBase tables?

2011-05-31 Thread Jason Rutherglen
> I'd imagine that join operations do not require realtime-ness, and so > faster batch jobs using Hive -> frozen HBase files in HDFS could be > the optimal way to go? It would also lessen the load on a possibly live RegionServer. There's no Jira for this; I'm tempted to open one. On Tue, May

Re: How to efficiently join HBase tables?

2011-05-31 Thread Bill Graham
We use Pig to join HBase tables using HBaseStorage which has worked well. If you're using HBase >= 0.89 you'll need to build from the trunk or the Pig 0.8 branch. On Tue, May 31, 2011 at 5:18 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > > The Hive-HBase integration allows you to c

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Matthew Ward
The issue I am encountering is that the code generated by 'thrift --gen java Hbase.thrift' outputs code using the 'ByteBuffer' type instead of 'byte[]'. All the code in org.apache.hadoop.hbase.thrift uses byte[]. So basically the code generated via thrift is incompatible with the cur

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Ted Dunning
This may help: http://download.oracle.com/javase/1.5.0/docs/api/java/nio/ByteBuffer.html#array() What is it you are actually trying to do? On Tue, May 31, 2011 at 5:14 PM, Matthew Ward wrote: > Hello, > > > I a
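On the array() method linked above: a defensive sketch for getting a byte[] back out of a ByteBuffer, since array() alone throws for direct buffers and can expose bytes outside position..limit for offset buffers:

import java.nio.ByteBuffer;

public class ByteBufferUtil {
  // Copies the readable bytes of a ByteBuffer into a fresh byte[].
  public static byte[] toBytes(ByteBuffer buf) {
    byte[] out = new byte[buf.remaining()];
    buf.duplicate().get(out); // duplicate() leaves the caller's position intact
    return out;
  }
}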

Re: How to efficiently join HBase tables?

2011-05-31 Thread Jason Rutherglen
> The Hive-HBase integration allows you to create Hive tables that are backed > by HBase In addition, HBase can be made to go faster for MapReduce jobs, if the HFile's could be used directly in HDFS, rather than proxying through the RegionServer. I'd imagine that join operations do not require re

Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Matthew Ward
Hello, I am trying to autogen some code off of 0.90.3. I made some custom additions to our thrift server; however, the code that gets generated uses ByteBuffer as opposed to byte[]. How can I avoid manually editing the generated code to make it match? Is there a thrift flag or different serv

Re: wrong region exception

2011-05-31 Thread Stack
So, what about this new WrongRegionException in the new cluster? Can you figure out how it came about? In the new cluster, is there also a hole? Did you start the new cluster fresh or copy from the old cluster? St.Ack On Tue, May 31, 2011 at 1:55 PM, Robert Gonzalez wrote: > Yeah, we learned the hard

Re: wrong region exception

2011-05-31 Thread Stack
On Tue, May 31, 2011 at 3:34 PM, Robert Gonzalez wrote: > The script doesn't work because it attempts to fix the hole by finding a > region in the hdfs filesystem that fills the hole. But in this case there is > no such file. The hole is just there. > OK. The fixup method has the left and ri

RE: wrong region exception

2011-05-31 Thread Robert Gonzalez
The script doesn't work because it attempts to fix the hole by finding a region in the hdfs filesystem that fills the hole. But in this case there is no such file. The hole is just there. -Original Message- From: Robert Gonzalez [mailto:robert.gonza...@maxpointinteractive.com] Sent: T

Re: ANN: HBase 0.90.3 available for download

2011-05-31 Thread Andrew Purtell
> From: Jack Levin > Hello, is there a git repo URL I could use to check out that > code version? git://git.apache.org/hbase.git or git://github.com/apache/hbase.git or https://github.com/apache/hbase.git Then checkout tag '0.90.3'

Re: ANN: HBase 0.90.3 available for download

2011-05-31 Thread Jean-Daniel Cryans
It's all under the tags in the github repo: https://github.com/apache/hbase J-D On Tue, May 31, 2011 at 4:06 PM, Jack Levin wrote: > Hello, is there a git repo URL I could use to check out that code version? > > -Jack > > On Thu, May 19, 2011 at 2:35 PM, Stack wrote: >> The Apache HBase team is

Re: ANN: HBase 0.90.3 available for download

2011-05-31 Thread Jack Levin
Hello, is there a git repo URL I could use to check out that code version? -Jack On Thu, May 19, 2011 at 2:35 PM, Stack wrote: > The Apache HBase team is happy to announce that HBase 0.90.3 is > available from the Apache mirror of choice: > >  http://www.apache.org/dyn/closer.cgi/hbase/ > > HBas

RE: wrong region exception

2011-05-31 Thread Robert Gonzalez
The script ran without the previous problem, but it did not fix the problem. When I ran hbck or check_meta.rb again they indicated that the problem was still there. Do I need to do something else in preparation before running check_meta? Thanks, Robert -Original Message- From: sain

RE: wrong region exception

2011-05-31 Thread Robert Gonzalez
Yeah, we learned the hard way early last year to follow the guidelines religiously. I've gone over the requirements and checked off everything. We even re-did our tables to have only 4 column families, down from 4x that amount. We are at a loss as to why we seem to be cursed when it

RE: HFile.Reader scans return latest version?

2011-05-31 Thread Sandy Pratt
Thanks for the pointers. The damage manifested as scanners skipping over a range in our time series data. We knew from other systems that there should be some records in that region that weren't returned. When we looked closely we saw an extremely improbable jump in rowkeys that should by eve

Re: How to efficiently join HBase tables?

2011-05-31 Thread Patrick Angeles
On Tue, May 31, 2011 at 3:19 PM, Eran Kutner wrote: > For my need I don't really need the general case, but even if I did I think > it can probably be done simpler. > The main problem is getting the data from both tables into the same MR job, > without resorting to lookups. So without the theoret

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
> From: doug.m...@explorysmedical.com > To: user@hbase.apache.org > Date: Tue, 31 May 2011 15:39:14 -0400 > Subject: RE: How to efficiently join HBase tables? > > Re: " Didn't see a multi-get... " > > This is what I'm talking about... > http://hbase.apache.org/apidocs/org/apache/hadoop/hbas

Re: about disposing Hbase process

2011-05-31 Thread Stack
Sorry Gao, what is your question? St.Ack 2011/5/31 Gaojinchao : > For one of our applications, there are 3 nodes. > The process layout and machine configuration are as below. > > Who has experience with this? > > CPU usage is about 70%~80%; does it make HBase or ZooKeeper starve? > > > > Machin

Re: How to efficiently join HBase tables?

2011-05-31 Thread Ted Dunning
Your mapper can tell which file is being read and add source tags to the data records. The reducer can do the cartesian product (if you really need that). On Tue, May 31, 2011 at 12:19 PM, Eran Kutner wrote: > For my need I don't really need the general case, but even if I did I think > it can
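A minimal sketch of that tagging pattern, assuming both tables have been exported to files whose names identify the source and that records are laid out as joinKey TAB payload (all names here are hypothetical):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Mapper: prefix each record's value with a tag identifying its source file.
class TaggingMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context ctx)
      throws IOException, InterruptedException {
    String file = ((FileSplit) ctx.getInputSplit()).getPath().getName();
    String tag = file.startsWith("tableA") ? "A" : "B"; // hypothetical naming
    String[] parts = value.toString().split("\t", 2);   // joinKey TAB payload
    if (parts.length < 2) return;
    ctx.write(new Text(parts[0]), new Text(tag + "|" + parts[1]));
  }
}

// Reducer: separate the two sides, then emit the cartesian product.
class JoinReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context ctx)
      throws IOException, InterruptedException {
    List<String> a = new ArrayList<String>();
    List<String> b = new ArrayList<String>();
    for (Text v : values) {
      String s = v.toString();
      if (s.startsWith("A|")) a.add(s.substring(2)); else b.add(s.substring(2));
    }
    for (String left : a) {      // the cartesian product step; this is where
      for (String right : b) {   // co-group logic would inject its cleverness
        ctx.write(key, new Text(left + "\t" + right));
      }
    }
  }
}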

Re: HFile.Reader scans return latest version?

2011-05-31 Thread Stack
On Tue, May 31, 2011 at 11:05 AM, Sandy Pratt wrote: > Hi all, > > I'm doing some work to read records directly from the HFiles of a damaged > table. When I scan through the records in the HFile using > org.apache.hadoop.hbase.io.hfile.HFileScanner, will I get only the latest > version of the

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
Eran, As I said... if you want to do relational database work, you should use a relational database. The big problem with HBase is that outside of the key, you don't have indexes. You asked a very general question and we have to assume a general case when we are looking at creating a solution.

Re: wrong region exception

2011-05-31 Thread Stack
On Tue, May 31, 2011 at 10:42 AM, Robert Gonzalez wrote: > Now I'm getting the wrong region exception on the new table that I'm copying > the old table to. Running hbck reveals an inconsistency in the new table. > The frustration is unbelievable. Like I said before, it doesn't appear that >

Re: wrong region exception

2011-05-31 Thread Stack
Try adding this change:

Index: bin/check_meta.rb
===
--- bin/check_meta.rb (revision 1129468)
+++ bin/check_meta.rb (working copy)
@@ -127,11 +127,13 @@
   scan = Scan.new()
   scanner = metatable.getScanner(scan)
   oldHRI = nil
-bad =

RE: How to efficiently join HBase tables?

2011-05-31 Thread Doug Meil
Re: " Didn't see a multi-get... " This is what I'm talking about... http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#get%28java.util.List%29 re: " not sure it would buy you much." Let's say you did these in groups of 500. Although the reads still obviously need to

Re: How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
For my need I don't really need the general case, but even if I did I think it can probably be done simpler. The main problem is getting the data from both tables into the same MR job, without resorting to lookups. So without the theoretical MultiTableInputFormat, I could just copy all the data fro

Re: How to efficiently join HBase tables?

2011-05-31 Thread Ted Dunning
The Cartesian product often makes an honest-to-god join not such a good idea on large data. The common alternative is co-group which is basically like doing the hard work of the join, but involves stopping just before emitting the cartesian product. This allows you to inject whatever cleverness y

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
Doug, I read the OP's post as the following: "> Hi, > I need to join two HBase tables. The obvious way is to use a M/R job for that. The problem is that the few references to that question I found recommend pulling one table to the mapper and then do a lookup for the referred row in the seco

Re: How to efficiently join HBase tables?

2011-05-31 Thread Jason Rutherglen
Doesn't Hive for HBase enable joins? On Tue, May 31, 2011 at 5:06 AM, Eran Kutner wrote: > Hi, > I need to join two HBase tables. The obvious way is to use a M/R job for > that. The problem is that the few references to that question I found > recommend pulling one table to the mapper and then do

Re: How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
Thanks everyone for the great feedback. I'll try to address all the suggestions. My data sets range from large to very large. One is on the order of many billions of rows, although the input for a typical MR job will be in the hundreds of millions; the second table is in the tens of millions. I d

Re: A sudden msg of "java.io.IOException: Server not running, aborting"

2011-05-31 Thread Jean-Daniel Cryans
Can you post the full log somewhere? You talk about several exceptions, but we can't see them. J-D On Tue, May 31, 2011 at 4:41 AM, bijieshan wrote: > It occurred on a RegionServer for an unknown reason. I have checked this > RegionServer's logs; there is no prior aborting, and no other info sho

RE: wrong region exception

2011-05-31 Thread Robert Gonzalez
Now I'm getting the wrong region exception on the new table that I'm copying the old table to. Running hbck reveals an inconsistency in the new table. The frustration is unbelievable. Like I said before, it doesn't appear that HBase is ready for prime time. I don't know how companies are usi

HFile.Reader scans return latest version?

2011-05-31 Thread Sandy Pratt
Hi all, I'm doing some work to read records directly from the HFiles of a damaged table. When I scan through the records in the HFile using org.apache.hadoop.hbase.io.hfile.HFileScanner, will I get only the latest version of the record as with a default HBase Scan? Or do I need to do some wo
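For anyone trying the same thing, a rough 0.90-era outline of a raw HFile scan follows; the Reader constructor and getScanner signatures have moved around between releases, so treat this strictly as a sketch. Note that a raw scan surfaces every KeyValue physically present in the file, all versions and delete markers included, with none of the version filtering a normal Scan applies:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;

public class RawHFileScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]); // e.g. /hbase/<table>/<region>/<family>/<file>
    HFile.Reader reader = new HFile.Reader(fs, path, null, false); // no block cache
    try {
      reader.loadFileInfo();
      HFileScanner scanner = reader.getScanner(false, false); // cacheBlocks, pread
      if (scanner.seekTo()) {
        do {
          KeyValue kv = scanner.getKeyValue();
          // every physical cell appears here: old versions, deletes, etc.
          System.out.println(kv);
        } while (scanner.next());
      }
    } finally {
      reader.close();
    }
  }
}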

Re: Starting Hadoop/HBase cluster on Rackspace

2011-05-31 Thread Ryan Rawson
Rackspace doesn't have an API, so no. This is one of the primary disadvantages of Rackspace; it's all hands-on/manual. Just boot up your instances and use the standard management tools. On Tue, May 31, 2011 at 10:23 AM, Something Something wrote: > Hello, > > Are there scripts available to creat

Re: Harvesting empty regions

2011-05-31 Thread Jean-Daniel Cryans
> hbase noob question: do compactions (major/minor) always work in the > scope of a region but they don't do region merges? That's what HBASE-1621 is about: merges can't be done while the cluster is running, and compactions only happen when HBase is running. J-D

Re: Harvesting empty regions

2011-05-31 Thread Arvind Jayaprakash
On May 31, Ferdy Galema wrote: >You can use the merge tool to combine adjacent regions. It requires a >bit of manual work because you need to specify the regions by hand. The >cluster also needs to be offline (I recommend to keep zookeeper running >though). Check if merging succeeded with the hb

Starting Hadoop/HBase cluster on Rackspace

2011-05-31 Thread Something Something
Hello, Are there scripts available to create a HBase cluster on Rackspace - like there are for Amazon EC2? A quick Google search didn't come up with anything useful. Any help in this regard would be greatly appreciated. Thanks. - Ajay

RE: wrong region exception

2011-05-31 Thread Robert Gonzalez
I'm trying my "nuclear" option: basically copy the data from the old db to a new one, skipping over bad regions. The bad news is that it is taking forever. I get a stack trace just trying to run check_meta.rb: maxpoint@c1-m02:/usr/lib/hbase/bin$ ./hbase org.jruby.Main check_meta.rb Writables.

RE: How to efficiently join HBase tables?

2011-05-31 Thread Doug Meil
Eran's observation was that a join is solvable in a Mapper via lookups on a 2nd HBase table, but it might not be that efficient if the lookups are 1 by 1. I agree with that. My suggestion was to use multi-Get for the lookups instead. So you'd hold onto a batch of records in the Mapper and the

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
Maybe I'm missing something... but this isn't a hard problem to solve. Eran wants to join two tables. If we look at an SQL Statement... SELECT A.*, B.* FROM A, B WHERE A.1 = B.1 AND A.2 = B.2 AND A.3 = xxx AND A.4 = yyy AND B.45 = zzz Or something along those lines. So what you're essential

RE: How to efficiently join HBase tables?

2011-05-31 Thread Doug Meil
Re: "The problem is that the few references to that question I found recommend pulling one table to the mapper and then do a lookup for the referred row in the second table." With multi-get in .90.x you could perform some reasonably clever processing and not do the lookups one-by-one but in b

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
Eran, You want to join two tables? The short answer is to use a relational database to solve that problem. Longer answer: You're using HBase so you don't need to think in terms of a reducer. You can create a temp table for your query. You can then run one map job to scan and filter table A, d
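One possible shape of the first map job described here, scanning and filtering table A into a temp table keyed by the join key (the table, family, and qualifier names are all made up):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class FilterAIntoTemp extends TableMapper<ImmutableBytesWritable, Put> {
  private static final byte[] CF = Bytes.toBytes("cf"); // hypothetical family

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context ctx)
      throws IOException, InterruptedException {
    byte[] joinKey = value.getValue(CF, Bytes.toBytes("joinkey")); // hypothetical
    if (joinKey == null) return;  // the A-side filter would also go here
    Put put = new Put(joinKey);   // temp table is keyed by the join key
    put.add(CF, Bytes.toBytes("arow"), row.get());
    ctx.write(new ImmutableBytesWritable(joinKey), put);
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "filter-tableA-into-temp");
    job.setJarByClass(FilterAIntoTemp.class);
    TableMapReduceUtil.initTableMapperJob("tableA", new Scan(),
        FilterAIntoTemp.class, ImmutableBytesWritable.class, Put.class, job);
    TableMapReduceUtil.initTableReducerJob("tempTable", null, job); // TableOutputFormat
    job.setNumReduceTasks(0); // map-only; the Puts go straight to tempTable
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}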

Re: How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
MultipleInputs would be ideal, but that seems pretty complicated. MultiTableInputFormat seems like a simple change in the getSplits() method of TableInputFormat, plus support for a collection of tables and their matching scanners instead of a single table and scanner; doesn't sound too complicated. Any o
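For concreteness, a skeleton of that idea (a hypothetical class, not an existing HBase API): getSplits() just concatenates the splits of one configured TableInputFormat per source table, and the remaining work is routing each split back to the delegate that owns it:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public abstract class MultiTableInputFormat
    extends InputFormat<ImmutableBytesWritable, Result> {

  // One fully configured TableInputFormat (table name + scan) per source table.
  protected abstract List<InputFormat<ImmutableBytesWritable, Result>> delegates(JobContext ctx);

  @Override
  public List<InputSplit> getSplits(JobContext ctx)
      throws IOException, InterruptedException {
    List<InputSplit> splits = new ArrayList<InputSplit>();
    for (InputFormat<ImmutableBytesWritable, Result> d : delegates(ctx)) {
      splits.addAll(d.getSplits(ctx)); // each TableSplit already records its table
    }
    return splits;
  }

  // createRecordReader(...) still has to route each TableSplit back to the
  // delegate that owns its table, which is where the real work is.
}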

Re: How to efficiently join HBase tables?

2011-05-31 Thread Ferdy Galema
As far as I can tell there is not yet a built-in mechanism you can use for this. You could implement your own InputFormat, something like MultiTableInputFormat. If you need different map functions for the two tables, perhaps something similar to Hadoop's MultipleInputs should do the trick. On

How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
Hi, I need to join two HBase tables. The obvious way is to use a M/R job for that. The problem is that the few references to that question I found recommend pulling one table to the mapper and then do a lookup for the referred row in the second table. This sounds like a very inefficient way to do

Re: Harvesting empty regions

2011-05-31 Thread Ferdy Galema
You can use the merge tool to combine adjacent regions. It requires a bit of manual work because you need to specify the regions by hand. The cluster also needs to be offline (I recommend to keep zookeeper running though). Check if merging succeeded with the hbck tool. There are some jira issu
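For reference, the offline merge tool mentioned here is invoked along these lines in the 0.90 era (region names must be copied exactly, e.g. from the web UI or .META.):

$ bin/hbase org.apache.hadoop.hbase.util.Merge <table-name> <region-name-1> <region-name-2>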

A sudden msg of "java.io.IOException: Server not running, aborting"

2011-05-31 Thread bijieshan
It occurred on a RegionServer for an unknown reason. I have checked this RegionServer's logs; there is no prior aborting, and no other info showing the RegionServer had aborted. So the following message appeared all of a sudden. >>[logs] 2011-05-25 09:15:44,232 INFO org.apache.hadoop.hbase.regionser

about disposing Hbase process

2011-05-31 Thread Gaojinchao
For one of our applications, there are 3 nodes. The process layout and machine configuration are as below. Who has experience with this? CPU usage is about 70%~80%; does it make HBase or ZooKeeper starve? Machine: cpu: 8 core 2.GHz memory: 48G Disk: 2T*8 = 16T Node1: DataNode HAJob