Our current usage is how I would do this in a typical database app, with the
table acting like a statement. It looks like this:
HConnection connection = null;
HTableInterface table = null;
try {
    connection = pool.acquire();
    table = connection.getTable(tableName);
    // Do work
} finally {
    if (table != null) {
        table.close();
    }
    if (connection != null) {
        pool.release(connection);
    }
}
Is this incorrect? The API docs say close "Releases any resources held or
pending changes in internal buffers." I didn't interpret that as closing
the underlying connection.
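
In other words, we are assuming the following holds (a sketch; "t1" is a
hypothetical table name):

Configuration conf = HBaseConfiguration.create();
HConnection con = HConnectionManager.createConnection(conf);
HTableInterface a = con.getTable("t1");
a.close();                               // flushes buffers, releases table-level resources
HTableInterface b = con.getTable("t1");  // the shared connection is still usable
b.close();
con.close();                             // the connection itself closes only here

Thanks!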
-Mike
-----Original Message-----
From: Sriram Ramachandrasekaran [mailto:[email protected]]
Sent: Sunday, November 03, 2013 11:43 PM
To: [email protected]
Cc: [email protected]
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine
Hey Michael - Per the API documentation, closing the HTable instance would
close the underlying resources too. Hope you are aware of that.
On Mon, Nov 4, 2013 at 11:06 AM, <[email protected]> wrote:
> Hi Lars, at application startup the pool is created with X number of
> connections using the first method you indicated:
> HConnectionManager.createConnection(conf). We store each connection in
> the pool automatically and serve it up to threads as they request it.
> When a thread is done using a connection, it returns it to the pool.
> Connections are not created and closed per thread; they are created
> only once for the entire application. We are using the
> GenericObjectPool from Apache Commons Pool as the foundation of our
> connection pooling approach. Our entire pool implementation really
> consists of just a couple of overridden methods that specify how to
> create a new connection and how to close it (see the sketch below);
> the GenericObjectPool class does all the rest.
> See here for details:
> http://commons.apache.org/proper/commons-pool/
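>
> Roughly, our factory amounts to something like this (a sketch assuming
> Commons Pool 1.6; the class name is made up):
>
> import org.apache.commons.pool.BasePoolableObjectFactory;
> import org.apache.commons.pool.impl.GenericObjectPool;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.client.HConnection;
> import org.apache.hadoop.hbase.client.HConnectionManager;
>
> // Tells GenericObjectPool how to create and destroy pooled connections.
> public class HConnectionFactory extends BasePoolableObjectFactory<HConnection> {
>     private final Configuration conf;
>
>     public HConnectionFactory(Configuration conf) {
>         this.conf = conf;
>     }
>
>     @Override
>     public HConnection makeObject() throws Exception {
>         // Called by the pool when it needs a new connection.
>         return HConnectionManager.createConnection(conf);
>     }
>
>     @Override
>     public void destroyObject(HConnection con) throws Exception {
>         // Called by the pool when a connection is evicted or the pool closes.
>         con.close();
>     }
> }
>
> Borrowing and returning then looks like:
>
> GenericObjectPool<HConnection> pool =
>     new GenericObjectPool<HConnection>(new HConnectionFactory(conf));
> pool.setMaxActive(X); // cap the pool at X connections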
>
> Each thread gets an HTable instance as needed and then closes it when
> done. The only thing we are not doing is using the createConnection
> method that takes an ExecutorService, as that wouldn't work in our
> model. Our app is like a web application - the thread pool is managed
> outside the scope of our application code, so we can't assume the
> service is available at connection creation time. Thanks!
>
> -Mike
>
>
> -----Original Message-----
> From: lars hofhansl [mailto:[email protected]]
> Sent: Sunday, November 03, 2013 11:27 PM
> To: [email protected]
> Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> Machine
>
> Hi Michael,
>
> can you try to create a single HConnection in your client:
> HConnectionManager.createConnection(Configuration conf) or
> HConnectionManager.createConnection(Configuration conf, ExecutorService pool)
>
> Then use HConnection.getTable(...) each time you need to do an operation.
>
> I.e.
>
> Configuration conf = ...;
> ExecutorService pool = ...;
>
> // create a single HConnection for your VM
> HConnection con = HConnectionManager.createConnection(conf, pool);
>
> // reuse the connection for many tables, even in different threads
> HTableInterface table = con.getTable(...);
> // use the table even if only for a few operations
> table.close();
> ...
> table = con.getTable(...);
> // use the table even if only for a few operations
> table.close();
> ...
> // at the end, close the connection
> con.close();
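>
> To make the multi-threaded part concrete, here is a sketch (the table
> name, row key, and thread count are made up; error handling elided):
>
> import java.io.IOException;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HConnection;
> import org.apache.hadoop.hbase.client.HConnectionManager;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class SharedConnectionExample {
>     public static void main(String[] args) throws IOException {
>         Configuration conf = HBaseConfiguration.create();
>         // One connection for the whole VM, shared by all threads.
>         final HConnection con = HConnectionManager.createConnection(conf);
>         ExecutorService workers = Executors.newFixedThreadPool(32);
>         for (int i = 0; i < 32; i++) {
>             workers.submit(new Runnable() {
>                 public void run() {
>                     try {
>                         // Table handles are cheap; take one per operation.
>                         HTableInterface table = con.getTable("mytable");
>                         try {
>                             table.get(new Get(Bytes.toBytes("some-row-key")));
>                         } finally {
>                             table.close(); // does not close the shared connection
>                         }
>                     } catch (IOException e) {
>                         // log/count the failure
>                     }
>                 }
>             });
>         }
>         workers.shutdown();
>         // close con once the application is done with it
>     }
> }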
>
> -- Lars
>
>
>
> ________________________________
> From: "[email protected]"
> <[email protected]>
> To: [email protected]
> Sent: Sunday, November 3, 2013 7:46 PM
> Subject: HBase Client Performance Bottleneck in a Single Virtual
> Machine
>
>
> Hi all; I posted this as a question on StackOverflow as well, but
> realized I should have gone straight to the horse's mouth with my
> question. Sorry for the double post!
>
> We are running a series of HBase tests to see if we can migrate one of
> our existing datasets from an RDBMS to HBase. We are running 15 nodes
> with 5 ZooKeeper servers and HBase 0.94.12 for this test.
>
> We have a single table with three column families and a key that
> distributes very well across the cluster. All of our queries do a
> direct look-up; there is no searching or scanning. Since HTablePool is
> now frowned upon, we are using the Apache Commons pool and a simple
> connection factory to create a pool of connections and use them in our
> threads. Each thread creates an HTable instance as needed and closes
> it when done. There are no leaks we can identify.
>
> If we run a single thread and just do lots of random calls
> sequentially, the performance is quite good. Everything works great
> until we start trying to scale the performance. As we add more threads
> and try to get more work done in a single VM, we start seeing
> performance degrade quickly. The client code simply attempts to run
> either one of several gets or a single put at a given frequency. It
> then waits until the next time to run; we use this to simulate the
> workload from external clients. With a single thread, we see call
> times in the 2-3 millisecond range, which is acceptable.
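>
> For reference, each simulated client is essentially this loop (a
> sketch; randomKey(), recordLatency(), numThreads, and periodMillis are
> hypothetical stand-ins, and con is a connection borrowed from our
> pool):
>
> ScheduledExecutorService sim = Executors.newScheduledThreadPool(numThreads);
> sim.scheduleAtFixedRate(new Runnable() {
>     public void run() {
>         long start = System.nanoTime();
>         try {
>             HTableInterface table = con.getTable("mytable");
>             try {
>                 table.get(new Get(randomKey())); // or a single put
>             } finally {
>                 table.close();
>             }
>         } catch (IOException e) {
>             // count the failure
>         }
>         recordLatency(System.nanoTime() - start); // hypothetical metrics hook
>     }
> }, 0, periodMillis, TimeUnit.MILLISECONDS);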
>
> As we add more threads, this call time starts increasing quickly.
> What's strange is that if we add more VMs, the times hold steady
> across all of them, so the bottleneck is clearly in the running
> instance and not the cluster. We can get a huge amount of processing
> happening across the cluster very easily - it just takes a lot of VMs
> on the client side to do it. We know the contention isn't in the
> connection pool, as we see the problem even when we have more
> connections than threads. Unfortunately, the times are spiraling out
> of control very quickly. We need to support at least 128 threads in
> practice, but most importantly we want to support 500 updates/sec and
> 250 gets/sec. In theory, this should be a piece of cake for the
> cluster, as we can do FAR more work than that with a few VMs, but we
> don't even get close to this with a single VM.
>
> So my question: how do people building high-performance apps with
> HBase get around this? What approach are others using for connection
> pooling in a multi-threaded environment? There seems to be
> surprisingly little info about this on the web considering HBase's
> popularity. Is there some client setting we need to use that makes it
> perform better in a threaded environment? We are going to try caching
> HTable instances next, but that's a total guess. There are ways to
> offload work to other VMs, but we really want to avoid that, as the
> cluster can clearly handle the load and offloading would dramatically
> decrease application performance in critical areas.
>
> Any help is greatly appreciated! Thanks!
> -Mike
>
--
It's just about how deep your longing is!