There is pooling on very many levels here. :)
HTablePool maintains a pool of HTables per table (useful when multiple threads access the same table frequently). When you create HTables from the same Configuration ("same" as in the same Java reference), the HTables will share the underlying HConnection (the HTablePool does that too). The HConnection manages your "logical" connection to the cluster. Before the patches mentioned below, regardless of how many HTables or HConnections you have, there will be at most one TCP connection from any client process (i.e. the JVM) to any RegionServer. Normally this is a good thing, unless you have many threads that tend to hit the same RegionServer and hence experience the partial serialization problem I described below.

-- Lars

PS. One correction: it is not HBaseClient.sendParam, but HBaseClient.Connection.sendParam.

________________________________
From: Jimson K. James <[email protected]>
To: [email protected]
Cc: [email protected]
Sent: Tuesday, August 23, 2011 9:28 PM
Subject: RE: Multithreaded get

Hi Li Pi,

What about using HTablePool in CDH3? If there is no connection pooling built in, then HTablePool will not be possible, I think? Or am I missing something?

-----Original Message-----
From: Li Pi [mailto:[email protected]]
Sent: Wednesday, August 24, 2011 9:35 AM
To: [email protected]
Subject: Re: Multithreaded get

CDH3 does not have connection pooling. I don't believe any release has those two patches yet. I'll check when it'll hit CDH. Though, you are always free to backport your own patches from trunk.

On Tue, Aug 23, 2011 at 8:56 PM, Jimson K. James <[email protected]> wrote:
> Hi Lars,
> Thank you for writing.
>
> The existing setup at my disposal is Cloudera CDH3. Do you have any
> information about connection pooling in CDH3?
> Also, the client machine is WinXP, so the main concern is the concurrent
> connection limit at the TCP/IP stack level.
> Of course, if there is a limitation at the level of the JVM itself, the whole
> multithreaded app will suffer?
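To make the serialization Lars describes concrete, here is a stdlib-only Java model of it. It is a sketch under stated assumptions, not HBase's HBaseClient code: the names ConnectionSerializationDemo, SimulatedConnection, forServer and timeConcurrentGets are hypothetical, a Thread.sleep stands in for the network round trip, and round-robin selection is just one possible pooling policy. With a single connection per RegionServer and a lock held across the whole round trip, concurrent Gets queue up behind each other; with several pooled connections they can overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model, NOT HBase code. poolSize = 1 mimics the pre-patch client
// (one TCP connection per RegionServer per JVM); poolSize > 1 mimics a pool.
public class ConnectionSerializationDemo {

    // One simulated TCP connection; its lock is held for the whole round trip,
    // like the lock held during network IO in HBaseClient.Connection.sendParam.
    static final class SimulatedConnection {
        private final Object lock = new Object();

        void sendAndReceive(long roundTripMillis) throws InterruptedException {
            synchronized (lock) {
                Thread.sleep(roundTripMillis); // stands in for write + wait for response
            }
        }
    }

    // Connections to one server, handed out round-robin (hypothetical policy).
    static final class ServerConnections {
        private final SimulatedConnection[] conns;
        private final AtomicInteger next = new AtomicInteger();

        ServerConnections(int poolSize) {
            conns = new SimulatedConnection[poolSize];
            for (int i = 0; i < poolSize; i++) {
                conns[i] = new SimulatedConnection();
            }
        }

        SimulatedConnection pick() {
            return conns[Math.floorMod(next.getAndIncrement(), conns.length)];
        }
    }

    // JVM-wide cache: every caller asking for the same address shares one entry.
    // The pool size of the first request for an address wins.
    private static final Map<String, ServerConnections> CACHE = new ConcurrentHashMap<>();

    static ServerConnections forServer(String address, int poolSize) {
        return CACHE.computeIfAbsent(address, a -> new ServerConnections(poolSize));
    }

    // Releases `threads` Gets against one server at the same moment; returns elapsed ms.
    static long timeConcurrentGets(String address, int poolSize, int threads, long rttMillis) {
        ServerConnections server = forServer(address, poolSize);
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        CountDownLatch go = new CountDownLatch(1);
        Callable<Void> get = () -> {
            go.await();                              // wait until all Gets are released
            server.pick().sendAndReceive(rttMillis);
            return null;
        };
        try {
            List<Future<Void>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(exec.submit(get));
            }
            long t0 = System.nanoTime();
            go.countDown();
            for (Future<Void> f : results) {
                f.get();
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Hypothetical server addresses; the same address yields the same shared entry.
        System.out.println("shared entry : " + (forServer("rs1:60020", 1) == forServer("rs1:60020", 1)));
        System.out.println("1 connection : " + timeConcurrentGets("rs-a:60020", 1, 4, 100) + " ms");
        System.out.println("4 connections: " + timeConcurrentGets("rs-b:60020", 4, 4, 100) + " ms");
    }
}
```

With a 100 ms simulated round trip, the single-connection run should take at least about 400 ms and the pooled run close to 100 ms; exact timings depend on the machine. The two runs use different addresses because forServer keeps the pool size of the first request it sees for an address.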
>
> -----Original Message-----
> From: lars hofhansl [mailto:[email protected]]
> Sent: Tuesday, August 23, 2011 9:55 PM
> To: [email protected]
> Subject: Re: Multithreaded get
>
> The problem is that the requests are to some extent serialized over the
> connection.
>
> (See HBaseClient.sendParam, where a lock is held while network IO is in
> progress.)
>
> HBase trunk has connection pooling (see HBASE-2939 and HBASE-4150).
>
> In my tests this sped up requests/sec with multiple threads significantly
> (sometimes by a factor of 2 or 3).
>
> -- Lars
>
> ________________________________
> From: Srikanth P. Shreenivas <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Monday, August 22, 2011 11:43 PM
> Subject: RE: Multithreaded get
>
> Hi Jimson,
>
> In my experience, I have observed that as you increase the number of threads,
> the gets/puts start taking more time.
> The reason is that the same TCP connection is used for all the gets/puts
> from a single JVM. All requests are multiplexed on the same connection.
>
> Hence, your example of gets taking 10 ms is a function of the minimum amount
> of time a single get takes. So, you cannot make it any faster by adding
> more threads.
>
> I had done some tests in the past with puts.
> Here are my observations:
> http://www.srikanthps.com/2011/06/hbase-benchmarking-for-multi-threaded.html
>
> Regards,
> Srikanth
>
> -----Original Message-----
> From: Jimson K. James [mailto:[email protected]]
> Sent: Tuesday, August 23, 2011 11:40 AM
> To: [email protected]
> Subject: RE: Multithreaded get
>
> Hi Li Pi,
>
> Thank you for your quick response.
>
> What I see here is: when we are reading 1000 keys, each key holding 1 MB of data,
> from a total of 5 nodes, one node shows 100% network usage for
> data receive and 50% network usage for data transmit from the other 3 nodes
> (the 5th being just the name node, with a little network traffic).
>
> Seems like the keys are aggregated onto a node before serving? There
> is no MapReduce in question, just the plain Get operation.
>
> Any idea?
>
> Also, with the multithreaded app, the data retrieval speed is showing weird
> behavior.
>
> For example, if a single-threaded app took 10 ms to Get 2 rows, then a
> two-thread app should take 5 ms, but when tested it is taking 10 ms.
>
> From: Li Pi [mailto:[email protected]]
> Sent: Tuesday, August 23, 2011 9:38 AM
> To: [email protected]
> Subject: Re: Multithreaded get
>
> Yes.
>
> Even if all keys are on the same region, you'll experience a speedup if
> multithreaded.
>
> Sort of relevant: read performance test with differing number of reader
> threads based on where the file is cached.
>
> On Mon, Aug 22, 2011 at 9:04 PM, Jimson K. James
> <[email protected]> wrote:
>
> Hi All,
>
> Can anyone confirm that when a multithreaded application, say with 10
> threads, tries to get 10 different keys from 10 different regions spread
> over 10 nodes, it takes 1/10th of the total time taken by a single thread to
> fetch the same 10 keys?
>
> Or in other words:
>
> If I get 10 ms for the Get of a single key, then for 10 keys it is
> 10*10 = 100 ms for a single-threaded application and
> approx. 10 ms for 10 keys in a 10-threaded application?
>
> Will the 10 threads retrieve the 10 keys simultaneously?
>
> The target keys are all 1 MB in size and the network speed is 10/100 Mbps
> LAN.
>
> Thanks & Regards,
> Jimson K James
>
> ***** Confidentiality Statement/Disclaimer *****
>
> This message and any attachments is intended for the sole use of the
> intended recipient. It may contain confidential information. Any
> unauthorized use, dissemination or modification is strictly prohibited.
> If you are not the intended recipient, please notify the sender
> immediately then delete it from all your systems, and do not copy, use
> or print.
> Internet communications are not secure and it is the
> responsibility of the recipient to make sure that it is virus/malicious
> code exempt.
> The company/sender cannot be responsible for any unauthorized
> alterations or modifications made to the contents. If you require any
> form of confirmation of the contents, please contact the company/sender.
> The company/sender is not liable for any errors or omissions in the
> content of this message.
>
> ________________________________
>
> http://www.mindtree.com/email/disclaimer.html
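Jimson's expectation in the original question (10 ms per Get, so about 100 ms for 10 keys on one thread and about 10 ms on 10 threads) assumes nothing else is shared. Below is a back-of-the-envelope Java sketch of two lower bounds discussed in this thread: serialized round trips over a single connection, and the client's own network link, which every value must cross regardless of thread count. Assumptions (not from the thread): 1 MB = 10^6 bytes, 1 Mbps = 10^6 bit/s, no protocol overhead, and the 100 Mbps figure applies to the client's link. GetTimeBounds and its methods are hypothetical names.

```java
// Back-of-the-envelope lower bounds under the assumptions stated above.
public class GetTimeBounds {

    // Milliseconds needed to move `bytes` through a link of `megabitsPerSecond`
    // (1 Mbps taken as 10^6 bit/s; protocol overhead ignored).
    static double transferMillis(long bytes, double megabitsPerSecond) {
        double bits = bytes * 8.0;
        return bits / (megabitsPerSecond * 1_000_000.0) * 1000.0;
    }

    // Lower bound for fetching `keys` values of `valueBytes` each, given that
    //  - all bytes cross the client's single link, whatever the thread count;
    //  - requests on one connection are serialized, so `connections` connections
    //    need at least ceil(keys / connections) back-to-back Gets of `perGetMillis`.
    static double lowerBoundMillis(int keys, long valueBytes, double linkMbps,
                                   int connections, double perGetMillis) {
        double linkBound = transferMillis(keys * valueBytes, linkMbps);
        double serializedBound = Math.ceil((double) keys / connections) * perGetMillis;
        return Math.max(linkBound, serializedBound);
    }

    public static void main(String[] args) {
        // Figures from the question: 10 keys, 10 ms per Get, 100 Mbps LAN.
        System.out.printf("1 MB over 100 Mbps       : %.0f ms%n", transferMillis(1_000_000L, 100));
        System.out.printf("10 x 1 MB, 1 connection  : >= %.0f ms%n", lowerBoundMillis(10, 1_000_000L, 100, 1, 10));
        System.out.printf("10 x 1 MB, 10 connections: >= %.0f ms%n", lowerBoundMillis(10, 1_000_000L, 100, 10, 10));
        System.out.printf("10 x 1 KB, 1 connection  : >= %.0f ms%n", lowerBoundMillis(10, 1_000L, 100, 1, 10));
        System.out.printf("10 x 1 KB, 10 connections: >= %.0f ms%n", lowerBoundMillis(10, 1_000L, 100, 10, 10));
    }
}
```

Under these assumptions a single 1 MB value needs about 80 ms on a 100 Mbps link, so ten of them need at least about 800 ms however many threads or connections are used (about 8 s at 10 Mbps). Only for small values, such as the 1 KB case, does the number of connections become the limiting factor (100 ms versus 10 ms in the sketch).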
