Hi Ranjith,
Your understanding is correct.

I'm going to answer your 2nd question here. I would say that if you don't have a lot
of concurrent users using Hive (and hence, the metastore) at the same time,
a local metastore backed by a relational DB (like MySQL) would work well.

If you have many concurrent connections to a Hive metastore, consider using a
remote metastore. "Many" is relative here, but my guess would be that it means
more than several dozen users at the same time.
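To make the two setups concrete, here is a minimal sketch that renders the client-side hive-site.xml properties for each mode. It assumes Hive releases of this era, where `hive.metastore.local` selects the mode (later releases drop it and treat the metastore as remote whenever `hive.metastore.uris` is set). The host names, port, and database name are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def hive_site(properties: dict[str, str]) -> str:
    """Render a dict of Hive properties as a hive-site.xml <configuration> string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Local metastore: every Hive client talks to MySQL directly over JDBC.
# "db-host" and "metastore" are hypothetical placeholders.
LOCAL = {
    "hive.metastore.local": "true",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://db-host:3306/metastore",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
}

# Remote metastore: clients only know the Thrift URI of the metastore service;
# the JDBC settings move into the metastore service's own configuration.
# "metastore-host" is a hypothetical placeholder.
REMOTE = {
    "hive.metastore.local": "false",
    "hive.metastore.uris": "thrift://metastore-host:9083",
}

if __name__ == "__main__":
    print(hive_site(LOCAL))
    print(hive_site(REMOTE))
```

The practical difference is where the database credentials live: in local mode every client machine needs them, while in remote mode only the metastore service does.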

Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation 

www: oanda.com www: fxtrade.com 

----- Original Message -----
From: "Ranjith Raghunath" <[email protected]>
To: [email protected]
Sent: Wednesday, March 7, 2012 11:35:49 PM
Subject: Remote vs Local metastore for hive

I'm trying to understand this concept a little better and could use some help
from the larger community. I jotted down a couple of quick notes as I was
reading through the material on local vs. remote metastores:

1. In local mode, each Hive client opens its own connection to the database. If
there are many clients connected to the database, this could overwhelm the
instance, depending on the max connections setting. By default, this value
(max_connections in MySQL) is 151, and it can be raised to a larger value
depending on how much RAM the box has.

2. In remote mode, each client goes through the metastore service, and only
the service connects to the database.
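The difference between the two notes above is mostly arithmetic. The sketch below compares the database connection count in each mode against MySQL's default max_connections of 151. The per-client connection count and the per-service pool size are hypothetical parameters; the real numbers depend on how the JDBC connection pool is configured.

```python
MYSQL_DEFAULT_MAX_CONNECTIONS = 151  # MySQL's default max_connections


def local_mode_connections(clients: int, conns_per_client: int = 1) -> int:
    # Local mode: every Hive client opens its own JDBC connection(s) to the DB.
    # conns_per_client is a hypothetical, pool-dependent figure.
    return clients * conns_per_client


def remote_mode_connections(metastore_services: int, pool_size: int) -> int:
    # Remote mode: only the metastore services hold DB connections; clients
    # talk Thrift to the services. pool_size is a hypothetical figure.
    return metastore_services * pool_size


def fits(connections: int, max_connections: int = MYSQL_DEFAULT_MAX_CONNECTIONS) -> bool:
    return connections <= max_connections


if __name__ == "__main__":
    # 200 concurrent users, one connection each, exceed the default limit.
    local = local_mode_connections(200)
    print(local, fits(local))  # 200 False
    # The same users behind two metastore services with 20 connections each.
    remote = remote_mode_connections(2, 20)
    print(remote, fits(remote))  # 40 True
```

In local mode the connection count grows with the number of users; in remote mode it grows only with the number of metastore services.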

My questions are:

1. Can each node on the cluster have a separate metastore service when using
the remote metastore configuration?

a. If so, managing this seems like a nightmare in terms of keeping the logs in
sync.

b. If not, this seems like a single point of failure, as all connections are
routed through one metastore service.

2. What is the preferred approach here with respect to local vs. remote?

3. In order to avoid overwhelming the database, should the following parameters
be tuned?

a. hive.metastore.server.min.threads 

b. hive.metastore.server.max.threads 
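On question 1: one common pattern, assuming your Hive version accepts a comma-separated list in hive.metastore.uris, is to run a few metastore service instances against the same backing database and list them all on the client, which then tries them in turn. The sketch below models that client-side selection in simplified form. The host names are hypothetical, and the default port of 9083 used when a URI omits one is an assumption.

```python
import socket
from typing import Callable, Optional
from urllib.parse import urlparse

DEFAULT_PORT = 9083  # assumed default when a URI omits the port


def parse_uris(value: str) -> list[tuple[str, int]]:
    """Split a list like 'thrift://ms1:9083,thrift://ms2:9083' into (host, port) pairs."""
    endpoints: list[tuple[str, int]] = []
    for uri in value.split(","):
        parsed = urlparse(uri.strip())
        if parsed.hostname:  # skip empty entries, e.g. from a trailing comma
            endpoints.append((parsed.hostname, parsed.port or DEFAULT_PORT))
    return endpoints


def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_metastore(
    value: str, reachable: Callable[[str, int], bool] = tcp_reachable
) -> Optional[tuple[str, int]]:
    """Return the first reachable metastore endpoint in list order, or None."""
    for host, port in parse_uris(value):
        if reachable(host, port):
            return host, port
    return None
```

For example, `pick_metastore("thrift://ms1:9083,thrift://ms2:9083")` returns the first service that accepts a connection. Because every instance would share one backing database, the metadata itself is kept in one place, and losing a single service does not take the metastore down.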
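On question 3: as far as I understand, these two parameters size the worker thread pool of the remote metastore service, so they only apply in remote mode. Capping the maximum caps how many requests the service works on at once. The sketch below models that effect with a bounded Python thread pool. It assumes that each worker issues at most one database call at a time, simulated here by a short sleep, and it only models the maximum, not the minimum.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyMeter:
    """Tracks how many simulated database calls are in flight at once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def db_call(self, _request_id: int) -> None:
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        time.sleep(0.01)  # stand-in for a metastore query against the database
        with self._lock:
            self.current -= 1


def serve(requests: int, max_threads: int) -> int:
    """Run `requests` simulated calls on at most `max_threads` workers; return the peak."""
    meter = ConcurrencyMeter()
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        list(pool.map(meter.db_call, range(requests)))
    return meter.peak


if __name__ == "__main__":
    print(serve(requests=100, max_threads=8))  # at most 8
```

Requests above the cap wait in a queue instead of hitting the database, so a lower maximum protects the database at the cost of higher latency under load.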
Thanks, 

Ranjith 

