Thanks for the quick response, Ted. It does look like HBASE-4508 would resolve the issue I'm seeing, so I appreciate the pointer. We're back up and running OK for now with a custom TableOutputFormat (and a custom TableInputFormat) to work around this issue, but when we're ready to move to a later version of HBase we'll give that new mechanism a try.
Thanks!

     -Shawn

On Tue, Jan 24, 2012 at 2:39 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Thanks for reporting this, Shawn.
>
> Do you want to try out HBASE-4508, which is in HBase 0.90.5?
>
> On Tue, Jan 24, 2012 at 9:15 AM, Shawn Quinn <squ...@moxiegroup.com> wrote:
>
> > Hello,
> >
> > Our application runs Map/Reduce tasks fairly frequently against HBase
> > (Cloudera distribution 0.90.4), and we're making use of the default
> > org.apache.hadoop.hbase.mapreduce.TableOutputFormat class for the reduce
> > step, which TableMapReduceUtil.initTableReducerJob() sets up. We invoke
> > the Map/Reduce tasks via the standard Hadoop Job API, but they're all
> > triggered from the same virtual machine that stays running (so we aren't
> > shutting down the virtual machine after each job runs). We've been
> > running out of ZooKeeper connections in this configuration, and believe
> > we've tracked the "leak" down to the TableOutputFormat class.
> > Specifically, that class does the following:
> >
> >   public void setConf(Configuration otherConf) {
> >     this.conf = HBaseConfiguration.create(otherConf);
> >     String tableName = this.conf.get(OUTPUT_TABLE);
> >     String address = this.conf.get(QUORUM_ADDRESS);
> >     String serverClass = this.conf.get(REGION_SERVER_CLASS);
> >     String serverImpl = this.conf.get(REGION_SERVER_IMPL);
> >     try {
> >       if (address != null) {
> >         ZKUtil.applyClusterKeyToConf(this.conf, address);
> >       }
> >       if (serverClass != null) {
> >         this.conf.set(HConstants.REGION_SERVER_CLASS, serverClass);
> >         this.conf.set(HConstants.REGION_SERVER_IMPL, serverImpl);
> >       }
> >       this.table = new HTable(this.conf, tableName);
> >       this.table.setAutoFlush(false);
> >       LOG.info("Created table instance for " + tableName);
> >     } catch(IOException e) {
> >       LOG.error(e);
> >     }
> >   }
> >
> > I believe in previous releases of HBase this was different, but at some
> > point the code to clone the configuration object (first line of that
> > method) was added. Then, in that same method, when that code creates the
> > HTable instance, internally the HTable gets a new connection to ZooKeeper
> > every time (since the configuration instance is different).
> >
> > I believe I can get around this in my application by creating a custom
> > TableOutputFormat. However, can anyone confirm whether this is indeed a
> > problem, or whether there is some other way to work around the default
> > TableOutputFormat class creating a new connection to ZooKeeper every time
> > it runs?
> >
> > Thanks,
> >
> >      -Shawn
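
For anyone reading this thread later, here is a small toy model of the behaviour described above. It is only a sketch under an assumption taken from the original message: that connections are cached per Configuration instance, so a freshly cloned configuration never hits the cache. Conf and both cache maps are hypothetical stand-ins, not HBase classes, and the quorum-keyed variant is just one possible alternative, not necessarily what HBASE-4508 implements.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Toy model only: Conf and the caches below are hypothetical stand-ins, not HBase classes.
final class ConnectionCacheSketch {

    /** Stand-in for a Configuration: no equals/hashCode override, so identity semantics. */
    static final class Conf {
        final Map<String, String> props = new HashMap<>();

        Conf() {}

        /** Copy constructor, playing the role of HBaseConfiguration.create(otherConf). */
        Conf(Conf other) {
            props.putAll(other.props);
        }
    }

    /** Builds the long-lived configuration that every simulated job starts from. */
    static Conf sharedConf() {
        Conf shared = new Conf();
        shared.props.put("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // made-up hosts
        return shared;
    }

    /** One setConf()-style clone per job, with the cache keyed by Conf instance. */
    static int connectionsKeyedByInstance(int jobs) {
        Conf shared = sharedConf();
        Map<Conf, Object> cache = new IdentityHashMap<>();
        for (int i = 0; i < jobs; i++) {
            Conf cloned = new Conf(shared);                    // new instance per job
            cache.computeIfAbsent(cloned, c -> new Object());  // always a cache miss
        }
        return cache.size();
    }

    /** Same clones, but the cache is keyed by the quorum value instead. */
    static int connectionsKeyedByQuorum(int jobs) {
        Conf shared = sharedConf();
        Map<String, Object> cache = new HashMap<>();
        for (int i = 0; i < jobs; i++) {
            Conf cloned = new Conf(shared);
            String quorum = cloned.props.get("hbase.zookeeper.quorum");
            cache.computeIfAbsent(quorum, q -> new Object());  // miss only on the first job
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("keyed by instance: " + connectionsKeyedByInstance(5)); // prints 5
        System.out.println("keyed by quorum:   " + connectionsKeyedByQuorum(5));   // prints 1
    }
}
```

In this model the instance-keyed cache grows by one entry per job in a long-running JVM, which matches the symptom of slowly running out of ZooKeeper connections, while a value-based key keeps a single entry no matter how many clones are made.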