Good point, but I thought the stock HBase did not have this problem? I did a quick test and it seems to start up fine.
On Tue, Jul 12, 2011 at 4:52 PM, Jim R. Wilson <wilson.ji...@gmail.com> wrote:

> This thread's title refers to CDH - would the proposed solutions also
> allow running stock HBase 0.90.3?
>
> So far I've been using Whirr 0.5.0. I'll go get the latest from the repo,
> at which time I'll be available to help with patches.
>
> -- Jim R. Wilson (jimbojw)
>
> On Tue, Jul 12, 2011 at 9:32 AM, Bruno Dumon <br...@outerthought.org> wrote:
>
>> On Tue, Jul 12, 2011 at 11:22 AM, Bruno Dumon <br...@outerthought.org> wrote:
>>
>>> On Mon, Jul 11, 2011 at 6:10 PM, Andrei Savu <savu.and...@gmail.com> wrote:
>>>
>>>> On Mon, Jul 11, 2011 at 8:03 AM, Tom White <t...@cloudera.com> wrote:
>>>> > That would be great. The issue is assigned to Andrei, so it would
>>>> > be worth seeing if he's currently working on it.
>>>>
>>>> Unfortunately I'm quite busy now and I'm not going to be able to work
>>>> on that. Feel free to join! :)
>>>
>>> OK, cool. In the meantime I thought of first looking into making it
>>> work with a wait, see below.
>>>
>>>> > Yes, I think WHIRR-334 could go in if tests for other services
>>>> > that may be affected still pass.
>>>>
>>>> I will test that and commit if everything is fine. How about adding a
>>>> delay (sleep) to configure_cdh_hbase before "service
>>>> hadoop-hbase-master start" and "service hadoop-hbase-regionserver
>>>> start"? That should make the start-up process a bit more predictable
>>>> (most of the time) until we have a good alternative in place.
>>>
>>> I tried to do something more reliable than a simple sleep by adding
>>> the following loop to configure_cdh_hbase.sh, before starting the
>>> services.
>>> It sleeps until it can connect to the namenode web UI:
>>>
>>>   until wget $MASTER_HOST:50070 -O /dev/null -o /dev/null
>>>   do
>>>     echo "hbase start: waiting for namenode to be available -- $(date)"
>>>     sleep 2
>>>   done
>>>
>>> It did not quite work as I expected, since (AFAICS) the instances for
>>> different templates are not handled concurrently.
>>>
>>> I use the following templates:
>>>
>>>   whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,1 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
>>>
>>> On each run I tried, it always runs the configure for the template
>>> involving the regionserver first. Because of the until-loop I added,
>>> the configure script never actually finishes, but after 10 minutes
>>> something decides it is done (while it is not), and then Whirr runs
>>> the configure script on the other node. Once that's done, the
>>> until-loop finishes and the regionserver starts successfully.
>>>
>>> So it seems this approach would work if we added concurrent execution
>>> of the configure script for different templates. This also seems to be
>>> part of the WHIRR-221 (startup order) patch. Maybe it makes sense to
>>> integrate that part of that patch first? Or do you expect the
>>> concurrency might break other things?
>>
>> Thinking further about this: if the wait-for-namenode loop were part of
>> HBase, 'service hadoop-hbase restart' would return as soon as the JVM
>> is launched, and not wait for HBase to be completely ready. Following
>> this reasoning, we might as well run the script-based loop in the
>> background too. I'm trying this out now; I'll update the WHIRR-334
>> patch when done.
>>
>> Regardless of this, I think it is still interesting to run the
>> configuration phase for all templates in parallel. I have a patch for
>> this ready, any interest in this?
>> >> >> -- >> Bruno Dumon >> Outerthought >> http://outerthought.org/ >> > > -- Bruno Dumon Outerthought http://outerthought.org/