Good point, but I thought the stock HBase did not have this problem? I did a
quick test and it seems to start up fine.

On Tue, Jul 12, 2011 at 4:52 PM, Jim R. Wilson <wilson.ji...@gmail.com> wrote:

> This thread's title refers to CDH - would the proposed solutions also allow
> running stock HBase 0.90.3?
>
> So far I've been using Whirr 0.5.0.  I'll go get the latest from the repo,
> at which time I'll be available to help with patches.
>
> -- Jim R. Wilson (jimbojw)
>
>
> On Tue, Jul 12, 2011 at 9:32 AM, Bruno Dumon <br...@outerthought.org> wrote:
>
>> On Tue, Jul 12, 2011 at 11:22 AM, Bruno Dumon <br...@outerthought.org> wrote:
>>
>>> On Mon, Jul 11, 2011 at 6:10 PM, Andrei Savu <savu.and...@gmail.com> wrote:
>>>
>>>> On Mon, Jul 11, 2011 at 8:03 AM, Tom White <t...@cloudera.com> wrote:
>>>> > That would be great. The issue is assigned to Andrei, so it would be
>>>> > worth checking whether he's currently working on it.
>>>> >
>>>>
>>>> Unfortunately I'm quite busy now and I'm not going to be able to work
>>>> on that. Feel free to join! :)
>>>>
>>>
>>> OK, cool. In the meantime I thought I'd first look into making it work
>>> with a wait; see below.
>>>
>>>
>>>>
>>>> > Yes, I think WHIRR-334 could go in if tests for other services that
>>>> > may be affected still pass.
>>>>
>>>> I will test that and commit if everything is fine. How about adding a
>>>> delay (sleep) to configure_cdh_hbase before "service
>>>> hadoop-hbase-master start" and "service hadoop-hbase-regionserver
>>>> start"? That should make the start-up process a bit more predictable
>>>> (most of the time) until we have a good alternative in place.
>>>>
>>>>
>>> I tried to do something more reliable than a simple sleep by adding the
>>> following loop to configure_cdh_hbase.sh, before starting the services. It
>>> sleeps until it can connect to the namenode web UI:
>>>
>>>   until wget "$MASTER_HOST:50070" -O /dev/null -o /dev/null
>>>   do
>>>     echo "hbase start: waiting for namenode to be available -- $(date)"
>>>     sleep 2
>>>   done
>>>
>>>
>>> It did not quite work as I expected, since (as far as I can see) the
>>> instances for different templates are not handled concurrently.
>>>
>>> I use the following templates:
>>>
>>> whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,1 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
>>>
>>> and on each run I tried, it always ran the configure step for the
>>> regionserver template first. Because of the until-loop I added, the
>>> configure script never actually finishes, but after 10 minutes Whirr
>>> decides it is done (even though it is not) and runs the configure script
>>> on the other node. Once that completes, the until-loop finishes and the
>>> regionserver starts successfully.
>>>
>>> So it seems this approach would work if we added concurrent execution of
>>> the configure script for different templates. That also seems to be part
>>> of the WHIRR-221 (startup order) patch. Maybe it makes sense to integrate
>>> that part of the patch first? Or do you expect the concurrency might
>>> break other things?
>>>
>>
>> Thinking further about this: if the wait-for-namenode loop were part of
>> HBase, 'service hadoop-hbase restart' would return as soon as the JVM is
>> launched, without waiting for HBase to be completely ready. Following this
>> reasoning, we might as well run the script-based loop in the background
>> too. I'm trying this out now; I'll update the WHIRR-334 patch when done.
>>
>> Regardless of this, I think it would still be worthwhile to run the
>> configuration phase for all templates in parallel. I have a patch ready
>> for this; any interest?
>>
>>
>> --
>> Bruno Dumon
>> Outerthought
>> http://outerthought.org/
>>
>
>


-- 
Bruno Dumon
Outerthought
http://outerthought.org/
