Hi Wayne,

0.90.0 is out. Get it while it's hot from the HBase home page. 

Lars

On Jan 21, 2011, at 20:22, Wayne <wav...@gmail.com> wrote:

> I enthusiastically created a ticket:
> https://issues.apache.org/jira/browse/HBASE-3463
> 
> This might be a dumb question I should already know the answer to...but when
> is .90 coming out and what is its current state? Isn't there an RC
> out? We are on 0.89.20100924 and thought that was the latest for us to work
> off of...
> 
> As always thanks for the detailed responses. FYI: I ended up reformatting as
> I could drop all tables and get phantom regions to go away after several
> restarts but the .META. table was still stuck reporting as 150MB with no
> tables and after issuing major_compact (which never seemed to have any
> effect)...
> 
> Thanks.
> 
> 
> On Fri, Jan 21, 2011 at 1:47 PM, Stack <st...@duboce.net> wrote:
> 
>> On Fri, Jan 21, 2011 at 4:51 AM, Wayne <wav...@gmail.com> wrote:
>>> After several hours I have figured out how to get the Disable command to
>>> work and how to delete manually, but in the process there are 4 problems I
>>> encountered that I think are areas that could be improved (or my
>>> understanding improved).
>>> 
>>> 1) The client timeout is used for the disable command which was my problem.
>>> Does this totally make sense? Should a DML minded timeout be used for DDL
>>> statements that we know can take a very long time normally with a large
>>> cluster?
>>> 
>> 
>> Sorry Wayne.  I meant to respond yesterday to your original query.
>> 
>> Enable/Disable has been redone in 0.90.  Now there are added
>> enabling/disabling states that are maintained up in zk and in shell
>> there are commands is_enabled and is_disabled.  We still have the same
>> (DML) timeout (sort of -- see below for more) but at least now if it
>> times out, you are not hosed.  The disable or enable process is still
>> running and you can query its state.  There is also a notion of async
>> enable/disable though this latter facility is not exposed in shell,
>> only in the HBaseAdmin API.
>> 
>> 
>>> 2) If the disable command fails the first time it does not "roll back". The
>>> ONLY way to proceed is to enable and then try to disable again. The first
>>> disable attempt is all that seems to work. Subsequent disable statements
>>> usually work without errors but never seem to "work". The entire table
>>> should be disabled after issuing this command or the entire table should
>>> still be enabled. I was caught in this half disabled or mostly disabled
>>> state, which was very frustrating.
>>> 
>> 
>> Sorry about that.   Should be better in 0.90.0.
>> 
>> Things should run a bit faster in 0.90.0 too because disable used to
>> include an update of .META. per region plus a close of all regions
>> that make up the table.  In 0.90.0 there is no longer the .META.
>> update and close is more prompt now; in the past close would wait on
>> any running compactions to complete before proceeding.  In 0.90.0
>> we'll now interrupt the running compaction so close happens sooner.
>> 
>> There is room for a bunch more improvement. For example, when deleting a
>> table, there should be a short-circuit that punts on the flush of in-memory
>> state and the clean close of open regions.
>> 
>>> 3) The biggest issue of all is why certain regions do not report back to the
>>> disable command. What are the various states of a region that could cause
>>> this? Compaction I know is one, what else could cause the disable command to
>>> take too long? Shouldn't a disable force itself through and wait long enough
>>> to be able to disable every region? Again a long wait time or a more
>>> forceful operation would help.
>>> 
>> 
>> It wasn't that smart in 0.20/0.89.  It's still pretty dumb but better in
>> 0.90.0.
>> 
>> Master process runs the enable/disable process in both old and new
>> HBase.  In 0.20/0.89, it was a sync process w/ master waiting on
>> regions to flip to 'offline' after successful close.  The state of
>> disabledness was when all regions in table had 'offline' state.  Any
>> hiccup, a problem closing or a failure to update .META. w/ offline per
>> region would bork the disabling process.  It was super fragile.  We
>> tried to talk it up as so.
>> 
>> In 0.90, client queues in master an executor that flips table to
>> disabling in zk and then in parallel sends out unassigns of all table
>> regions.  The executor then hangs around with a more DDL-like timeout
>> of hbase.bulk.assignment.waiton.empty.rit (10 minutes by default).
>> Meantime clients can check state of the disable.   After all unassigns
>> complete, the table is flipped to disabled.
>> 
>> 
>>> 4) Through all of the attempts to disable I saw regions coming and going and
>>> nothing was consistent. The UI showed the table as disabled and listed 1
>>> region in the table (there were 1000s). The node view listed several other
>>> regions but not the same one as the table view. It was a very strange
>>> situation. The UI to browse the tables and regions is great but it would be
>>> even better if it gave a 100% view of regions and their current states. A
>>> summary view of region counts per table based on state or status would be
>>> fantastic.
>> 
>> Please file a JIRA.  Sounds like good idea.  We could hoist stuff up
>> out of hbck tool up into UI.
>> 
>> 
>>> There is a compaction count, but what about in split, read/write
>>> lock, disabled, etc.? What is the precise list of region states that could
>>> occur? Show a summary count per state as well as a detailed state for each
>>> specific region in the list. Fundamentally this is the health monitor of the
>>> system and as a dba I really need to know the 100% count of regions and
>>> where they are all at in terms of availability. Are they disabled, blocked
>>> for writes, blocked for reads, in compaction, etc. etc. If there are various
>>> states that cause disabling to be blocked it can be reported here so that I
>>> at least know when a disable command can be executed successfully (and this
>>> should be documented).
>>> 
>> 
>> 
>> Please file a JIRA.  This is great stuff.
>> 
>> Sorry for the pain caused messing w/ broken enable/disable.  It should be
>> better in 0.90 and easier to fix if there are bugs.
>> 
>> St.Ack
>> 
>> 
>>> Thanks
>>> 
>>> On Thu, Jan 20, 2011 at 9:01 PM, Wayne <wav...@gmail.com> wrote:
>>> 
>>>> I need to delete some tables and I am not sure the best way to do it. The
>>>> shell does not work. The disable command says it runs ok but every time I
>>>> run drop or truncate I get an exception that says the table is not
>>>> disabled.  The UI shows it as disabled but truncate/drop still do not work.
>>>> I have even tried to restart the cluster as sometimes that makes the disable
>>>> "stick".
>>>> 
>>>> What is the best way to delete a table manually? My assumption is that with
>>>> 10k regions in 3 tables that I need to delete that the shell is not going to
>>>> work. How can I do this without a completely fresh install of everything?
>>>> How can the data/tables be removed manually without too much pain?
>>>> 
>>>> Thanks.
>>>> 
>>> 
>> 
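
For readers following the thread: the 0.90 disable flow Stack describes above (kick off the disable, let it run on the master, and have the client query its state instead of relying on a single blocking call) can be sketched roughly as below. This is only an illustration of the polling pattern, not HBase code. The TableAdmin interface, the FakeAdmin class and the waitForDisabled helper are hypothetical names made up for this sketch; the two method names are modeled loosely on the async disable and state-check facilities mentioned for the HBaseAdmin API, and the real signatures may differ. The fake stands in for a cluster so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

public class DisablePollingSketch {

    /** Hypothetical stand-in for an admin client; not the real HBaseAdmin class. */
    interface TableAdmin {
        void disableTableAsync(String table);   // request the disable and return at once
        boolean isTableDisabled(String table);  // query the current table state
    }

    /** In-memory fake: a table reports disabled only after a fixed number of state checks. */
    static class FakeAdmin implements TableAdmin {
        private final int checksUntilDisabled;
        private final Map<String, Integer> remaining = new HashMap<>();

        FakeAdmin(int checksUntilDisabled) {
            this.checksUntilDisabled = checksUntilDisabled;
        }

        @Override
        public void disableTableAsync(String table) {
            remaining.put(table, checksUntilDisabled);
        }

        @Override
        public boolean isTableDisabled(String table) {
            Integer left = remaining.get(table);
            if (left == null) {
                return false;               // disable was never requested
            }
            if (left <= 0) {
                return true;                // all simulated unassigns have completed
            }
            remaining.put(table, left - 1); // still "disabling"
            return false;
        }
    }

    /**
     * Requests a disable, then polls the table state until it reports disabled
     * or the (DDL-sized) timeout expires. Returns true if the table ended up disabled.
     */
    static boolean waitForDisabled(TableAdmin admin, String table, long timeoutMs, long pollMs) {
        admin.disableTableAsync(table);
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (admin.isTableDisabled(table)) {
                return true;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return false;
            }
        }
        return admin.isTableDisabled(table);        // one last check after the deadline
    }

    public static void main(String[] args) {
        TableAdmin admin = new FakeAdmin(3);
        boolean disabled = waitForDisabled(admin, "mytable", 5000, 10);
        System.out.println("disabled=" + disabled);
    }
}
```

The point of the pattern is that a client-side timeout ends only the waiting, not the disable itself, so the caller can check the state again later rather than being left with a half-disabled table.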
