It seems there is a typo:
"we'll no interrupt the running compaction" should be "we'll now interrupt
the running compaction"

On Fri, Jan 21, 2011 at 10:47 AM, Stack <st...@duboce.net> wrote:

> On Fri, Jan 21, 2011 at 4:51 AM, Wayne <wav...@gmail.com> wrote:
> > After several hours I have figured out how to get the Disable command to
> > work and how to delete manually, but in the process there are 4 problems I
> > encountered that I think are areas that could be improved (or my
> > understanding improved).
> >
> > 1) The client timeout is used for the disable command, which was my problem.
> > Does this totally make sense? Should a DML-minded timeout be used for DDL
> > statements that we know can normally take a very long time on a large
> > cluster?
> >
>
> Sorry Wayne.  I meant to respond yesterday to your original query.
>
> Enable/Disable has been redone in 0.90.  Now there are added
> enabling/disabling states that are maintained up in zk and in shell
> there are commands is_enabled and is_disabled.  We still have the same
> (DML) timeout (sort of; see below for more), but at least now if it
> times out, you are not hosed.  The disable or enable process is still
> running and you can query its state.  There is also a notion of async
> enable/disable though this latter facility is not exposed in shell,
> only in the HBaseAdmin API.
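Because a 0.90 disable keeps running after the client gives up, the client-side pattern is to start the disable and then poll the table state instead of trusting a single blocking call. The sketch below is a toy under stated assumptions: `TableAdmin`, `FakeAdmin` and `DisablePoller` are hypothetical, and although the method names echo the async disable and state query mentioned above, this is not the HBase client API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface: names echo an async disable plus a state query,
// but this is an illustration, not the HBase client API.
interface TableAdmin {
    void disableTableAsync(String table);  // returns at once; the master keeps working
    boolean isTableDisabled(String table); // cheap query of the table's current state
}

// Fake admin for the demo: the table reports disabled only after a given number of polls.
class FakeAdmin implements TableAdmin {
    private final AtomicInteger pollsLeft;

    FakeAdmin(int polls) { this.pollsLeft = new AtomicInteger(polls); }

    public void disableTableAsync(String table) { /* background work is simulated by pollsLeft */ }

    public boolean isTableDisabled(String table) { return pollsLeft.decrementAndGet() <= 0; }
}

class DisablePoller {
    // Kicks off the disable, then polls until the table is disabled or the deadline passes.
    // A false return only means "not yet": the disable keeps running and can be checked again.
    static boolean disableAndWait(TableAdmin admin, String table, long timeoutMs, long pollMs) {
        admin.disableTableAsync(table);
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (admin.isTableDisabled(table)) {
                return true;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                break;
            }
        }
        return admin.isTableDisabled(table);
    }
}
```

A `false` result here corresponds to the "timed out but not hosed" case: the caller can simply poll again later.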
>
>
> > 2) If the disable command fails the first time, it does not "roll back". The
> > ONLY way to proceed is to enable and then try to disable again. The first
> > disable attempt is all that seems to work. Subsequent disable statements
> > usually run without errors but never seem to "work". The entire table
> > should be disabled after issuing this command, or the entire table should
> > still be enabled. I was caught in this half-disabled or mostly-disabled
> > state, which was very frustrating.
> >
>
> Sorry about that.   Should be better in 0.90.0.
>
> Things should run a bit faster in 0.90.0 too because disable used to
> include an update of .META. per region plus a close of all regions
> that make up the table.  In 0.90.0 there is no longer the .META.
> update and close is more prompt now; in the past close would wait on
> any running compactions to complete before proceeding.  In 0.90.0
> we'll no interrupt the running compaction so close happens the sooner.
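The 0.90.0 change to close (interrupt a running compaction rather than wait for it) can be pictured with a toy model. `ToyRegion` is hypothetical and only shows the interrupt-then-join pattern on a worker thread. It says nothing about how HBase actually cleans up an aborted compaction.

```java
// Toy model: a region whose close() interrupts a running compaction instead of waiting it out.
class ToyRegion {
    private volatile boolean compactionAborted = false;
    private volatile boolean compactionFinished = false;
    private Thread compactor;

    // Starts a long "compaction" that can be interrupted between units of work.
    void startCompaction(int units, long msPerUnit) {
        compactor = new Thread(() -> {
            try {
                for (int i = 0; i < units; i++) {
                    Thread.sleep(msPerUnit); // stands in for one chunk of compaction work
                }
                compactionFinished = true;
            } catch (InterruptedException e) {
                compactionAborted = true; // the toy only records that it stopped early
            }
        });
        compactor.start();
    }

    // Close that interrupts the compaction, waits for the thread to stop, and returns elapsed ms.
    long close() {
        long start = System.nanoTime();
        if (compactor != null) {
            compactor.interrupt();
            try {
                compactor.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    boolean aborted() { return compactionAborted; }
    boolean finished() { return compactionFinished; }
}
```

With a compaction of about 10 seconds, `close()` returns almost immediately. A close that waited on the compaction would instead block for the full duration.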
>
> There is room for a bunch more improvement. For example, deleting a
> table, there should be a short-circuit that punts on flush of in-memory
> state and clean-close of open regions.
>
> > 3) The biggest issue of all is why certain regions do not report back to the
> > disable command. What are the various states of a region that could cause
> > this? Compaction I know is one; what else could cause the disable command to
> > take too long? Shouldn't a disable force itself through and wait long enough
> > to be able to disable every region? Again, a long wait time or a more
> > forceful operation would help.
> >
>
> It wasn't that smart in 0.20/0.89.  It's still pretty dumb but better in
> 0.90.0.
>
> Master process runs the enable/disable process in both old and new
> HBase.  In 0.20/0.89, it was a sync process w/ master waiting on
> regions to flip to 'offline' after successful close.  The state of
> disabledness was when all regions in table had 'offline' state.  Any
> hiccup, a problem closing or a failure to update .META. w/ offline per
> region would bork the disabling process.  It was super fragile.  We
> tried to talk it up as so.
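The fragility of the old synchronous walk, and how it leaves the half-disabled table described in problem 2, can be shown with a toy. `OldStyleDisable` is hypothetical: one failure part-way through aborts the loop, and nothing rolls back the regions already marked offline.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the pre-0.90 synchronous disable; not HBase code.
class OldStyleDisable {
    // region name -> offline?  Insertion order mimics a sequential walk over the table's regions.
    final Map<String, Boolean> offline = new LinkedHashMap<>();

    OldStyleDisable(int regions) {
        for (int i = 0; i < regions; i++) {
            offline.put("region-" + i, false);
        }
    }

    // Walks regions one by one; a single failure aborts the walk and nothing is rolled back.
    void disable(String failingRegion) {
        for (Map.Entry<String, Boolean> e : offline.entrySet()) {
            if (e.getKey().equals(failingRegion)) {
                throw new IllegalStateException("close or .META. update failed for " + e.getKey());
            }
            e.setValue(true); // stands in for close plus the per-region .META. update
        }
    }

    long offlineCount() {
        return offline.values().stream().filter(b -> b).count();
    }
}
```

With 10 regions and a failure at `region-4`, the table ends up with 4 regions offline and 6 still online, neither enabled nor disabled.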
>
> In 0.90, client queues in master an executor that flips table to
> disabling in zk and then in parallel sends out unassigns of all table
> regions.  The executor then hangs around with a more DDL-like timeout
> of hbase.bulk.assignment.waiton.empty.rit (10 minutes by default).
> Meantime clients can check state of the disable.   After all unassigns
> complete, the table is flipped to disabled.
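The 0.90 flow can be modeled in a few lines:

1. Flip the table to disabling.
2. Fire all unassigns in parallel.
3. Wait up to a bounded time.
4. Flip to disabled only if everything completed.

The sketch below is a toy under assumptions. `NewStyleDisable`, `TableState`, and the `fakeUnassign` helper are hypothetical. An `AtomicReference` stands in for the zk state, and `waitOnRitMs` plays the role of hbase.bulk.assignment.waiton.empty.rit.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

enum TableState { ENABLED, DISABLING, DISABLED }

// Toy model of the 0.90 disable flow; not HBase code.
class NewStyleDisable {
    // Stands in for the table state kept in zk, which clients can query while the disable runs.
    final AtomicReference<TableState> state = new AtomicReference<>(TableState.ENABLED);

    // Hypothetical helper: a fake unassign that just takes some time.
    static Runnable fakeUnassign(long ms) {
        return () -> {
            try {
                Thread.sleep(ms);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    // Flips to DISABLING, runs all unassigns in parallel, then waits up to waitOnRitMs.
    // DISABLED is set only if every unassign completes in time; otherwise the state stays DISABLING.
    boolean run(List<Runnable> unassigns, long waitOnRitMs) {
        state.set(TableState.DISABLING);
        ExecutorService pool = Executors.newFixedThreadPool(8);
        CountDownLatch remaining = new CountDownLatch(unassigns.size());
        for (Runnable unassign : unassigns) {
            pool.execute(() -> {
                try {
                    unassign.run();
                } finally {
                    remaining.countDown();
                }
            });
        }
        pool.shutdown(); // accept no new work; already-queued unassigns still run
        boolean done;
        try {
            done = remaining.await(waitOnRitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            done = false;
        }
        if (done) {
            state.set(TableState.DISABLED);
        }
        return done;
    }
}
```

If the wait expires, the table is left in `DISABLING` rather than half-flipped, so a client can see that the disable is still in progress.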
>
>
> > 4) Through all of the attempts to disable I saw regions coming and going and
> > nothing was consistent. The UI showed the table as disabled and listed 1
> > region in the table (there were 1000s). The node view listed several other
> > regions but not the same one as the table view. It was a very strange
> > situation. The UI to browse the tables and regions is great but it would be
> > even better if it gave a 100% view of regions and their current states. A
> > summary view of region counts per table based on state or status would be
> > fantastic.
>
> Please file a JIRA.  Sounds like good idea.  We could hoist stuff up
> out of hbck tool up into UI.
>
>
> > There is a compaction count, but what about in split, read/write
> > lock, disabled, etc.? What is the precise list of region states that could
> > occur? Show a summary count per state as well as the detailed state for each
> > specific region in the list. Fundamentally this is the health monitor of the
> > system, and as a DBA I really need to know the 100% count of regions and
> > where they are all at in terms of availability. Are they disabled, blocked
> > for writes, blocked for reads, in compaction, etc.? If there are various
> > states that cause disabling to be blocked, they can be reported here so that I
> > at least know when a disable command can be executed successfully (and this
> > should be documented).
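The requested summary view boils down to a count of regions per table and per state. Here is a minimal sketch. `RegionStatus`, `RegionInfo`, and `RegionSummary` are hypothetical, and the state list is illustrative only, not HBase's actual set of region states.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical region states for illustration only; this is not HBase's actual list.
enum RegionStatus { OPEN, CLOSING, CLOSED, OFFLINE, SPLITTING, COMPACTING }

final class RegionInfo {
    final String table;
    final String region;
    final RegionStatus status;

    RegionInfo(String table, String region, RegionStatus status) {
        this.table = table;
        this.region = region;
        this.status = status;
    }
}

class RegionSummary {
    // Builds table -> (state -> count); tables sort by name, states follow enum order.
    static Map<String, Map<RegionStatus, Integer>> countByTableAndState(List<RegionInfo> regions) {
        Map<String, Map<RegionStatus, Integer>> summary = new TreeMap<>();
        for (RegionInfo r : regions) {
            summary.computeIfAbsent(r.table, t -> new EnumMap<>(RegionStatus.class))
                   .merge(r.status, 1, Integer::sum);
        }
        return summary;
    }
}
```

The per-region detail Wayne asks for would sit alongside this summary. The counts alone would already show, for example, that a "disabled" table still has regions in a closing state.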
> >
>
>
> Please file a JIRA.  This is great stuff.
>
> Sorry for the pain caused messing w/ broken enable/disable.  It should be
> better in 0.90, and easier to fix if there are bugs.
>
> St.Ack
>
>
> > Thanks
> >
> > On Thu, Jan 20, 2011 at 9:01 PM, Wayne <wav...@gmail.com> wrote:
> >
> >> I need to delete some tables and I am not sure of the best way to do it. The
> >> shell does not work. The disable command says it runs OK, but every time I
> >> run drop or truncate I get an exception that says the table is not
> >> disabled.  The UI shows it as disabled, but truncate/drop still do not work.
> >> I have even tried restarting the cluster, as sometimes that makes the disable
> >> "stick".
> >>
> >> What is the best way to delete a table manually? My assumption is that, with
> >> 10k regions in the 3 tables I need to delete, the shell is not going to
> >> work. How can I do this without a completely fresh install of everything?
> >> How can the data/tables be removed manually without too much pain?
> >>
> >> Thanks.
> >>
> >
>
