Hi Wayne, 0.90.0 is out. Get it while it's hot from the HBase home page.
Lars

On Jan 21, 2011, at 20:22, Wayne <wav...@gmail.com> wrote:

> I enthusiastically created a ticket:
> https://issues.apache.org/jira/browse/HBASE-3463
>
> This might be a dumb question I should already know the answer to...but when
> is .90 coming out and what is its current state? Isn't there an RC
> out? We are on 0.89.20100924 and thought that was the latest for us to work
> off of...
>
> As always thanks for the detailed responses. FYI: I ended up reformatting, as
> I could drop all tables and get phantom regions to go away after several
> restarts, but the .META. table was still stuck reporting as 150MB with no
> tables, even after issuing major_compact (which never seemed to have any
> effect)...
>
> Thanks.
>
>
> On Fri, Jan 21, 2011 at 1:47 PM, Stack <st...@duboce.net> wrote:
>
>> On Fri, Jan 21, 2011 at 4:51 AM, Wayne <wav...@gmail.com> wrote:
>>> After several hours I have figured out how to get the Disable command to
>>> work and how to delete manually, but in the process there are 4 problems I
>>> encountered that I think are areas that could be improved (or my
>>> understanding improved).
>>>
>>> 1) The client timeout is used for the disable command, which was my problem.
>>> Does this totally make sense? Should a DML-minded timeout be used for DDL
>>> statements that we know can take a very long time normally with a large
>>> cluster?
>>>
>>
>> Sorry Wayne. I meant to respond yesterday to your original query.
>>
>> Enable/Disable has been redone in 0.90. Now there are added
>> enabling/disabling states that are maintained up in zk, and in the shell
>> there are commands is_enabled and is_disabled. We still have the same
>> (DML) timeout (sort of -- see below for more) but at least now if it
>> times out, you are not hosed. The disable or enable process is still
>> running and you can query its state. There is also a notion of async
>> enable/disable, though this latter facility is not exposed in the shell,
>> only in the HBaseAdmin API.
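Stack's point that a client-side timeout no longer leaves you hosed comes down to this pattern: start the disable, then keep checking the table state instead of trusting one call's timeout. A minimal sketch of that polling loop follows, assuming a caller-supplied check that stands in for the shell's is_disabled; the class and method names are hypothetical and not part of HBase.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical client-side helper (not an HBase class): after kicking off an
 * asynchronous disable, poll a table-state check until it reports disabled.
 */
public final class DisablePoller {

    private DisablePoller() {}

    /**
     * Polls isDisabled until it returns true or the timeout elapses.
     *
     * @return true if the table was observed disabled in time; false means
     *         "not yet", not "failed": the disable may still be running in
     *         the master and can be polled again later.
     */
    public static boolean awaitDisabled(BooleanSupplier isDisabled,
                                        Duration timeout,
                                        Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (isDisabled.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake state check that starts reporting "disabled" on the third poll.
        int[] polls = {0};
        BooleanSupplier fake = () -> ++polls[0] >= 3;
        boolean done = awaitDisabled(fake, Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println("disabled=" + done + " after " + polls[0] + " polls");
    }
}
```

Running `main` prints `disabled=true after 3 polls`. Returning false on timeout, rather than throwing, mirrors the 0.90 behavior described above: running out of client-side patience does not mean the disable was abandoned.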
>>
>>
>>> 2) If the disable command fails the first time it does not "roll back". The
>>> ONLY way to proceed is to enable and then try to disable again. The first
>>> disable attempt is all that seems to work. Subsequent disable statements
>>> usually work without errors but never seem to "work". The entire table
>>> should be disabled after issuing this command or the entire table should
>>> still be enabled. I was caught in this half-disabled or mostly-disabled
>>> state, which was very frustrating.
>>>
>>
>> Sorry about that. Should be better in 0.90.0.
>>
>> Things should run a bit faster in 0.90.0 too because disable used to
>> include an update of .META. per region plus a close of all regions
>> that make up the table. In 0.90.0 there is no longer the .META.
>> update and close is more prompt now; in the past close would wait on
>> any running compactions to complete before proceeding. In 0.90.0
>> we'll now interrupt the running compaction so close happens sooner.
>>
>> There is room for a bunch more improvement. For example, when deleting a
>> table, there should be a short-circuit that punts on flush of in-memory
>> state and clean-close of open regions.
>>
>>> 3) The biggest issue of all is why certain regions do not report back to the
>>> disable command. What are the various states of a region that could cause
>>> this? Compaction I know is one; what else could cause the disable command to
>>> take too long? Shouldn't a disable force itself through and wait long enough
>>> to be able to disable every region? Again, a long wait time or a more
>>> forceful operation would help.
>>>
>>
>> It wasn't that smart in 0.20/0.89. It's still pretty dumb but better in
>> 0.90.0.
>>
>> The master process runs the enable/disable process in both old and new
>> HBase. In 0.20/0.89, it was a sync process w/ master waiting on
>> regions to flip to 'offline' after successful close. The table counted as
>> disabled once all regions in the table had 'offline' state.
>> Any hiccup, a problem closing or a failure to update .META. w/ offline per
>> region, would bork the disabling process. It was super fragile. We
>> tried to talk it up as so.
>>
>> In 0.90, the client queues in the master an executor that flips the table to
>> disabling in zk and then in parallel sends out unassigns of all the table's
>> regions. The executor then hangs around with a more DDL-like timeout
>> of hbase.bulk.assignment.waiton.empty.rit (10 minutes by default).
>> Meantime clients can check the state of the disable. After all unassigns
>> complete, the table is flipped to disabled.
>>
>>
>>> 4) Through all of the attempts to disable I saw regions coming and going and
>>> nothing was consistent. The UI showed the table as disabled and listed 1
>>> region in the table (there were 1000s). The node view listed several other
>>> regions but not the same one as the table view. It was a very strange
>>> situation. The UI to browse the tables and regions is great but it would be
>>> even better if it gave a 100% view of regions and their current states. A
>>> summary view of region counts per table based on state or status would be
>>> fantastic.
>>
>> Please file a JIRA. Sounds like a good idea. We could hoist stuff up
>> out of the hbck tool into the UI.
>>
>>
>>> There is a compaction count, but what about in split, read/write
>>> lock, disabled, etc.? What is the precise list of region states that could
>>> occur? The UI could show a summary count per state as well as the detailed
>>> state for each specific region in the list. Fundamentally this is the health
>>> monitor of the system, and as a DBA I really need to know the 100% count of
>>> regions and where they are all at in terms of availability. Are they
>>> disabled, blocked for writes, blocked for reads, in compaction, etc. etc.?
>>> If there are various states that cause disabling to be blocked, they could
>>> be reported here so that I at least know when a disable command can be
>>> executed successfully (and this should be documented).
>>>
>>
>>
>> Please file a JIRA. This is great stuff.
>>
>> Sorry for the pain caused by messing w/ the broken enable/disable. It should
>> be better in 0.90 and easier to fix if there are bugs.
>>
>> St.Ack
>>
>>
>>> Thanks
>>>
>>> On Thu, Jan 20, 2011 at 9:01 PM, Wayne <wav...@gmail.com> wrote:
>>>
>>>> I need to delete some tables and I am not sure of the best way to do it.
>>>> The shell does not work. The disable command says it runs ok but every
>>>> time I run drop or truncate I get an exception that says the table is not
>>>> disabled. The UI shows it as disabled but truncate/drop still do not work.
>>>> I have even tried to restart the cluster as sometimes that makes the
>>>> disable "stick".
>>>>
>>>> What is the best way to delete a table manually? My assumption is that with
>>>> 10k regions in 3 tables that I need to delete, the shell is not going to
>>>> work. How can I do this without a completely fresh install of everything?
>>>> How can the data/tables be removed manually without too much pain?
>>>>
>>>> Thanks.
>>>>
>>>
>>
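The 0.90 disable flow Stack describes in the thread (flip the table to disabling in zk, send out unassigns of all its regions in parallel, wait up to hbase.bulk.assignment.waiton.empty.rit, then flip the table to disabled) can be modeled as a small state machine. This is a hypothetical toy model, not HBase's master code: a `ConcurrentHashMap` stands in for the table state kept in zk, a caller-supplied predicate stands in for a region server confirming a close, and `waitMs` plays the role of the timeout setting.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Toy model of the 0.90 disable flow; all names here are hypothetical. */
public final class DisableFlowModel {

    public enum TableState { ENABLED, DISABLING, DISABLED }

    // Stand-in for the per-table state maintained in ZooKeeper.
    public final Map<String, TableState> tableStates = new ConcurrentHashMap<>();

    /**
     * Flips the table to DISABLING, unassigns all regions in parallel, and
     * flips it to DISABLED only if every close is confirmed within waitMs.
     * Returns true if the table reached DISABLED.
     */
    public boolean disable(String table, List<String> regions,
                           Predicate<String> closeRegion, long waitMs)
            throws InterruptedException {
        tableStates.put(table, TableState.DISABLING);
        CountDownLatch closed = new CountDownLatch(regions.size());
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (String region : regions) {
                pool.execute(() -> {
                    // Count the region only once its close is confirmed.
                    if (closeRegion.test(region)) {
                        closed.countDown();
                    }
                });
            }
            boolean allClosed = closed.await(waitMs, TimeUnit.MILLISECONDS);
            if (allClosed) {
                tableStates.put(table, TableState.DISABLED);
            }
            // Otherwise the table stays DISABLING, and clients can still see that.
            return allClosed;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        DisableFlowModel master = new DisableFlowModel();
        // Region r3 never confirms its close, e.g. because it is stuck.
        boolean done = master.disable("t", List.of("r1", "r2", "r3"),
                region -> !region.equals("r3"), 200);
        System.out.println(done + " " + master.tableStates.get("t"));
    }
}
```

Running `main` prints `false DISABLING`. The table never reaches disabled, but its intermediate state stays visible and queryable. That is the contrast Stack draws with the 0.20/0.89 flow, where any hiccup closing a region broke the whole disable.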