What is the difference between .90 and .90_master_rewrite? Thanks.
On Fri, Jan 21, 2011 at 2:29 PM, Lars George <lars.geo...@gmail.com> wrote:
> Hi Wayne,
>
> 0.90.0 is out. Get it while it's hot from the HBase home page.
>
> Lars
>
> On Jan 21, 2011, at 20:22, Wayne <wav...@gmail.com> wrote:
>
> > I enthusiastically created a ticket:
> > https://issues.apache.org/jira/browse/HBASE-3463
> >
> > This might be a dumb question I should already know the answer to...but
> > when is .90 coming out and what is its current state? Isn't there an RC
> > out? We are on 0.89.20100924 and thought that was the latest for us to
> > work off of...
> >
> > As always thanks for the detailed responses. FYI: I ended up reformatting
> > as I could drop all tables and get phantom regions to go away after
> > several restarts but the .META. table was still stuck reporting as 150MB
> > with no tables and after issuing major_compact (which never seemed to
> > have any effect)...
> >
> > Thanks.
> >
> >
> > On Fri, Jan 21, 2011 at 1:47 PM, Stack <st...@duboce.net> wrote:
> >
> >> On Fri, Jan 21, 2011 at 4:51 AM, Wayne <wav...@gmail.com> wrote:
> >>> After several hours I have figured out how to get the Disable command
> >>> to work and how to delete manually, but in the process there are 4
> >>> problems I encountered that I think are areas that could be improved
> >>> (or my understanding improved).
> >>>
> >>> 1) The client timeout is used for the disable command which was my
> >>> problem. Does this totally make sense? Should a DML-minded timeout be
> >>> used for DDL statements that we know can take a very long time
> >>> normally with a large cluster?
> >>>
> >>
> >> Sorry Wayne. I meant to respond yesterday to your original query.
> >>
> >> Enable/Disable has been redone in 0.90. Now there are added
> >> enabling/disabling states that are maintained up in zk and in shell
> >> there are commands is_enabled and is_disabled.
> >> We still have the same (DML) timeout (sort of -- see below for more)
> >> but at least now if it times out, you are not hosed. The disable or
> >> enable process is still running and you can query its state. There is
> >> also a notion of async enable/disable though this latter facility is
> >> not exposed in shell, only in the HBaseAdmin API.
> >>
> >>
> >>> 2) If the disable command fails the first time it does not "roll back".
> >>> The ONLY way to proceed is to enable and then try to disable again.
> >>> The first disable attempt is all that seems to work. Subsequent
> >>> disable statements usually run without errors but never seem to
> >>> "work". The entire table should be disabled after issuing this command
> >>> or the entire table should still be enabled. I was caught in this
> >>> half disabled or mostly disabled state which was very frustrating.
> >>>
> >>
> >> Sorry about that. Should be better in 0.90.0.
> >>
> >> Things should run a bit faster in 0.90.0 too because disable used to
> >> include an update of .META. per region plus a close of all regions
> >> that make up the table. In 0.90.0 there is no longer the .META.
> >> update and close is more prompt now; in the past close would wait on
> >> any running compactions to complete before proceeding. In 0.90.0
> >> we'll now interrupt the running compaction so close happens sooner.
> >>
> >> There is room for a bunch more improvement. For example, deleting a
> >> table, there should be a short-circuit that punts on flush of in-memory
> >> state and clean-close of open regions.
> >>
> >>> 3) The biggest issue of all is why certain regions do not report back
> >>> to the disable command. What are the various states of a region that
> >>> could cause this? Compaction I know is one, what else could cause the
> >>> disable command to take too long? Shouldn't a disable force itself
> >>> through and wait long enough to be able to disable every region?
> >>> Again a long wait time or a more forceful operation would help.
> >>>
> >>
> >> It wasn't that smart in 0.20/0.89. It's still pretty dumb but better in
> >> 0.90.0.
> >>
> >> Master process runs the enable/disable process in both old and new
> >> HBase. In 0.20/0.89, it was a sync process w/ master waiting on
> >> regions to flip to 'offline' after successful close. The state of
> >> disabledness was when all regions in table had 'offline' state. Any
> >> hiccup, a problem closing or a failure to update .META. w/ offline per
> >> region would bork the disabling process. It was super fragile. We
> >> tried to talk it up as so.
> >>
> >> In 0.90, client queues in master an executor that flips table to
> >> disabling in zk and then in parallel sends out unassigns of all table
> >> regions. The executor then hangs around with a more DDL-like timeout
> >> of hbase.bulk.assignment.waiton.empty.rit (10 minutes by default).
> >> Meantime clients can check state of the disable. After all unassigns
> >> complete, the table is flipped to disabled.
> >>
> >>
> >>> 4) Through all of the attempts to disable I saw regions coming and
> >>> going and nothing was consistent. The UI showed the table as disabled
> >>> and listed 1 region in the table (there were 1000s). The node view
> >>> listed several other regions but not the same one as the table view.
> >>> It was a very strange situation. The UI to browse the tables and
> >>> regions is great but it would be even better if it gave a 100% view
> >>> of regions and their current states. A summary view of region counts
> >>> per table based on state or status would be fantastic.
> >>
> >> Please file a JIRA. Sounds like a good idea. We could hoist stuff up
> >> out of hbck tool up into UI.
> >>
> >>
> >>> There is a compaction count, but what about in split, read/write
> >>> lock, disabled, etc.
> >>> What is the precise list of region states that could occur? Show a
> >>> summary count per state as well as detailed state for each specific
> >>> region in the list. Fundamentally this is the health monitor of the
> >>> system and as a DBA I really need to know the 100% count of regions
> >>> and where they are all at in terms of availability. Are they disabled,
> >>> blocked for writes, blocked for reads, in compaction, etc. etc. If
> >>> there are various states that cause disabling to be blocked it can be
> >>> reported here so that I at least know when a disable command can be
> >>> executed successfully (and this should be documented).
> >>>
> >>
> >>
> >> Please file a JIRA. This is great stuff.
> >>
> >> Sorry for pain caused messing w/ broken enable/disable. It should be
> >> better in 0.90 and easier to fix if there are bugs.
> >>
> >> St.Ack
> >>
> >>
> >>> Thanks
> >>>
> >>> On Thu, Jan 20, 2011 at 9:01 PM, Wayne <wav...@gmail.com> wrote:
> >>>
> >>>> I need to delete some tables and I am not sure the best way to do it.
> >>>> The shell does not work. The disable command says it runs ok but
> >>>> every time I run drop or truncate I get an exception that says the
> >>>> table is not disabled. The UI shows it as disabled but truncate/drop
> >>>> still do not work. I have even tried to restart the cluster as
> >>>> sometimes that makes the disable "stick".
> >>>>
> >>>> What is the best way to delete a table manually? My assumption is
> >>>> that with 10k regions in 3 tables that I need to delete that the
> >>>> shell is not going to work. How can I do this without a completely
> >>>> fresh install of everything? How can the data/tables be removed
> >>>> manually without too much pain?
> >>>>
> >>>> Thanks.
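
For anyone reading this thread later: the 0.90 flow Stack describes (request the disable, let the table sit in a "disabling" state while regions are unassigned, and have the client check state until it flips to "disabled", bounded by a longer DDL-style timeout) can be sketched roughly as below. This is a toy Python model and not HBase code. `FakeTable`, `request_disable`, `wait_until_disabled`, and the timing values are all hypothetical stand-ins; on a real cluster the corresponding steps would go through the shell's disable and is_disabled commands or the HBaseAdmin API.

```python
import time


class FakeTable:
    """Toy stand-in for a table whose regions close one at a time (hypothetical)."""

    def __init__(self, regions):
        self.state = "enabled"
        self.open_regions = regions

    def request_disable(self):
        # Analogous to queuing the disable: the table flips to "disabling" at once,
        # while the regions themselves are still open.
        self.state = "disabling"

    def tick(self):
        # Simulate one region finishing its unassign; when none are left,
        # the table flips to "disabled".
        if self.state == "disabling":
            self.open_regions = max(0, self.open_regions - 1)
            if self.open_regions == 0:
                self.state = "disabled"

    def is_disabled(self):
        return self.state == "disabled"


def wait_until_disabled(table, timeout_s, poll_s=0.0, clock=time.monotonic):
    """Poll the table state until it is disabled or the timeout expires.

    Returns True if the table reached "disabled", False on timeout. A timeout
    cancels nothing in this model; the caller can simply poll again later.
    """
    deadline = clock() + timeout_s
    while True:
        if table.is_disabled():
            return True
        if clock() >= deadline:
            return False
        table.tick()  # advances the simulation; a real client would only sleep here
        time.sleep(poll_s)


if __name__ == "__main__":
    table = FakeTable(regions=3)
    table.request_disable()
    print(table.state)
    print(wait_until_disabled(table, timeout_s=5.0))
    print(table.state)
```

The point of the sketch is only the shape of the interaction: the request returns quickly, the intermediate state is visible, and a client-side timeout means "ask again" rather than "the table is stuck half disabled".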