Circling back here and adding user@phoenix. I put together one script to dump region info from the shell and find the empty ones, and another to merge a given region into a neighbor. We've run them without incident and it all looks to work fine. One thing we did notice is that the AM (AssignmentManager) leaves the old "retired" regions around in its counts: the master status page shows a large number of "Other Regions". This was alarming at first, but we verified it's just an artifact in the AM; these regions are not on HDFS or in meta. Bouncing the master resolved it.
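For anyone who wants the gist of the "find empty regions, merge into a neighbor" step without the scripts in hand, here is a minimal sketch of the pairing logic only. It is not the scripts themselves: the `Region` tuple, its field names, and the sample values are hypothetical, and it assumes you already have a region dump with start/end keys and a size per region. It makes no HBase calls; issuing the actual merge requests is left out.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """One row of a region dump (hypothetical shape, not an HBase type)."""
    name: str       # encoded region name
    start_key: str  # "" for the first region of the table
    end_key: str    # "" for the last region of the table
    size_mb: int    # size as reported by your dump; 0 is treated as empty


def find_empty(regions: List[Region]) -> List[Region]:
    """Return the regions whose reported size is zero."""
    return [r for r in regions if r.size_mb == 0]


def plan_merges(regions: List[Region]) -> List[Tuple[str, str]]:
    """Pair each empty region with an adjacent neighbor.

    Each region appears in at most one pair per pass, because a merge
    replaces both inputs with a new region. Re-dump and re-run to
    collapse longer runs of empty regions.
    """
    ordered = sorted(regions, key=lambda r: r.start_key)
    plan = []
    i = 0
    while i < len(ordered) - 1:
        left, right = ordered[i], ordered[i + 1]
        adjacent = left.end_key == right.start_key
        if adjacent and (left.size_mb == 0 or right.size_mb == 0):
            plan.append((left.name, right.name))
            i += 2  # both regions are consumed by this merge
        else:
            i += 1
    return plan


if __name__ == "__main__":
    dump = [
        Region("a", "", "10", 0),
        Region("b", "10", "20", 5),
        Region("c", "20", "30", 0),
        Region("d", "30", "", 0),
    ]
    for left_name, right_name in plan_merges(dump):
        print(left_name, right_name)
```

Pairing only adjacent regions and using each region once per pass keeps every planned merge valid against the dump it was computed from, at the cost of needing several passes for long runs of empty regions.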
No one has volunteered any alternative schema designs, so as best we know, this will happen to anyone who has a timestamp in their rowkey (i.e., anyone using Phoenix's "Row timestamp" feature [0]) and is also using the TTL feature.

Are folks interested in adding these scripts to our distribution and our book?

-n

[0]: https://phoenix.apache.org/rowtimestamp.html

On Mon, Apr 4, 2016 at 8:34 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:

> > Crazy idea, but you might be able to take stripped down version of region
> > normalizer code and make a Tool to run? Requesting split or merge is done
> > through the client API, and the only weighing information you need is
> > whether region empty or not, that you could find out too?
>
> Yeah, that's the direction I'm headed.
>
> > A bit off topic, but I think unfortunately region normalizer now ignores
> > empty regions to avoid undoing pre-split on the table.
>
> Unfortunate indeed. Maybe we should be keeping around the initial splits
> list as a metadata attribute on the table?
>
> > With a right row-key design you will never have empty regions due to TTL.
>
> I'd love to hear your thoughts on this design, Vlad. Maybe you'd like to
> write up a post for the blog? Meanwhile, I'm sure of a couple of us on here
> on the list would appreciate your Cliff's Notes version. I can take this
> into account for my v2 schema design.
>
> > So Nick, merge on 1.1 is not recommended??? Was working very well on
> > previous versions. Is ProcV2 really impact it that bad??
>
> How to answer here carefully... I have no reason to believe merge is not
> working on 1.1. I've been on the wrong end of enough "regions stuck in
> transition" support tickets that I'm not keen to put undue stress on my
> master. ProcV2 insures against many scenarios that cause master trauma,
> hence my interest in the implementation details and my preference for
> cluster administration tasks that use it as their source of authority.
>
> Thanks for the thoughts folks.
> -n
>
> On Fri, Apr 1, 2016 at 10:52 AM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>
>> ;) That was not the question ;)
>>
>> So Nick, merge on 1.1 is not recommended??? Was working very well on
>> previous versions. Is ProcV2 really impact it that bad??
>>
>> JMS
>>
>> 2016-04-01 13:49 GMT-04:00 Vladimir Rodionov <vladrodio...@gmail.com>:
>>
>> > >> This is something
>> > >> which makes it far less useful for time-series databases with short TTL
>> > >> on the tables.
>> >
>> > With a right row-key design you will never have empty regions due to TTL.
>> >
>> > -Vlad
>> >
>> > On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov <olorinb...@gmail.com> wrote:
>> >
>> > > Crazy idea, but you might be able to take stripped down version of region
>> > > normalizer code and make a Tool to run? Requesting split or merge is done
>> > > through the client API, and the only weighing information you need is
>> > > whether region empty or not, that you could find out too?
>> > >
>> > >
>> > > "Short of upgrading to 1.2 for the region normalizer,"
>> > >
>> > > A bit off topic, but I think unfortunately region normalizer now ignores
>> > > empty regions to avoid undoing pre-split on the table. This is something
>> > > which makes it far less useful for time-series databases with short TTL on
>> > > the tables. We'll need to address that.
>> > >
>> > > -Mikhail
>> > >
>> > > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>> > >
>> > > > Hi folks,
>> > > >
>> > > > I have a table with TTL enabled. It's been receiving data for a while
>> > > > beyond the TTL and I now have a number of empty regions. I'd like to drop
>> > > > those empty regions to free up heap space on the region servers and reduce
>> > > > master load. I'm running a 1.1 derivative.
>> > > >
>> > > > The only threads I found on this topic are from circa 0.92 timeframe.
>> > > >
>> > > > Short of upgrading to 1.2 for the region normalizer, what's the recommended
>> > > > method of cleaning up this cruft? Should I be merging empty regions into
>> > > > their neighbor's? Looks like region merge hasn't been migrated to ProcV2
>> > > > yet so would be wise to reduce online table activity, or at least aim for a
>> > > > "quiet period"? Is there a documented process for off-lining and deleting a
>> > > > region by name? I don't see anything in the book about it.
>> > > >
>> > > > I experimented with online merge on pseudodist, looks like it's working
>> > > > fine for the most basic case. I'll probably pursue this unless someone has
>> > > > some other ideas.
>> > > >
>> > > > Thanks,
>> > > > Nick
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Thanks,
>> > > Michael Antonov
>> > >
>> >
>> >
>