Thanks, Duo, for offering to coordinate on writing "Part 3" of this series;
that sounds great!
Although I see TRSP#assign being used directly by SCP when assigning
regions, I have yet to take a detailed look at HBASE-20881
<https://issues.apache.org/jira/browse/HBASE-20881> and the related work.
Let me reach out to you over Slack and we can take it from there.
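
For readers following the thread, the design point made in the quoted
message below (SCP only has to interrupt one type of procedure) can be
sketched roughly as follows. This is a toy model under stated assumptions,
not HBase code: ToyTransitProcedure, ToyCrashHandler, onServerCrash and
handleCrash are hypothetical names invented for illustration, and the real
TRSP/SCP interaction is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every name here is hypothetical, none of this is HBase code.
final class ToyTransitProcedure {
    enum Goal { ASSIGN, UNASSIGN }
    enum Step { OPEN, CLOSE, DONE }

    final String region;
    final Goal goal;
    String targetServer;
    Step nextStep;

    ToyTransitProcedure(String region, Goal goal, String targetServer) {
        this.region = region;
        this.goal = goal;
        this.targetServer = targetServer;
        // An assign starts by opening; an unassign of an online region starts by closing.
        this.nextStep = (goal == Goal.ASSIGN) ? Step.OPEN : Step.CLOSE;
    }

    // Called by the crash handler. Returns true if this procedure was
    // targeting the dead server and had to be redirected.
    boolean onServerCrash(String deadServer, String replacementServer) {
        if (nextStep == Step.DONE || !targetServer.equals(deadServer)) {
            return false;
        }
        targetServer = replacementServer;
        // Whatever the goal, the region must first be opened elsewhere;
        // an unassign then follows up with a clean close.
        nextStep = Step.OPEN;
        return true;
    }

    // Advance one step, pretending the remote open/close succeeded.
    void advance() {
        if (nextStep == Step.OPEN) {
            nextStep = (goal == Goal.UNASSIGN) ? Step.CLOSE : Step.DONE;
        } else if (nextStep == Step.CLOSE) {
            nextStep = Step.DONE;
        }
    }
}

final class ToyCrashHandler {
    private final List<ToyTransitProcedure> inFlight = new ArrayList<>();

    void register(ToyTransitProcedure proc) {
        inFlight.add(proc);
    }

    // Only one procedure type to interrupt, so a single loop is enough.
    int handleCrash(String deadServer, String replacementServer) {
        int redirected = 0;
        for (ToyTransitProcedure proc : inFlight) {
            if (proc.onServerCrash(deadServer, replacementServer)) {
                redirected++;
            }
        }
        return redirected;
    }
}
```

With three separate procedure types, the crash handler in this toy model
would need one interruption path per type; with a single type, the
"bring it online first, then close cleanly" rule lives in one place.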

On Sun, Sep 12, 2021 at 7:02 PM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:

> Thank you Viraj and Andrew, the blog posts are outstanding!
>
> And I think we'd better have a part 3, about the ServerCrashProcedure
> (SCP) :)
>
> In 2.0 and 2.1, we used MoveRegionProcedure, AssignRegionProcedure and
> UnassignRegionProcedure, and one of the reasons why we removed them all and
> introduced a single TRSP (TransitRegionStateProcedure) to do
> assign/unassign/move/reopen is SCP.
>
> If a region server crashes, we obviously cannot assign regions to it any
> more, so we need a way to stop the procedures that are still trying to
> assign regions to the dead server. And even to unassign a region, we
> still need to bring it online first and then unassign it. For example, when
> disabling a table, we must make sure that all the data in the memstore has
> been flushed to storage, so we need to bring the region online and then do
> a clean close.
> In 2.0 and 2.1, we had 3 procedures for region assignment, and there were
> lots of corner cases when we wanted to interrupt them from SCP, which made
> the code buggy and really hard to understand. So finally, we introduced a
> single TRSP to replace them all, and SCP now only needs to interrupt one
> type of procedure.
>
> This is the story :)
>
> I could help if you guys want to write a part 3 about SCP :)
>
> Thanks.
>
> On Wed, Sep 8, 2021 at 2:27 AM Viraj Jasani <vjas...@apache.org> wrote:
>
> > As some of the HBase users are still running HBase 1.x versions in their
> > production environment, and branch-1 is trending toward EOL, now is
> > really the right time to evaluate as well as understand the features and
> > core design changes provided by HBase 2.x versions.
> >
> > As the majority of us are already aware, one of the key features with
> > significant architectural changes provided by HBase 2 is
> > AssignmentManagerV2 (AMv2).
> > However, we don't seem to have one place explaining 1) *the evolution
> > of AM* and
> > 2) how it manages region assignments with better scalability, reliability
> > and fault-tolerance.
> > Keeping this in mind, Andrew and I have published a two-part series of
> > blog posts explaining this evolution. Part 1 provides a) a basic
> > introduction to HBase concepts, and b) AM and its shortcomings in
> > previous versions that AMv2 is trying to resolve. Part 2 provides
> > detailed info about Pv2 and how AMv2 leverages it, along with state
> > diagrams explaining some of the complex region assignment workflows. The
> > intention of the state diagrams is for devs/users to be able to
> > a) understand region assignment workflows in depth, b) walk through the
> > code more easily, and c) debug and root-cause issues with better
> > knowledge.
> >
> > Part 1:
> > https://engineering.salesforce.com/evolution-of-region-assignment-in-the-apache-hbase-architecture-part-1-c43b1becc522
> > Part 2:
> > https://engineering.salesforce.com/evolution-of-region-assignment-in-the-apache-hbase-architecture-part-2-9568fb3790b
> >
>
