Haoze Wu created HBASE-27520:
--------------------------------

             Summary: The potential delay of HDFS RPC in HRegion may cause data inconsistency
                 Key: HBASE-27520
                 URL: https://issues.apache.org/jira/browse/HBASE-27520
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.4.2
            Reporter: Haoze Wu

This is a follow-up to HBASE-26256 about the data inconsistency issue. The inconsistency is that while the HBase shell is running the `create` command, the table metadata is not finalized or committed until the shell command returns, yet other clients already see the table as “created” when they run the `list` command. We recently did more analysis and found the root cause.

Here is the workflow. When a table is created, the CreateTableProcedure is created in HMaster#createTable at line 2011:
{code:java}
return MasterProcedureUtil
  .submitProcedure(new MasterProcedureUtil.NonceProcedureRunnable(this, nonceGroup, nonce) {
    @Override
    protected void run() throws IOException {
      getMaster().getMasterCoprocessorHost().preCreateTable(desc, newRegions);

      LOG.info(getClientIdAuditPrefix() + " create " + desc);

      // TODO: We can handle/merge duplicate requests, and differentiate the case of
      // TableExistsException by saying if the schema is the same or not.
      //
      // We need to wait for the procedure to potentially fail due to "prepare" sanity
      // checks. This will block only the beginning of the procedure. See HBASE-19953.
      ProcedurePrepareLatch latch = ProcedurePrepareLatch.createBlockingLatch();
      submitProcedure(
        new CreateTableProcedure(procedureExecutor.getEnvironment(), desc, newRegions, latch)); // line 2011
      latch.await();

      getMaster().getMasterCoprocessorHost().postCreateTable(desc, newRegions);
    }

    @Override
    protected String getDescription() {
      return "CreateTableProcedure";
    }
  });{code}
Its call stack is:
{code:java}
org.apache.hadoop.hbase.master.HMaster,createTable,1995
org.apache.hadoop.hbase.master.MasterRpcServices,createTable,705
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2,callBlockingMethod,108919
org.apache.hadoop.hbase.ipc.RpcServer,call,395
org.apache.hadoop.hbase.ipc.CallRunner,run,133
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler,run,338
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler,run,318
{code}
`CreateTableProcedure` is submitted at line 2011. It will eventually call CreateTableProcedure#executeFromState in HMaster.

CreateTableProcedure#executeFromState runs a state machine. Three of its states matter here; they are executed one after another (via the loop in ProcedureExecutor#executeProcedure):
 # `CREATE_TABLE_ADD_TO_META` state. It adds the table info to hbase:meta.
 # `CREATE_TABLE_ASSIGN_REGIONS` state. It creates a subprocedure in which the AssignmentManager in the master connects to the regionservers to distribute the region create command.
 # `CREATE_TABLE_UPDATE_DESC_CACHE` state. It finalizes the table creation by setting the table to the enabled state.

Code snippet (CreateTableProcedure#executeFromState, line 105 ~ 120):
{code:java}
case CREATE_TABLE_ADD_TO_META:
  newRegions = addTableToMeta(env, tableDescriptor, newRegions);
  setNextState(CreateTableState.CREATE_TABLE_ASSIGN_REGIONS);
  break;
case CREATE_TABLE_ASSIGN_REGIONS:
  setEnablingState(env, getTableName());
  addChildProcedure(env.getAssignmentManager()
    .createRoundRobinAssignProcedures(newRegions));
  setNextState(CreateTableState.CREATE_TABLE_UPDATE_DESC_CACHE);
  break;
case CREATE_TABLE_UPDATE_DESC_CACHE:
  // XXX: this stage should be named as set table enabled, as now we will cache the
  // descriptor after writing fs layout.
  setEnabledState(env, getTableName());
  setNextState(CreateTableState.CREATE_TABLE_POST_OPERATION);
  break;{code}
When the issue described in HBASE-26256 appears, the procedure is in the `CREATE_TABLE_ASSIGN_REGIONS` state. The subprocedure that calls the regionservers is stuck for a while, so the state machine temporarily cannot proceed to the `CREATE_TABLE_UPDATE_DESC_CACHE` state. However, when other clients run the `list` command at this moment, they essentially only check the meta information held by this HMaster, which already contains the table, and so they print it, although the state machine is still stuck in the `CREATE_TABLE_ASSIGN_REGIONS` state and the table is not yet fully created. This is how the inconsistency arises.

To fix this issue, we need to either slightly change the state machine or add auxiliary flags to the tables that mark whether a table is still in the process of being created. We think the latter is better because changing the state machine may require a lot of modification.
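
The flag-based fix proposed above could look roughly like the following. This is only a minimal, self-contained sketch of the idea, not HBase code: the class `PendingTableRegistry`, its `Status` enum, and the methods `addToMeta`, `markCreated`, and `listTables` are hypothetical names introduced for illustration, and the mapping to the procedure states is an assumption about where such a flag would be set and cleared.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model of the master's table bookkeeping; not HBase code. */
class PendingTableRegistry {
  /** Hypothetical per-table flag: is the create procedure still running? */
  enum Status { CREATING, CREATED }

  private final Map<String, Status> tables = new ConcurrentHashMap<>();

  /** Assumed to run in CREATE_TABLE_ADD_TO_META: the table becomes known but stays flagged. */
  void addToMeta(String table) {
    tables.put(table, Status.CREATING);
  }

  /** Assumed to run in CREATE_TABLE_UPDATE_DESC_CACHE: clears the flag after assignment. */
  void markCreated(String table) {
    tables.replace(table, Status.CREATING, Status.CREATED);
  }

  /** What a `list` request would return: only tables whose creation has finished. */
  List<String> listTables() {
    List<String> visible = new ArrayList<>();
    for (Map.Entry<String, Status> entry : tables.entrySet()) {
      if (entry.getValue() == Status.CREATED) {
        visible.add(entry.getKey());
      }
    }
    Collections.sort(visible);
    return visible;
  }

  public static void main(String[] args) {
    PendingTableRegistry registry = new PendingTableRegistry();
    registry.addToMeta("t1");
    // Region assignment is still in flight here (CREATE_TABLE_ASSIGN_REGIONS),
    // so the flagged table is hidden from the listing.
    System.out.println(registry.listTables());
    registry.markCreated("t1");
    // The flag has been cleared, so the table is now listed.
    System.out.println(registry.listTables());
  }
}
```

With such a flag, a delay in the region assignment subprocedure only delays the moment the table becomes visible; it no longer exposes a half-created table to other clients, and the order of the existing states stays unchanged.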