[ https://issues.apache.org/jira/browse/KUDU-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adar Dembo reassigned KUDU-1358:
--------------------------------

    Assignee:     (was: Adar Dembo)

I never got around to doing this and am not actively working on it, so unassigning from myself.

> Following a master leader election, create table may fail
> ---------------------------------------------------------
>
>                 Key: KUDU-1358
>                 URL: https://issues.apache.org/jira/browse/KUDU-1358
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: master
>    Affects Versions: 0.7.0
>            Reporter: Adar Dembo
>            Priority: Major
>
> In the current multi-master design and implementation, tservers only
> heartbeat to the leader master. After a master leader election, there's a
> short window of time in which the new leader master may not be aware of the
> existence of some (or even all) of the tservers. Attempts to create a table
> during this window may fail, as the tservers known to the new leader master
> may be too few to satisfy the new table's replication factor. Whether the
> window exists in the first place depends on whether the new leader master had
> been leader before, and whether any of the tservers had sent heartbeats to it
> during that time.
> Some possible solutions include:
> # Modifying the heartbeat protocol so that tservers heartbeat to _all_
> masters, leaders and followers alike. Doing this will ensure that the "soft
> state" belonging to any master is always up-to-date at the cost of network
> bandwidth lost to heartbeating. Additionally, changes may need to be made to
> ensure that a follower master can't cause a tserver to take any actions.
> # Never actually failing a create table request due to too few tservers,
> instead allowing it to linger until such a time when more tservers exist. For
> this to actually be practical we'd need to allow clients to "cancel" a
> previously issued create table request.
> Both approaches probably include additional ramifications; this problem needs
> to be thought through carefully.
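
For illustration only, here is a minimal toy model of the failure window and of the second proposed solution from the quoted description (a create table request that lingers until enough tservers are known, and that a client can cancel). This is not Kudu code: the class `Master`, its methods, and the exception `TooFewTServers` are all hypothetical names invented for this sketch, and the real master tracks far more state.

```python
from dataclasses import dataclass, field


class TooFewTServers(Exception):
    """Hypothetical error: fewer known tservers than the replication factor."""


@dataclass
class Master:
    # Toy stand-in for a master's "soft state": tservers that have
    # heartbeated to this master since it started (or became leader).
    known_tservers: set = field(default_factory=set)
    # Lingering create table requests: table name -> replication factor.
    pending: dict = field(default_factory=dict)
    created: list = field(default_factory=list)

    def heartbeat(self, tserver_id):
        """Record a tserver heartbeat, then retry lingering requests."""
        self.known_tservers.add(tserver_id)
        self._retry_pending()

    def create_table_strict(self, name, replication_factor):
        """Behavior described in the issue: fail if too few tservers are known."""
        if len(self.known_tservers) < replication_factor:
            raise TooFewTServers(
                f"{len(self.known_tservers)} known, {replication_factor} needed"
            )
        self.created.append(name)

    def create_table_lingering(self, name, replication_factor):
        """Proposed alternative: queue the request instead of failing it."""
        self.pending[name] = replication_factor
        self._retry_pending()

    def cancel_create_table(self, name):
        """Let a client cancel a lingering request; True if one was removed."""
        return self.pending.pop(name, None) is not None

    def _retry_pending(self):
        # Complete every lingering request whose replication factor
        # can now be satisfied by the known tservers.
        for name, rf in list(self.pending.items()):
            if len(self.known_tservers) >= rf:
                del self.pending[name]
                self.created.append(name)
```

In this model, a newly elected leader that has heard from only one tserver would reject a strict request with replication factor 3, while the lingering variant would complete once two more tservers heartbeat. The first proposed solution (heartbeating to all masters) would correspond to every `Master` instance receiving `heartbeat` calls, so the window would not open in the first place.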
--
This message was sent by Atlassian Jira
(v8.3.4#803005)