Hi Yongjun,

Good suggestion. This is essentially what HDFS-13873 is implementing to mitigate the concern.

Thanks,
--Konstantin

On Wed, Dec 12, 2018 at 10:35 PM Yongjun Zhang <yzh...@cloudera.com> wrote:

> Hi Konstantin,
>
> Thanks for addressing my other question about failover.
>
> Some thoughts to share about the suggestion Daryn made. It seems we could
> try this: let the ObserverNode throw a RetriableException back to the client
> saying it has not yet reached the transaction ID needed to serve the client,
> maybe even including the transaction ID gap information in the exception.
> Then, when the client receives the RetriableException, it can decide whether
> to continue sending the request to the observer node, or to send it to the
> active NN when the gap is too big.
>
> Though saving another RPC would help performance with the current
> implementation, I expect the above-mentioned exception to happen only
> infrequently, so the performance won't be too bad. Plus, the client has a
> chance to try the ANN when it knows that the observer is too far behind in
> the extreme case.
>
> I wonder how different the performance is between these two approaches in
> a cluster with a real workload.
>
> Comments?
>
> --Yongjun
>
> On Fri, Dec 7, 2018 at 4:10 PM Konstantin Shvachko <shv.had...@gmail.com>
> wrote:
>
>> Hi Daryn,
>>
>> Wanted to back up Chen's earlier response to your concerns about rotating
>> calls in the call queue.
>> Our design
>> 1. directly targets the livelock problem by rejecting calls on the
>> Observer that are not likely to be responded to in a timely manner:
>> HDFS-13873.
>> 2. The call queue rotation is only done on Observers, and never on the
>> active NN, so it stays free of attacks like the one you suggest.
>>
>> If this is a satisfactory mitigation for the problem, could you please
>> reconsider your -1, so that people could continue voting on this thread.
>>
>> Thanks,
>> --Konst
>>
>> On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp <da...@oath.com> wrote:
>>
>> > -1 pending additional info. After a cursory scan, I have serious
>> > concerns regarding the design.
>> > This seems like a feature that should have been purely implemented in
>> > hdfs w/o touching the common IPC layer.
>> >
>> > The biggest issue is the alignment context. Its purpose appears to be
>> > allowing handlers to reinsert calls back into the call queue. That's
>> > completely unacceptable. A buggy or malicious client can easily cause
>> > livelock in the IPC layer with handlers only looping on calls that never
>> > satisfy the condition. Why is this not implemented via
>> > RetriableExceptions?
>> >
>> > On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang
>> > <yzh...@cloudera.com.invalid> wrote:
>> >
>> >> Great work guys.
>> >>
>> >> Wonder if we can elaborate on what the impact is of not having #2 fixed,
>> >> and why #2 is not needed for the feature to be complete?
>> >> 2. Need to fix automatic failover with ZKFC. Currently it doesn't
>> >> know about ObserverNodes, trying to convert them to SBNs.
>> >>
>> >> Thanks.
>> >> --Yongjun
>> >>
>> >>
>> >> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko
>> >> <shv.had...@gmail.com> wrote:
>> >>
>> >> > Hi Hadoop developers,
>> >> >
>> >> > I would like to propose to merge to trunk the feature branch
>> >> > HDFS-12943 for Consistent Reads from Standby Node. The feature is
>> >> > intended to scale read RPC workloads. On large clusters reads comprise
>> >> > 95% of all RPCs to the NameNode. We should be able to accommodate
>> >> > higher overall RPC workloads (up to 4x by some estimates) by adding
>> >> > multiple ObserverNodes.
>> >> >
>> >> > The main functionality has been implemented; see sub-tasks of
>> >> > HDFS-12943.
>> >> > We followed up with the test plan. Testing was done on two independent
>> >> > clusters (see HDFS-14058 and HDFS-14059) with security enabled.
>> >> > We ran standard HDFS commands, MR jobs, admin commands including
>> >> > manual failover.
>> >> > We know of one cluster running this feature in production.
>> >> >
>> >> > There are a few outstanding issues:
>> >> > 1. Need to provide proper documentation - a user guide for the new
>> >> > feature
>> >> > 2. Need to fix automatic failover with ZKFC. Currently it doesn't
>> >> > know about ObserverNodes, trying to convert them to SBNs.
>> >> > 3. Scale testing and performance fine-tuning
>> >> > 4. As testing progresses, we continue fixing non-critical bugs like
>> >> > HDFS-14116.
>> >> >
>> >> > I attached a unified patch to the umbrella jira for the review and
>> >> > Jenkins build.
>> >> > Please vote on this thread. The vote will run for 7 days until Wed
>> >> > Dec 12.
>> >> >
>> >> > Thanks,
>> >> > --Konstantin
>> >> >
>> >>
>> >
>> >
>> > --
>> >
>> > Daryn
>> >
>> >
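
For readers following the thread, the RetriableException idea discussed above (the observer rejects a call it cannot serve yet instead of re-queueing it, and the client uses the transaction ID gap to decide where to retry) might look roughly like the sketch below. This is only an illustration under assumptions: ObserverBehindException, checkCaughtUp, chooseTarget and the maxGap threshold are hypothetical names made up here. They are not Hadoop classes and not the HDFS-13873 implementation.

```java
import java.io.IOException;

// Hypothetical sketch only; none of these types exist in Hadoop.
final class ObserverRetrySketch {

    /** Where the client should send its next attempt. */
    enum Target { OBSERVER, ACTIVE }

    /** Hypothetical retriable exception that reports how far the observer lags. */
    static final class ObserverBehindException extends IOException {
        private final long txidGap;

        ObserverBehindException(long clientTxid, long observerTxid) {
            super("Observer at txid " + observerTxid
                + " has not reached client txid " + clientTxid);
            this.txidGap = clientTxid - observerTxid;
        }

        long getTxidGap() {
            return txidGap;
        }
    }

    /** Observer side: reject the call rather than putting it back in the queue. */
    static void checkCaughtUp(long clientTxid, long observerTxid)
            throws ObserverBehindException {
        if (observerTxid < clientTxid) {
            throw new ObserverBehindException(clientTxid, observerTxid);
        }
    }

    /** Client side: retry the observer for a small gap, otherwise use the active NN. */
    static Target chooseTarget(ObserverBehindException e, long maxGap) {
        return e.getTxidGap() <= maxGap ? Target.OBSERVER : Target.ACTIVE;
    }
}
```

A client would catch the exception around its read call and pass it to chooseTarget with whatever gap threshold it is configured with. How large that threshold should be is exactly the open performance question raised in the thread.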