[ https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062212#comment-14062212 ]
Jason Lowe commented on YARN-2045: ---------------------------------- bq. I agree the concept is not quite the same but I tend to handle them both together as either of change (protobuf schema or layout schema) will bring difficulty/risky for NMStateStoreService to load old version of data. I think lumping them together and handling them in the implementation-specific code is fine, but if the implementation is handling all the details then why is it exposed in the interface? I think the most telling point is that in the proposed patch no common code actually uses the interfaces that were added. Each implementation does its own version setting, its own compatibility check, and I assume its own marshaling in the future if necessary. The interfaces aren't called by common code. Maybe I'm not seeing the future use case of these methods? I guess it could be useful for common code to do logging/reporting of the persisted/current versions or maybe to do a very simplistic incompatibility check (e.g.: assume different major numbers means incompatible), although arguably the implementation could simply log these numbers as it initializes and is already doing an implementation-specific compatibility check. However I'm particularly doubtful of the storeVersion method as it seems like the only way to safely convert versions in the general sense is with implementation-specific code. Using the conversion pseudo-code above as an example, if we crash halfway through the conversion of a series of objects then we have a mix of old and new data on the next restart but the stored version number is still old (or vice-versa if we store the new version first then convert). In an implementation-specific approach it may be possible to make the conversion atomic, e.g.: using a batch write for the entire conversion in leveldb. Therefore it makes more sense to me that an implementation should be responsible for deciding when and how to update the persisted schema version. I would expect implementations to do this sort of conversion during initialization and potentially the old persisted version would never be seen since it would already be converted. Do you have an example where using the storeVersion method in the interface via implementation-independent code would be more appropriate and therefore the storeVersion method in the interface is necessary? To summarize, I can see exposing the ability to get the persisted and current state store versions in the interface for logging, etc. However I don't see how implementation-independent code can properly update the version via the interface. We're lumping both interface and implementation-specific schema changes in the same version number, and it isn't possible to do an update of multiple store objects atomically via the current interface. bq. Are you suggesting NMDBSchemaVersion to play as PBImpl directly to include raw protobuf or something else? Sort of a subset of what the PBImpl is doing. I was thinking of having NMDBSchemaVersion wrap the protobuf but in a read-only way (i.e.: no set methods, no builder stuff). If one wants to change the version number, create a new protobuf. PBImpls tend to get into trouble because they can be written, and it's simpler to treat the protobufs as immutable as they were intended. Another approach would be to simply have some static helper util methods that take two protobufs to do the compatible checks, etc. Although I don't think we can really implement a useful isCompatibleTo check in implementaion-independent code since the version numbers encodes implementation-specific schema information. Anyway I didn't mean to drag out this change for too long. I'm wondering about these interfaces since I'm a strong believer that interfaces should be minimal and necessary, and I'm having difficulty seeing how these interfaces are really going to be used. However I'm probably in the minority on these methods. If people feel strongly that these interfaces are necessary and useful then go ahead and add them. It seems to me that these interfaces will either never be called or only called for trivial reasons (e.g.: logging). However I don't think having them is going to break anything or be an unreasonable burden on an implementation, rather just extra baggage that state store implementations have to expose. As for the PBImpl, it's mostly a nit. If you really would rather keep it in I guess that's fine. We should be able to remove it later if we realize we don't have a use for it. The main change I think has to be made is the leveldb schema check should handle the original method for storing the schema. Two ways to handle that are either explicitly check for the "1.0" string before trying to parse the value as a protobuf or store the new version type under a different key and check for the old schema key if the new schema key is missing. > Data persisted in NM should be versioned > ---------------------------------------- > > Key: YARN-2045 > URL: https://issues.apache.org/jira/browse/YARN-2045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Affects Versions: 2.4.1 > Reporter: Junping Du > Assignee: Junping Du > Attachments: YARN-2045-v2.patch, YARN-2045.patch > > > As a split task from YARN-667, we want to add version info to NM related > data, include: > - NodeManager local LevelDB state > - NodeManager directory structure -- This message was sent by Atlassian JIRA (v6.2#6252)