> > * For YARN-6050, there's a bit here: > https://developers.google.com/protocol-buffers/docs/proto that says > "optional is compatible with repeated", so I think we should be OK there. > - Optional is compatible with repeatable over the wire such that > protobuf won't blow up, but does that actually mean that it's compatible in > this case? If it's expecting an optional and gets a repeated, it's going to > drop everything except for the last value. I don't know enough about > YARN-6050 to say if this will be ok or not.
It's been a while since I looked into this, but I think it should be okay. If an older client (using optional) sends the message to a newer server (using repeated), then there will never be more than one value for the field. The server puts these into a list, so the list would simply have a single value in it. The server's logic should be able to handle a single valued list here because (a) IIRC we wanted to make sure compatibility wasn't a problem (Cloudera supported rolling upgrades between CDH 5.x so this was important) and (b) sending a single resource request, even in a newer client, is a still a valid thing to do. If a newer client (using repeated) sends the message to an older server (using optional), I'm not sure what will happen. My guess is that it will drop the extra values (though I wonder if it will keep the first or last value...). In any case, I believe most clients will only send the one value - in order for a client to send multiple values, you'd have to specify some additional MR configs (see MAPREDUCE-6871). IIRC, there's also a SPARK JIRA similar to MAPREDUCE-6871, but I can't find it right now. - Robert On Tue, Sep 24, 2019 at 9:49 PM Jonathan Hung <jyhung2...@gmail.com> wrote: > - I've created YARN-9855 and uploaded patches to fix YARN-6616 in > branch-2.8 and branch-2.7. > - For YARN-6050, not sure either. Robert/Wangda, can you comment on > YARN-6050 compatibility? > - For YARN-7813, not sure why moving from 2.8.4/5 -> 2.8.6 would be > incompatible with this strategy? It should be OK to remove/add optional > fields (removing the field with id 12, and adding the field with id 13). > The difficulties I see here are, we would have to leave id 12 blank in > 2.8.6 (so we cannot have YARN-6164 in branch-2.8), and users on 2.8.4/5 > would have to move to 2.8.6 before moving to 2.9+. But rolling upgrade > would still work IIUC. > > Jonathan Hung > > > On Tue, Sep 24, 2019 at 2:52 PM Eric Badger <ebad...@verizonmedia.com> > wrote: > >> * For YARN-6616, for branch-2.8 and below, it was only committed to >> 2.7.8/2.8.6 which have not been released (as I understand). Perhaps we can >> revert YARN-6616 from branch-2.7 and branch-2.8. >> - This seems reasonable. Since we haven't released anything, it should >> be no issue to change the 2.7/2.8 protobuf field to have the same value as >> 2.9+ >> >> * For YARN-6050, there's a bit here: >> https://developers.google.com/protocol-buffers/docs/proto that says >> "optional is compatible with repeated", so I think we should be OK there. >> - Optional is compatible with repeatable over the wire such that >> protobuf won't blow up, but does that actually mean that it's compatible in >> this case? If it's expecting an optional and gets a repeated, it's going to >> drop everything except for the last value. I don't know enough about >> YARN-6050 to say if this will be ok or not. >> >> * For YARN-7813, it's in 2.8.4 so it seems upgrading from 2.8.4 or >> 2.8.5 to a 2.9+ version will be an issue. One option could be to move the >> intraQueuePreemptionDisabled field from id 12 to id 13 in branch-2.8, then >> users would upgrade from 2.8.4/2.8.5 to 2.8.6 (someone would have to >> release this), then upgrade from 2.8.6 to 2.9+. >> - I'm ok with this, but it should be noted that the upgrade from >> 2.8.4/2.8.5 to 2.8.6 (or 2.9+) would not be compatible for a rolling >> upgrade. So this would cause some pain to anybody with clusters on those >> versions. >> >> Eric >> >> On Tue, Sep 24, 2019 at 2:42 PM Jonathan Hung <jyhung2...@gmail.com> >> wrote: >> >>> Sorry, let me edit my first point. We can just create addendums for >>> YARN-6616 in branch-2.7 and branch-2.8 to edit the submitTime field to the >>> correct id 28. We don’t need to revert YARN-6616 from these branches >>> completely. >>> >>> Jonathan >>> >>> ________________________________ >>> From: Jonathan Hung <jyhung2...@gmail.com> >>> Sent: Tuesday, September 24, 2019 11:38 AM >>> To: Eric Badger >>> Cc: Hadoop Common; yarn-dev; mapreduce-dev; Hdfs-dev >>> Subject: Re: Incompatible changes between branch-2.8 and branch-2.9 >>> >>> Hi Eric, thanks for the investigation. >>> >>> * For YARN-6616, for branch-2.8 and below, it was only committed to >>> 2.7.8/2.8.6 which have not been released (as I understand). Perhaps we can >>> revert YARN-6616 from branch-2.7 and branch-2.8. >>> * For YARN-6050, there's a bit here: >>> https://developers.google.com/protocol-buffers/docs/proto that says >>> "optional is compatible with repeated", so I think we should be OK there. >>> * For YARN-7813, it's in 2.8.4 so it seems upgrading from 2.8.4 or >>> 2.8.5 to a 2.9+ version will be an issue. One option could be to move the >>> intraQueuePreemptionDisabled field from id 12 to id 13 in branch-2.8, then >>> users would upgrade from 2.8.4/2.8.5 to 2.8.6 (someone would have to >>> release this), then upgrade from 2.8.6 to 2.9+. >>> >>> Jonathan Hung >>> >>> >>> On Tue, Sep 24, 2019 at 9:23 AM Eric Badger >>> <ebad...@verizonmedia.com.invalid> >>> wrote: >>> We (Verizon Media) are currently moving towards upgrading our clusters >>> from >>> our internal fork of branch-2.8 to an internal fork of branch-2. During >>> this process, we have found multiple incompatible changes in protobufs >>> between branch-2.8 and branch-2. These incompatibilities were all >>> introduced between branch-2.8 and branch-2.9. I did a git diff over all >>> .proto files across the branch-2.8 and branch-2.9 and found 3 instances >>> of >>> incompatibilities from 3 separate commits. All of the incompatibilities >>> are >>> in yarn_protos.proto >>> >>> >>> I would like to discuss how to fix these incompatible changes. Otherwise, >>> rolling upgrades will not be supported between branch-2.8 (or below) and >>> branch-2.9 (or beyond). We could revert the incompatible changes, but >>> then >>> the new releases would be incompatible with the releases that have these >>> incompatible changes. If we do nothing, then rolling upgrades won't work >>> between 2.8- and 2.9+. >>> >>> >>> Thanks, >>> >>> >>> Eric >>> >>> >>> ------------------------------------------------------------------- >>> >>> >>> git diff branch-2.8..branch-2.9 $(find . -name '*\.proto') >>> >>> >>> https://issues.apache.org/jira/browse/YARN-6616 >>> >>> - Trunk patch (applied through branch-2.9) differs from branch-2.8 >>> patch >>> >>> @@ -211,7 +245,20 @@ message ApplicationReportProto { >>> >>> optional PriorityProto priority = 23; >>> >>> optional string appNodeLabelExpression = 24; >>> >>> optional string amNodeLabelExpression = 25; >>> >>> - optional int64 submitTime = 26; >>> >>> + repeated AppTimeoutsMapProto appTimeouts = 26; >>> >>> + optional int64 launchTime = 27; >>> >>> + optional int64 submitTime = 28; >>> >>> >>> https://issues.apache.org/jira/browse/YARN-6050 >>> >>> - Trunk and branch-2 patches both change the protobuf type in the same >>> way. >>> >>> @@ -356,7 +416,22 @@ message ApplicationSubmissionContextProto { >>> >>> optional LogAggregationContextProto log_aggregation_context = 14; >>> >>> optional ReservationIdProto reservation_id = 15; >>> >>> optional string node_label_expression = 16; >>> >>> - optional ResourceRequestProto am_container_resource_request = 17; >>> >>> + repeated ResourceRequestProto am_container_resource_request = 17; >>> >>> + repeated ApplicationTimeoutMapProto application_timeouts = 18; >>> >>> >>> https://issues.apache.org/jira/browse/YARN-7813 >>> >>> - Trunk (applied through branch-3.1) and branch-3.0 (applied through >>> branch-2.9) patches differ from branch-2.8 patch >>> >>> @@ -425,7 +501,21 @@ message QueueInfoProto { >>> >>> optional string defaultNodeLabelExpression = 9; >>> >>> optional QueueStatisticsProto queueStatistics = 10; >>> >>> optional bool preemptionDisabled = 11; >>> >>> - optional bool intraQueuePreemptionDisabled = 12; >>> >>> + repeated QueueConfigurationsMapProto queueConfigurationsMap = 12; >>> >>> + optional bool intraQueuePreemptionDisabled = 13; >>> >>