[ https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049039#comment-15049039 ]
Junping Du commented on YARN-4356: ---------------------------------- Thanks [~sjlee0] for delivering the patch. 004 patch looks good to me in overall, but some minor comments: In JobHistoryEventHandler.java, {code} LOG.info("Emitting job history data to the timeline server is enabled"); {code} server => service, given we don't provide a centralized server in v2. {code} + LOG.info("Timeline service is enabled; version: " + + (timelineServiceV2Enabled? "v2" : "v1")); {code} Shall we get configuration version from configuration defined in YARN-3623? Especially, we have other versions, like: v1.5. TestMRTimelineEventHandling.java, {code} // enable new timeline serivce {code} typo: serivce => service YarnConfiguration.java, {code} @return whether the timelien service is enabled. {code} typo: timelien => timeline {code} + public static boolean timelineServiceV2Enabled(Configuration conf) { + return timelineServiceEnabled(conf) && getTimelineServiceVersion(conf) == 2; + } {code} Would it be possible to have ATS v2.5 in future? If so, may be we should cast float number get from version config before comparing with 2? In NodeHeartbeatRequestPBImpl.java, {code} + this.registeredCollectors = new HashMap<>(); Update new HashMap<>() => new HashMap<ApplicationId, String> () {code} and the same problem in NodeHeartbeatResponsePBImpl.java {code} this.appCollectorsMap = new HashMap<>(); {code} In NodeManager.java, {code} this.registeredCollectors = new ConcurrentHashMap<>(); {code} We should also add back types as requirement/convension for generics. {code} + if (this.registeredCollectors != null) { + this.registeredCollectors.putAll(newRegisteredCollectors); + } {code} This check of null is unnecessary as the only caller - NMCollectorService is only running under v2 is enabled. If for some reason, we get NPE here which is still better than we ignore it silently. In NodeStatusUpdaterImpl.java, {code} - /** - * Caller should take care of sending non null nodelabels for both - * arguments - * - * @param nodeLabelsNew - * @param nodeLabelsOld - * @return if the New node labels are diff from the older one. - */ - private boolean areNodeLabelsUpdated(Set<NodeLabel> nodeLabelsNew, - Set<NodeLabel> nodeLabelsOld) { - if (nodeLabelsNew.size() != nodeLabelsOld.size() - || !nodeLabelsOld.containsAll(nodeLabelsNew)) { - return true; - } - return false; - } {code} Please remove this unrelated change out for more focus and better tracking. {code} + !context.getRegisteredCollectors().containsKey(appId)) { {code} I think this logic could be problemtic if collector address get updated due to NM restart or collector service failure. However, this shouldn't be addressed in this JIRA. But kindly add a TODO would a good reminder. In ContainerManagerImpl.java, {code} + } else { + flowContext = null; } {code} this else branch is not necessary as we can define flowContext to be null at the beginning. In ResourceManager.java, {code} + if (version < 2 && ... + } else if (version == 2 && {code} Can we continuously to use YarnConfiguration.timelineServiceV2Enabled() in every place? Or we could miss these places if we need to change version logic in future. In ResourceTrackerService.java, {code} List<ApplicationId> keepAliveApps = remoteNodeStatus.getKeepAliveApplications(); - if (keepAliveApps != null) { + if (timelineV2Enabled && keepAliveApps != null) { {code} Just a reminder, keepAliveApps is a wrong list to identify running apps on specific node. YARN-3586 (with patch) is already filed to fix this. We can either merge that patch in or rebase that patch when this patch done. In TimelineServiceV2Publisher.java, {code} - * This class is responsible for posting application, appattempt & Container + * This class is responsible for posting application, appattempt & Container {code} Why we need this change? In PerNodeTimelineCollectorsAuxService.java, {code} + // enable timeline service v.2 + conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true); + conf.setFloat(YarnConfiguration.TIMELINE_SERVICE_VERSION, 2.0f); {code} We should disable PerNodeTimelineCollectorsAuxService if we don't enable timeline service v2. Isn't it? If so, I think this is not a necessary change and we should remove. In TimelineReaderServer.java, {code} + YarnConfiguration conf = new YarnConfiguration(); + // enable timeline service v.2 + conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true); + conf.setFloat(YarnConfiguration.TIMELINE_SERVICE_VERSION, 2.0f); {code} The same question with above. In addition, I think we should split the change that duplicated with YARN-3623 and cherry-pick it from trunk/branch-2 when that patch get commit in. > ensure the timeline service v.2 is disabled cleanly and has no impact when > it's turned off > ------------------------------------------------------------------------------------------ > > Key: YARN-4356 > URL: https://issues.apache.org/jira/browse/YARN-4356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver > Affects Versions: YARN-2928 > Reporter: Sangjin Lee > Assignee: Sangjin Lee > Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: YARN-4356-feature-YARN-2928.002.patch, > YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, > YARN-4356-feature-YARN-2928.poc.001.patch > > > For us to be able to merge the first milestone drop to trunk, we want to > ensure that once disabled the timeline service v.2 has no impact from the > server side to the client side. If the timeline service is not enabled, no > action should be done. If v.1 is enabled but not v.2, v.1 should behave the > same as it does before the merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)