[jira] [Resolved] (YARN-8849) DynoYARN: A simulation and testing infrastructure for YARN clusters
[ https://issues.apache.org/jira/browse/YARN-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hung resolved YARN-8849.
    Resolution: Fixed

FYI, we have open-sourced DynoYARN on GitHub: https://github.com/linkedin/dynoyarn

> DynoYARN: A simulation and testing infrastructure for YARN clusters
> ---
>
>          Key: YARN-8849
>          URL: https://issues.apache.org/jira/browse/YARN-8849
>      Project: Hadoop YARN
>   Issue Type: New Feature
>     Reporter: Arun Suresh
>     Assignee: Jonathan Hung
>     Priority: Major
>
> Traditionally, YARN workload simulation is performed using SLS (Scheduler
> Load Simulator), which is packaged with YARN. It essentially starts a
> full-fledged *ResourceManager*, but runs simulators for the *NodeManager*
> and the *ApplicationMaster* containers. These simulators are lightweight and
> run in a threadpool. The NM simulators do not open any external ports and
> send (in-process) heartbeats to the ResourceManager.
>
> There are a couple of drawbacks with using the SLS:
> * It might be difficult to simulate really large clusters without having
> access to a very beefy box - since the NMs are launched as tasks in a
> threadpool, and each NM has to send periodic heartbeats to the RM.
> * Certain features (like YARN-1011) require changes to the NodeManager -
> aspects such as queuing and selectively killing containers have to be
> incorporated into the existing NM simulator, which might make the simulator
> a bit heavyweight - there is a need for locking and synchronization.
> * Since the NM and AM are simulations, only the Scheduler is faithfully
> tested - it does not really perform an end-to-end test of a cluster.
>
> Therefore, drawing inspiration from
> [Dynamometer|https://github.com/linkedin/dynamometer], we propose a
> YARN-deployable YARN cluster framework - *DynoYARN* - for testing, with the
> following features:
> * The NM already has hooks to plug in a custom *ContainerExecutor* and
> *NodeResourceMonitor*. If we can plug in a custom *ContainersMonitorImpl*'s
> monitoring thread (and other modules like the LocalizationService), we can
> probably inject an Executor that does not actually launch containers, and a
> Node and Container resource monitor that reports synthetic, pre-specified
> utilization metrics back to the RM.
> * Since we are launching fake containers, we cannot run normal AM
> containers. We can therefore use *Unmanaged AM*s to launch synthetic jobs.
>
> Essentially, a test workflow would look like this:
> * Launch a DynoYARN cluster.
> * Use the Unmanaged AM feature to directly negotiate with the DynoYARN
> Resource Manager for container tokens.
> * Use the container tokens from the RM to directly ask the DynoYARN Node
> Managers to start fake containers.
> * The DynoYARN Node Managers will start the fake containers and report to
> the DynoYARN Resource Manager synthetically generated resource utilization
> for the containers (which will be injected via the *ContainerLaunchContext*
> and parsed by the plugged-in Container Executor).
> * The Scheduler will use the utilization report to schedule containers - we
> will be able to test allocation of *Opportunistic* containers based on
> resource utilization.
> * Since the DynoYARN Node Managers run the actual code paths, all preemption
> and queuing logic will be faithfully executed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
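For illustration only, here is a rough sketch of the "fake container" idea from the description above: an executor that starts no process and simply records the utilization figures passed in with the launch request, so that a monitor can report their sum. This is not DynoYARN's code. The class names and the two environment keys are hypothetical, and a plain map stands in for the ContainerLaunchContext.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical value type: utilization a fake container pretends to consume. */
final class SyntheticUtilization {
    final long memoryMb;
    final float vcores;

    SyntheticUtilization(long memoryMb, float vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

/** Toy executor: records synthetic utilization instead of launching anything. */
final class FakeContainerExecutor {
    // Hypothetical keys; a real setup would define its own way to carry these values.
    static final String MEM_KEY = "DYNO_SYNTHETIC_MEMORY_MB";
    static final String CPU_KEY = "DYNO_SYNTHETIC_VCORES";

    private final Map<String, SyntheticUtilization> running = new HashMap<>();

    /** "Launches" a container by parsing the pre-specified utilization from its environment. */
    void launch(String containerId, Map<String, String> launchEnv) {
        long mem = Long.parseLong(launchEnv.getOrDefault(MEM_KEY, "0"));
        float cpu = Float.parseFloat(launchEnv.getOrDefault(CPU_KEY, "0"));
        running.put(containerId, new SyntheticUtilization(mem, cpu));
    }

    /** "Stops" a container by forgetting its utilization. */
    void stop(String containerId) {
        running.remove(containerId);
    }

    /** Sum over all fake containers: what a node monitor could report in a heartbeat. */
    SyntheticUtilization nodeUtilization() {
        long mem = 0;
        float cpu = 0;
        for (SyntheticUtilization u : running.values()) {
            mem += u.memoryMb;
            cpu += u.vcores;
        }
        return new SyntheticUtilization(mem, cpu);
    }
}
```

Because the numbers come from the launch request, a synthetic job can drive the node's reported utilization up or down deterministically, which is what testing utilization-based scheduling would need.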
[jira] [Created] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently
Jonathan Hung created YARN-10297:

     Summary: TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently
         Key: YARN-10297
         URL: https://issues.apache.org/jira/browse/YARN-10297
     Project: Hadoop YARN
  Issue Type: Improvement
    Reporter: Jonathan Hung

After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails intermittently.

{noformat}
[INFO] Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
[ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
[ERROR] testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling) Time elapsed: 0.194 s <<< ERROR!
org.apache.hadoop.metrics2.MetricsException: Metrics source PartitionQueueMetrics,partition= already exists!
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
{noformat}
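The trace above ends in a registration call rejecting a source name that is already present. The report does not state the root cause, so the following is only a toy model of that failure mode, under the assumption (mine, not the reporter's) that state left behind by an earlier registration is what collides. ToyMetricsRegistry is a made-up class, not Hadoop's metrics2 implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy registry that, like the trace above, refuses a second source with the same name. */
final class ToyMetricsRegistry {
    private final Map<String, Object> sources = new HashMap<>();

    /** Registers a source; throws if the name was registered before and never cleared. */
    void register(String name, Object source) {
        if (sources.containsKey(name)) {
            throw new IllegalStateException("Metrics source " + name + " already exists!");
        }
        sources.put(name, source);
    }

    /** Test-teardown style hook: forget every registered source. */
    void clear() {
        sources.clear();
    }

    int size() {
        return sources.size();
    }
}
```

In this model, whether the second registration fails depends only on whether something cleared the registry in between, which is one way an order-dependent, intermittent test failure can arise.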
[jira] [Created] (YARN-10263) Application summary is logged multiple times due to RM recovery
Jonathan Hung created YARN-10263:

     Summary: Application summary is logged multiple times due to RM recovery
         Key: YARN-10263
         URL: https://issues.apache.org/jira/browse/YARN-10263
     Project: Hadoop YARN
  Issue Type: Improvement
    Reporter: Jonathan Hung

When an app finishes, it is logged to the RM app summary. If the RM is then restarted, the same app is logged to the RM app summary again. We would need some way to know, across restarts, whether an app has already been logged.
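One way to picture "knowing across restarts" is a flag persisted alongside the app's stored state. The sketch below assumes that approach; the issue does not say how it should be solved. ToyStateStore and ToyAppManager are hypothetical stand-ins, and a list plays the role of the summary log file.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for persisted RM state; it outlives a "restart" in this sketch. */
final class ToyStateStore {
    private final Map<String, Boolean> summaryLogged = new HashMap<>();

    boolean isSummaryLogged(String appId) {
        return summaryLogged.getOrDefault(appId, false);
    }

    void markSummaryLogged(String appId) {
        summaryLogged.put(appId, true);
    }
}

/** Toy app manager: one instance per RM lifetime, sharing the store and the log. */
final class ToyAppManager {
    private final ToyStateStore store;
    private final List<String> summaryLog;

    ToyAppManager(ToyStateStore store, List<String> summaryLog) {
        this.store = store;
        this.summaryLog = summaryLog;
    }

    /** Called when an app finishes, and again when a finished app is recovered. */
    void logSummaryOnce(String appId) {
        if (store.isSummaryLogged(appId)) {
            return; // already written before the restart
        }
        summaryLog.add("appId=" + appId);
        store.markSummaryLogged(appId);
    }
}
```

A second ToyAppManager built on the same store models the restarted RM: its call for the same app id adds nothing to the log.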
[jira] [Created] (YARN-10260) Allow transitioning queue from DRAINING to RUNNING state
Jonathan Hung created YARN-10260:

     Summary: Allow transitioning queue from DRAINING to RUNNING state
         Key: YARN-10260
         URL: https://issues.apache.org/jira/browse/YARN-10260
     Project: Hadoop YARN
  Issue Type: Improvement
    Reporter: Jonathan Hung

In our cluster, a queue was erroneously stopped, which put it in the DRAINING state internally. A queue cannot be moved back to the RUNNING state until it has finished draining. For queues with large workloads, this can block other apps from submitting to the queue for a long time.
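The behaviour described above, and the requested change, can be sketched as a small state machine. This is a toy model with made-up names (ToyQueue, ToyQueueState, and the allowResumeWhileDraining flag are all hypothetical), not the CapacityScheduler's queue state code.

```java
/** The three states mentioned in the issue. */
enum ToyQueueState { RUNNING, DRAINING, STOPPED }

/** Toy queue: stopping with active apps enters DRAINING until the last app finishes. */
final class ToyQueue {
    private ToyQueueState state = ToyQueueState.RUNNING;
    private int activeApps;
    private final boolean allowResumeWhileDraining; // hypothetical switch for the proposal

    ToyQueue(int activeApps, boolean allowResumeWhileDraining) {
        this.activeApps = activeApps;
        this.allowResumeWhileDraining = allowResumeWhileDraining;
    }

    ToyQueueState state() {
        return state;
    }

    /** Stop request: drain first if apps are still running. */
    void stop() {
        state = activeApps > 0 ? ToyQueueState.DRAINING : ToyQueueState.STOPPED;
    }

    /** One app finished; a draining queue becomes STOPPED when none remain. */
    void appFinished() {
        if (activeApps > 0) {
            activeApps--;
        }
        if (state == ToyQueueState.DRAINING && activeApps == 0) {
            state = ToyQueueState.STOPPED;
        }
    }

    /** Start request; returns true if the queue is RUNNING afterwards. */
    boolean start() {
        if (state == ToyQueueState.DRAINING && !allowResumeWhileDraining) {
            return false; // current behaviour: must wait for the drain to finish
        }
        state = ToyQueueState.RUNNING;
        return true;
    }
}
```

With the flag off, start() fails for as long as apps remain in the queue, which is the long blocking window the issue describes. With the flag on, an erroneous stop can be undone immediately.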
Re: [DISCUSS] Making 2.10 the last minor 2.x release
Source code has been deleted from branch-2. Thanks Akira for taking this up!

Jonathan Hung

On Thu, Apr 16, 2020 at 11:40 AM Jonathan Hung wrote:

> Makes sense. I've cherry-picked the commits in branch-2 that were missed
> in branch-2.10.
>
> Jonathan Hung
>
> On Wed, Apr 15, 2020 at 2:25 AM Akira Ajisaka wrote:
>
>> Hi folks,
>>
>> I am still seeing some changes are being committed to branch-2.
>> I'd like to delete the source code from branch-2 to avoid mistakes.
>> https://issues.apache.org/jira/browse/HADOOP-16988
>>
>> -Akira
>>
>> On Wed, Jan 1, 2020 at 2:38 AM Ayush Saxena wrote:
>>
>>> Hi Jim,
>>> Thanx for catching, I have configured the build to run on branch-2.10.
>>>
>>> -Ayush
>>>
>>> On Tue, 31 Dec 2019 at 22:50, Jim Brennan <james.bren...@verizonmedia.com> wrote:
>>>
>>>> It looks like QBT tests are still being run on branch-2 (
>>>> https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86/),
>>>> and they are not very helpful at this point.
>>>> Can we change the QBT tests to run against branch-2.10 instead?
>>>>
>>>> Jim
>>>>
>>>> On Mon, Dec 23, 2019 at 7:44 PM Akira Ajisaka wrote:
>>>>
>>>>> Thank you, Ayush.
>>>>>
>>>>> I understand we should keep branch-2 as is, as well as master.
>>>>>
>>>>> -Akira
>>>>>
>>>>> On Mon, Dec 23, 2019 at 9:14 PM Ayush Saxena wrote:
>>>>>
>>>>> > Hi Akira
>>>>> > Seems there was an INFRA ticket for that. INFRA-19581,
>>>>> > But the INFRA people closed as wont do and yes, the branch is protected,
>>>>> > we can’t delete it directly.
>>>>> >
>>>>> > Ref: https://issues.apache.org/jira/browse/INFRA-19581
>>>>> >
>>>>> > -Ayush
>>>>> >
>>>>> > On 23-Dec-2019, at 5:03 PM, Akira Ajisaka wrote:
>>>>> >
>>>>> > Thank you for your work, Jonathan.
>>>>> >
>>>>> > I found branch-2 has been unintentionally pushed again. Would you remove it?
>>>>> > I think the branch should be protected if possible.
>>>>> >
>>>>> > -Akira
>>>>> >
>>>>> > On Tue, Dec 10, 2019 at 5:17 AM Jonathan Hung wrote:
>>>>> >
>>>>> > It's done. The new commit chain is: trunk -> branch-3.2 -> branch-3.1 ->
>>>>> > branch-2.10 -> branch-2.9 -> branch-2.8 (branch-2 no longer exists, please
>>>>> > don't try to commit to it)
>>>>> >
>>>>> > Completed procedure:
>>>>> >
>>>>> > - Verified everything in old branch-2.10 was in old branch-2
>>>>> > - Delete old branch-2.10
>>>>> > - Rename branch-2 to (new) branch-2.10
>>>>> > - Set version in new branch-2.10 to 2.10.1-SNAPSHOT
>>>>> > - Renamed fix versions from 2.11.0 to 2.10.1
>>>>> > - Removed 2.11.0 as a version in HADOOP/YARN/HDFS/MAPREDUCE
>>>>> >
>>>>> > Jonathan Hung
>>>>> >
>>>>> > On Wed, Dec 4, 2019 at 10:55 AM Jonathan Hung wrote:
>>>>> >
>>>>> > FYI, starting the rename process, beginning with INFRA-19521.
>>>>> >
>>>>> > Jonathan Hung
>>>>> >
>>>>> > On Wed, Nov 27, 2019 at 12:15 PM Konstantin Shvachko <shv.had...@gmail.com> wrote:
Re: [DISCUSS] Making 2.10 the last minor 2.x release
Makes sense. I've cherry-picked the commits in branch-2 that were missed in branch-2.10. Jonathan Hung On Wed, Apr 15, 2020 at 2:25 AM Akira Ajisaka wrote: > Hi folks, > > I am still seeing some changes are being committed to branch-2. > I'd like to delete the source code from branch-2 to avoid mistakes. > https://issues.apache.org/jira/browse/HADOOP-16988 > > -Akira > > On Wed, Jan 1, 2020 at 2:38 AM Ayush Saxena wrote: > >> Hi Jim, >> Thanx for catching, I have configured the build to run on branch-2.10. >> >> -Ayush >> >> On Tue, 31 Dec 2019 at 22:50, Jim Brennan >> wrote: >> >>> It looks like QBT tests are still being run on branch-2 ( >>> https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86/), >>> and they are not very helpful at this point. >>> Can we change the QBT tests to run against branch-2.10 instead? >>> >>> Jim >>> >>> On Mon, Dec 23, 2019 at 7:44 PM Akira Ajisaka >>> wrote: >>> >>>> Thank you, Ayush. >>>> >>>> I understand we should keep branch-2 as is, as well as master. >>>> >>>> -Akira >>>> >>>> >>>> On Mon, Dec 23, 2019 at 9:14 PM Ayush Saxena >>>> wrote: >>>> >>>> > Hi Akira >>>> > Seems there was an INFRA ticket for that. INFRA-19581, >>>> > But the INFRA people closed as wont do and yes, the branch is >>>> protected, >>>> > we can’t delete it directly. >>>> > >>>> > Ref: https://issues.apache.org/jira/browse/INFRA-19581 >>>> > >>>> > -Ayush >>>> > >>>> > On 23-Dec-2019, at 5:03 PM, Akira Ajisaka >>>> wrote: >>>> > >>>> > Thank you for your work, Jonathan. >>>> > >>>> > I found branch-2 has been unintentionally pushed again. Would you >>>> remove >>>> > it? >>>> > I think the branch should be protected if possible. >>>> > >>>> > -Akira >>>> > >>>> > On Tue, Dec 10, 2019 at 5:17 AM Jonathan Hung >>>> > wrote: >>>> > >>>> > It's done. 
The new commit chain is: trunk -> branch-3.2 -> branch-3.1 >>>> -> >>>> > >>>> > branch-2.10 -> branch-2.9 -> branch-2.8 (branch-2 no longer exists, >>>> please >>>> > >>>> > don't try to commit to it) >>>> > >>>> > >>>> > Completed procedure: >>>> > >>>> > >>>> > - Verified everything in old branch-2.10 was in old branch-2 >>>> > >>>> > - Delete old branch-2.10 >>>> > >>>> > - Rename branch-2 to (new) branch-2.10 >>>> > >>>> > - Set version in new branch-2.10 to 2.10.1-SNAPSHOT >>>> > >>>> > - Renamed fix versions from 2.11.0 to 2.10.1 >>>> > >>>> > - Removed 2.11.0 as a version in HADOOP/YARN/HDFS/MAPREDUCE >>>> > >>>> > >>>> > >>>> > Jonathan Hung >>>> > >>>> > >>>> > >>>> > On Wed, Dec 4, 2019 at 10:55 AM Jonathan Hung >>>> > >>>> > wrote: >>>> > >>>> > >>>> > FYI, starting the rename process, beginning with INFRA-19521. >>>> > >>>> > >>>> > Jonathan Hung >>>> > >>>> > >>>> > >>>> > On Wed, Nov 27, 2019 at 12:15 PM Konstantin Shvachko < >>>> > >>>> > shv.had...@gmail.com> >>>> > >>>> > wrote: >>>> > >>>> > >>>> > Hey guys, >>>> > >>>> > >>>> > I think we diverged a bit from the initial topic of this discussion, >>>> > >>>> > which is removing branch-2.10, and changing the version of branch-2 >>>> from >>>> > >>>> > 2.11.0-SNAPSHOT to 2.10.1-SNAPSHOT. >>>> > >>>> > Sounds like the subject line for this thread "Making 2.10 the last >>>> minor >>>> > >>>> > 2.x release" confused people. >>>> > >>>> > It is in fact a wider matter that can be di
[jira] [Created] (YARN-10212) Create separate configuration for max global AM attempts
Jonathan Hung created YARN-10212:

     Summary: Create separate configuration for max global AM attempts
         Key: YARN-10212
         URL: https://issues.apache.org/jira/browse/YARN-10212
     Project: Hadoop YARN
  Issue Type: Improvement
    Reporter: Jonathan Hung

Right now a user's default max AM attempts is set to the same value as the global max AM attempts:

{noformat}
int globalMaxAppAttempts = conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
    YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS);
{noformat}

If we want to increase the global max AM attempts, it will also increase the default. So we should create a separate global AM max attempts config to separate the two.
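A minimal sketch of what separating the two values could look like, using a plain map in place of the Hadoop Configuration object. The existing key yarn.resourcemanager.am.max-attempts and its default of 2 are YARN's. The global key name below is a hypothetical placeholder for the proposed setting, and the fallback and capping rules are my assumption of reasonable behaviour, not something the issue specifies.

```java
import java.util.Map;

/** Toy resolver: per-app default and global cap read from separate keys. */
final class AmAttemptLimits {
    static final String RM_AM_MAX_ATTEMPTS = "yarn.resourcemanager.am.max-attempts";
    static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2;
    // Hypothetical key name for the proposed separate global limit.
    static final String GLOBAL_AM_MAX_ATTEMPTS = "yarn.resourcemanager.am.global.max-attempts";

    final int defaultAttempts;
    final int globalMaxAttempts;

    AmAttemptLimits(Map<String, String> conf) {
        defaultAttempts = getInt(conf, RM_AM_MAX_ATTEMPTS, DEFAULT_RM_AM_MAX_ATTEMPTS);
        // Assumed fallback: with no global key set, behave as before (global == default).
        globalMaxAttempts = getInt(conf, GLOBAL_AM_MAX_ATTEMPTS, defaultAttempts);
    }

    /** Attempts granted to an app that asked for 'requested' (<= 0 means "not specified"). */
    int resolve(int requested) {
        int wanted = requested <= 0 ? defaultAttempts : requested;
        return Math.min(wanted, globalMaxAttempts);
    }

    private static int getInt(Map<String, String> conf, String key, int fallback) {
        String value = conf.get(key);
        return value == null ? fallback : Integer.parseInt(value.trim());
    }
}
```

With the default at 2 and the global limit at 5, an app that specifies nothing still gets 2 attempts, while an app that asks for 4 gets 4. That is the decoupling the issue asks for: raising the cap no longer raises the default.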
[jira] [Created] (YARN-10200) Add number of containers to RMAppManager summary
Jonathan Hung created YARN-10200:

     Summary: Add number of containers to RMAppManager summary
         Key: YARN-10200
         URL: https://issues.apache.org/jira/browse/YARN-10200
     Project: Hadoop YARN
  Issue Type: Improvement
    Reporter: Jonathan Hung

We track the number of containers per app. It would be useful to persist this so we can track, over the long term, the containers processed by the RM.
[jira] [Created] (YARN-10192) CapacityScheduler stuck in loop rejecting allocation proposals
Jonathan Hung created YARN-10192: Summary: CapacityScheduler stuck in loop rejecting allocation proposals Key: YARN-10192 URL: https://issues.apache.org/jira/browse/YARN-10192 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Hung On a 2.10.0 cluster, we observed containers were being scheduled very slowly. Based on logs, it seems to reject a bunch of allocation proposals, then accept a bunch of reserved containers, but very few containers are actually getting allocated: {noformat} 2020-03-10 06:31:48,965 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.30113637 absoluteUsedCapacity=0.30113637 used= cluster= 2020-03-10 06:31:48,965 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal 2020-03-10 06:31:48,965 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1582403122262_15460_01 container=null queue=misc_default clusterResource= type=OFF_SWITCH requestedPartition=cpu 2020-03-10 06:31:48,965 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=misc usedCapacity=0.0031771248 absoluteUsedCapacity=3.1771246E-4 used= cluster= 2020-03-10 06:31:48,965 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.30113637 absoluteUsedCapacity=0.30113637 used= cluster= 2020-03-10 06:31:48,965 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal 2020-03-10 06:31:48,968 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1582403122262_15460_01 container=null queue=misc_default clusterResource= type=OFF_SWITCH 
requestedPartition=cpu 2020-03-10 06:31:48,968 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=misc usedCapacity=0.0031771248 absoluteUsedCapacity=3.1771246E-4 used= cluster= 2020-03-10 06:31:48,968 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.30113637 absoluteUsedCapacity=0.30113637 used= cluster= 2020-03-10 06:31:48,968 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal 2020-03-10 06:31:48,977 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1582403122262_15460_01 container=null queue=misc_default clusterResource= type=OFF_SWITCH requestedPartition=cpu 2020-03-10 06:31:48,977 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=misc usedCapacity=0.0031771248 absoluteUsedCapacity=3.1771246E-4 used= cluster= 2020-03-10 06:31:48,977 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.30113637 absoluteUsedCapacity=0.30113637 used= cluster= 2020-03-10 06:31:48,977 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal 2020-03-10 06:31:48,981 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1582403122262_15460_01 container=null queue=misc_default clusterResource= type=OFF_SWITCH requestedPartition=cpu 2020-03-10 06:31:48,982 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=misc usedCapacity=0.0031771248 absoluteUsedCapacity=3.1771246E-4 used= cluster= 2020-03-10 06:31:48,982 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.30113637 absoluteUsedCapacity=0.30113637 used= cluster= 2020-03-10 06:31:48,982 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal 2020-03-10 06:31:48,985 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1582403122262_15460_01 container=null queue=misc_default clusterResource= type=OFF_SWITCH requestedPartition=cpu 2020-03-10 06:31:48,985 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=misc usedCapacity=0.0031771248 absoluteUsedCapacity=3.1771246E-4 used
[jira] [Created] (YARN-10134) Periodically sync backend scheduler configuration changes to capacity-scheduler.xml
Jonathan Hung created YARN-10134:

     Summary: Periodically sync backend scheduler configuration changes to capacity-scheduler.xml
         Key: YARN-10134
         URL: https://issues.apache.org/jira/browse/YARN-10134
     Project: Hadoop YARN
  Issue Type: Sub-task
    Reporter: Jonathan Hung

In case backend scheduler configuration changes are lost, it would be good to have a relatively up-to-date configuration in capacity-scheduler.xml.
[jira] [Created] (YARN-10116) Expose diagnostics in RMAppManager summary
Jonathan Hung created YARN-10116:

     Summary: Expose diagnostics in RMAppManager summary
         Key: YARN-10116
         URL: https://issues.apache.org/jira/browse/YARN-10116
     Project: Hadoop YARN
  Issue Type: Improvement
    Reporter: Jonathan Hung
    Assignee: Jonathan Hung

It's useful for tracking app diagnostics.
[jira] [Created] (YARN-10039) Allow disabling app submission from REST endpoints
Jonathan Hung created YARN-10039:

     Summary: Allow disabling app submission from REST endpoints
         Key: YARN-10039
         URL: https://issues.apache.org/jira/browse/YARN-10039
     Project: Hadoop YARN
  Issue Type: Improvement
    Reporter: Jonathan Hung

Introduce a configuration that allows disabling the /apps/new-application and /apps POST endpoints.
Re: [DISCUSS] Making 2.10 the last minor 2.x release
It's done. The new commit chain is: trunk -> branch-3.2 -> branch-3.1 -> branch-2.10 -> branch-2.9 -> branch-2.8 (branch-2 no longer exists, please don't try to commit to it) Completed procedure: - Verified everything in old branch-2.10 was in old branch-2 - Delete old branch-2.10 - Rename branch-2 to (new) branch-2.10 - Set version in new branch-2.10 to 2.10.1-SNAPSHOT - Renamed fix versions from 2.11.0 to 2.10.1 - Removed 2.11.0 as a version in HADOOP/YARN/HDFS/MAPREDUCE Jonathan Hung On Wed, Dec 4, 2019 at 10:55 AM Jonathan Hung wrote: > FYI, starting the rename process, beginning with INFRA-19521. > > Jonathan Hung > > > On Wed, Nov 27, 2019 at 12:15 PM Konstantin Shvachko > wrote: > >> Hey guys, >> >> I think we diverged a bit from the initial topic of this discussion, >> which is removing branch-2.10, and changing the version of branch-2 from >> 2.11.0-SNAPSHOT to 2.10.1-SNAPSHOT. >> Sounds like the subject line for this thread "Making 2.10 the last minor >> 2.x release" confused people. >> It is in fact a wider matter that can be discussed when somebody actually >> proposes to release 2.11, which I understand nobody does at the moment. >> >> So if anybody objects removing branch-2.10 please make an argument. >> Otherwise we should go ahead and just do it next week. >> I see people still struggling to keep branch-2 and branch-2.10 in sync. >> >> Thanks, >> --Konstantin >> >> On Thu, Nov 21, 2019 at 3:49 PM Jonathan Hung >> wrote: >> >>> Thanks for the detailed thoughts, everyone. >>> >>> Eric (Badger), my understanding is the same as yours re. minor vs patch >>> releases. As for putting features into minor/patch releases, if we keep the >>> convention of putting new features only into minor releases, my assumption >>> is still that it's unlikely people will want to get them into branch-2 >>> (based on the 2.10.0 release process). 
For the java 11 issue, we haven't >>> even really removed support for java 7 in branch-2 (much less java 8), so I >>> feel moving to java 11 would go along with a move to branch 3. And as you >>> mentioned, if people really want to use java 11 on branch-2, we can always >>> revive branch-2. But for now I think the convenience of not needing to port >>> to both branch-2 and branch-2.10 (and below) outweighs the cost of >>> potentially needing to revive branch-2. >>> >>> Jonathan Hung >>> >>> >>> On Wed, Nov 20, 2019 at 10:50 AM Eric Yang wrote: >>> >>>> +1 for 2.10.x as last release for 2.x version. >>>> >>>> Software would become more compatible when more companies stress test >>>> the same software and making improvements in trunk. Some may be extra >>>> caution on moving up the version because obligation internally to keep >>>> things running. Company obligation should not be the driving force to >>>> maintain Hadoop branches. There is no proper collaboration in the >>>> community when every name brand company maintains its own Hadoop 2.x >>>> version. I think it would be more healthy for the community to reduce the >>>> branch forking and spend energy on trunk to harden the software. This will >>>> give more confidence to move up the version than trying to fix n >>>> permutations breakage like Flash fixing the timeline. >>>> >>>> Apache license stated, there is no warranty of any kind for code >>>> contributions. Fewer community release process should improve software >>>> quality when eyes are on trunk, and help steering toward the same end >>>> goals. >>>> >>>> regards, >>>> Eric >>>> >>>> >>>> >>>> On Tue, Nov 19, 2019 at 3:03 PM Eric Badger >>>> wrote: >>>> >>>>> Hello all, >>>>> >>>>> Is it written anywhere what the difference is between a minor release >>>>> and a >>>>> point/dot/maintenance (I'll use "point" from here on out) release? 
I >>>>> have >>>>> looked around and I can't find anything other than some compatibility >>>>> documentation in 2.x that has since been removed in 3.x [1] [2]. I >>>>> think >>>>> this would help shape my opinion on whether or not to keep branch-2 >>>>> alive. >>>>> My current understanding is that we can't really break compatibility i
Re: [DISCUSS] Making 2.10 the last minor 2.x release
FYI, starting the rename process, beginning with INFRA-19521. Jonathan Hung On Wed, Nov 27, 2019 at 12:15 PM Konstantin Shvachko wrote: > Hey guys, > > I think we diverged a bit from the initial topic of this discussion, which > is removing branch-2.10, and changing the version of branch-2 from > 2.11.0-SNAPSHOT to 2.10.1-SNAPSHOT. > Sounds like the subject line for this thread "Making 2.10 the last minor > 2.x release" confused people. > It is in fact a wider matter that can be discussed when somebody actually > proposes to release 2.11, which I understand nobody does at the moment. > > So if anybody objects removing branch-2.10 please make an argument. > Otherwise we should go ahead and just do it next week. > I see people still struggling to keep branch-2 and branch-2.10 in sync. > > Thanks, > --Konstantin > > On Thu, Nov 21, 2019 at 3:49 PM Jonathan Hung > wrote: > >> Thanks for the detailed thoughts, everyone. >> >> Eric (Badger), my understanding is the same as yours re. minor vs patch >> releases. As for putting features into minor/patch releases, if we keep the >> convention of putting new features only into minor releases, my assumption >> is still that it's unlikely people will want to get them into branch-2 >> (based on the 2.10.0 release process). For the java 11 issue, we haven't >> even really removed support for java 7 in branch-2 (much less java 8), so I >> feel moving to java 11 would go along with a move to branch 3. And as you >> mentioned, if people really want to use java 11 on branch-2, we can always >> revive branch-2. But for now I think the convenience of not needing to port >> to both branch-2 and branch-2.10 (and below) outweighs the cost of >> potentially needing to revive branch-2. >> >> Jonathan Hung >> >> >> On Wed, Nov 20, 2019 at 10:50 AM Eric Yang wrote: >> >>> +1 for 2.10.x as last release for 2.x version. 
>>> >>> Software would become more compatible when more companies stress test >>> the same software and making improvements in trunk. Some may be extra >>> caution on moving up the version because obligation internally to keep >>> things running. Company obligation should not be the driving force to >>> maintain Hadoop branches. There is no proper collaboration in the >>> community when every name brand company maintains its own Hadoop 2.x >>> version. I think it would be more healthy for the community to reduce the >>> branch forking and spend energy on trunk to harden the software. This will >>> give more confidence to move up the version than trying to fix n >>> permutations breakage like Flash fixing the timeline. >>> >>> Apache license stated, there is no warranty of any kind for code >>> contributions. Fewer community release process should improve software >>> quality when eyes are on trunk, and help steering toward the same end goals. >>> >>> regards, >>> Eric >>> >>> >>> >>> On Tue, Nov 19, 2019 at 3:03 PM Eric Badger >>> wrote: >>> >>>> Hello all, >>>> >>>> Is it written anywhere what the difference is between a minor release >>>> and a >>>> point/dot/maintenance (I'll use "point" from here on out) release? I >>>> have >>>> looked around and I can't find anything other than some compatibility >>>> documentation in 2.x that has since been removed in 3.x [1] [2]. I think >>>> this would help shape my opinion on whether or not to keep branch-2 >>>> alive. >>>> My current understanding is that we can't really break compatibility in >>>> either a minor or point release. But the only mention of the difference >>>> between minor and point releases is how to deal with Stable, Evolving, >>>> and >>>> Unstable tags, and how to deal with changing default configuration >>>> values. >>>> So it seems like there really isn't a big official difference between >>>> the >>>> two. 
In my mind, the functional difference between the two is that the >>>> minor releases may have added features and rewrites, while the point >>>> releases only have bug fixes. This might be an incorrect understanding, >>>> but >>>> that's what I have gathered from watching the releases over the last few >>>> years. Whether or not this is a correct understanding, I think that this >>>> needs to be documented somewhere, even if it is just a convent
[jira] [Created] (YARN-10012) Guaranteed and max capacity queue metrics for custom resources
Jonathan Hung created YARN-10012:

     Summary: Guaranteed and max capacity queue metrics for custom resources
         Key: YARN-10012
         URL: https://issues.apache.org/jira/browse/YARN-10012
     Project: Hadoop YARN
  Issue Type: Improvement
    Reporter: Jonathan Hung

YARN-9085 adds support for guaranteed/max capacity MB/vcores. We should add the same for custom resources.
[jira] [Created] (YARN-9992) Max allocation per queue is zero for custom resource types on RM startup
Jonathan Hung created YARN-9992:

Summary: Max allocation per queue is zero for custom resource types on RM startup
Key: YARN-9992
URL: https://issues.apache.org/jira/browse/YARN-9992
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jonathan Hung

Found an issue where GPU requests cannot be scheduled on a newly booted RM. The request fails with the exception thrown in SchedulerUtils#throwInvalidResourceException:
{noformat}
throw new InvalidResourceRequestException(
    "Invalid resource request, requested resource type=[" + reqResourceName
    + "] < 0 or greater than maximum allowed allocation. Requested "
    + "resource=" + reqResource + ", maximum allowed allocation="
    + availableResource
    + ", please note that maximum allowed allocation is calculated "
    + "by scheduler based on maximum resource of registered "
    + "NodeManagers, which might be less than configured "
    + "maximum allocation="
    + ResourceUtils.getResourceTypesMaximumAllocation());
{noformat}
Upon refreshing the scheduler (e.g. via refreshQueues), GPU scheduling works again.

I think the root cause is that upon scheduler refresh, resource-types.xml is loaded in CapacitySchedulerConfiguration (as part of YARN-7738), so when we call ResourceUtils#fetchMaximumAllocationFromConfig in CapacitySchedulerConfiguration#getMaximumAllocationPerQueue, it is able to fetch the {{yarn.resource-types}} config. But resource-types.xml is not loaded into the conf in CapacityScheduler#initScheduler, so it doesn't find the custom resource when computing max allocations, and the custom resource max allocation is 0.
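The failure mode described in YARN-9992 can be illustrated with a small, self-contained sketch. This is not the CapacityScheduler code: the class, the map-based "configuration", and the helper names are hypothetical, and the per-resource maximum key is used only for illustration. It shows how a lookup that cannot see the {{yarn.resource-types}} entry ends up with a maximum of 0, so every request for that resource is rejected until the configuration is reloaded.

```java
import java.util.HashMap;
import java.util.Map;

public class MaxAllocationSketch {
    // Hypothetical stand-in for a scheduler configuration: plain key/value pairs.
    static Map<String, String> startupConf() {
        // At startup (in this sketch) the resource-types entries were never loaded.
        return new HashMap<String, String>();
    }

    static Map<String, String> refreshedConf() {
        // After a refresh the custom resource and its maximum are visible.
        Map<String, String> conf = new HashMap<String, String>();
        conf.put("yarn.resource-types", "yarn.io/gpu");
        conf.put("yarn.resource-types.yarn.io/gpu.maximum-allocation", "4");
        return conf;
    }

    // Returns the maximum allocation for a resource, or 0 if the resource type
    // is not listed in the configuration that was passed in.
    static long maxAllocation(Map<String, String> conf, String resource) {
        String types = conf.containsKey("yarn.resource-types")
            ? conf.get("yarn.resource-types") : "";
        boolean known = false;
        for (String type : types.split(",")) {
            if (type.trim().equals(resource)) {
                known = true;
            }
        }
        if (!known) {
            return 0L; // unknown resource type: the maximum silently becomes 0
        }
        String key = "yarn.resource-types." + resource + ".maximum-allocation";
        return conf.containsKey(key) ? Long.parseLong(conf.get(key)) : Long.MAX_VALUE;
    }

    // A request is valid only if it is non-negative and within the maximum.
    static boolean isValidRequest(Map<String, String> conf, String resource, long amount) {
        return amount >= 0 && amount <= maxAllocation(conf, resource);
    }

    public static void main(String[] args) {
        System.out.println(isValidRequest(startupConf(), "yarn.io/gpu", 1L));
        System.out.println(isValidRequest(refreshedConf(), "yarn.io/gpu", 1L));
    }
}
```

In this sketch the same one-GPU request is rejected against the startup configuration and accepted after the refresh, which mirrors the behavior reported above.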
Re: [DISCUSS] Making 2.10 the last minor 2.x release
Thanks for the detailed thoughts, everyone. Eric (Badger), my understanding is the same as yours re. minor vs patch releases. As for putting features into minor/patch releases, if we keep the convention of putting new features only into minor releases, my assumption is still that it's unlikely people will want to get them into branch-2 (based on the 2.10.0 release process). For the java 11 issue, we haven't even really removed support for java 7 in branch-2 (much less java 8), so I feel moving to java 11 would go along with a move to branch 3. And as you mentioned, if people really want to use java 11 on branch-2, we can always revive branch-2. But for now I think the convenience of not needing to port to both branch-2 and branch-2.10 (and below) outweighs the cost of potentially needing to revive branch-2. Jonathan Hung On Wed, Nov 20, 2019 at 10:50 AM Eric Yang wrote: > +1 for 2.10.x as last release for 2.x version. > > Software would become more compatible when more companies stress test the > same software and making improvements in trunk. Some may be extra caution > on moving up the version because obligation internally to keep things > running. Company obligation should not be the driving force to maintain > Hadoop branches. There is no proper collaboration in the community when > every name brand company maintains its own Hadoop 2.x version. I think it > would be more healthy for the community to reduce the branch forking and > spend energy on trunk to harden the software. This will give more > confidence to move up the version than trying to fix n permutations > breakage like Flash fixing the timeline. > > Apache license stated, there is no warranty of any kind for code > contributions. Fewer community release process should improve software > quality when eyes are on trunk, and help steering toward the same end goals. 
> > regards, > Eric > > > > On Tue, Nov 19, 2019 at 3:03 PM Eric Badger > wrote: > >> Hello all, >> >> Is it written anywhere what the difference is between a minor release and >> a >> point/dot/maintenance (I'll use "point" from here on out) release? I have >> looked around and I can't find anything other than some compatibility >> documentation in 2.x that has since been removed in 3.x [1] [2]. I think >> this would help shape my opinion on whether or not to keep branch-2 alive. >> My current understanding is that we can't really break compatibility in >> either a minor or point release. But the only mention of the difference >> between minor and point releases is how to deal with Stable, Evolving, and >> Unstable tags, and how to deal with changing default configuration values. >> So it seems like there really isn't a big official difference between the >> two. In my mind, the functional difference between the two is that the >> minor releases may have added features and rewrites, while the point >> releases only have bug fixes. This might be an incorrect understanding, >> but >> that's what I have gathered from watching the releases over the last few >> years. Whether or not this is a correct understanding, I think that this >> needs to be documented somewhere, even if it is just a convention. >> >> Given my assumed understanding of minor vs point releases, here are the >> pros/cons that I can think of for having a branch-2. Please add on or >> correct me for anything you feel is missing or inadequate. 
>> Pros: >> - Features/rewrites/higher-risk patches are less likely to be put into >> 2.10.x >> - It is less necessary to move to 3.x >> >> Cons: >> - Bug fixes are less likely to be put into 2.10.x >> - An extra branch to maintain >> - Committers have an extra branch (5 vs 4 total branches) to commit >> patches to if they should go all the way back to 2.10.x >> - It is less necessary to move to 3.x >> >> So on the one hand you get added stability in fewer features being >> committed to 2.10.x, but then on the other you get fewer bug fixes being >> committed. In a perfect world, we wouldn't have to make this tradeoff. But >> we don't live in a perfect world and committers will make mistakes either >> because of lack of knowledge or simply because they made a mistake. If we >> have a branch-2, committers will forget, not know to, or choose not to >> (for >> whatever reason) commit valid bug fixes back all the way to branch-2.10. >> If >> we don't have a branch-2, committers who want their borderline risky >> feature in the 2.x line will err on the side of putting it into >> branch-2.10 >> instead of proposing the creation of a branch-2. Cle
Re: [DISCUSS] Making 2.10 the last minor 2.x release
Thanks Eric for the comments - regarding your concerns, I feel the pros outweigh the cons. To me, the chances of patch releases on 2.10.x are much higher than a new 2.11 minor release. (There didn't seem to be many people outside of our company who expressed interest in getting new features to branch-2 prior to the 2.10.0 release.) Even now, a few weeks after the 2.10.0 release, there are 29 patches that have gone into branch-2 and 9 into branch-2.10, so the two branches have already diverged quite a bit.

In any case, we can always reverse this decision if we really need to, by recreating branch-2. But this proposal would reduce a lot of confusion IMO.

Jonathan Hung

On Fri, Nov 15, 2019 at 11:41 AM epa...@apache.org wrote:

> Thanks Jonathan for opening the discussion.
>
> I am not in favor of this proposal. 2.10 was very recently released, and
> moving to 2.10 will take some time for the community. It seems premature to
> make a decision at this point that there will never be a need for a 2.11
> release.
>
> -Eric
>
>
> On Thursday, November 14, 2019, 8:51:59 PM CST, Jonathan Hung <
> jyhung2...@gmail.com> wrote:
>
> Hi folks,
>
> Given the release of 2.10.0, and the fact that it's intended to be a bridge
> release to Hadoop 3.x [1], I'm proposing we make 2.10.x the last minor
> release line in branch-2. Currently, the main issue is that there's many
> fixes going into branch-2 (the theoretical 2.11.0) that's not going into
> branch-2.10 (which will become 2.10.1), so the fixes in branch-2 will
> likely never see the light of day unless they are backported to
> branch-2.10.
>
> To do this, I propose we:
>
> - Delete branch-2.10
> - Rename branch-2 to branch-2.10
> - Set version in the new branch-2.10 to 2.10.1-SNAPSHOT
>
> This way we get all the current branch-2 fixes into the 2.10.x release
> line. Then the commit chain will look like: trunk -> branch-3.2 ->
> branch-3.1 -> branch-2.10 -> branch-2.9 -> branch-2.8
>
> Thoughts?
>
> Jonathan Hung
>
> [1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29479.html
>
Re: [DISCUSS] Making 2.10 the last minor 2.x release
Some additional items we would need:

- Change all fix-versions in YARN/HDFS/MAPREDUCE/HADOOP from 2.11.0 to 2.10.1
- Remove 2.11.0 as a version in these projects

Jonathan Hung

On Thu, Nov 14, 2019 at 6:51 PM Jonathan Hung wrote:

> Hi folks,
>
> Given the release of 2.10.0, and the fact that it's intended to be a
> bridge release to Hadoop 3.x [1], I'm proposing we make 2.10.x the last
> minor release line in branch-2. Currently, the main issue is that there's
> many fixes going into branch-2 (the theoretical 2.11.0) that's not going
> into branch-2.10 (which will become 2.10.1), so the fixes in branch-2 will
> likely never see the light of day unless they are backported to branch-2.10.
>
> To do this, I propose we:
>
>    - Delete branch-2.10
>    - Rename branch-2 to branch-2.10
>    - Set version in the new branch-2.10 to 2.10.1-SNAPSHOT
>
> This way we get all the current branch-2 fixes into the 2.10.x release
> line. Then the commit chain will look like: trunk -> branch-3.2 ->
> branch-3.1 -> branch-2.10 -> branch-2.9 -> branch-2.8
>
> Thoughts?
>
> Jonathan Hung
>
> [1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29479.html
>
[DISCUSS] Making 2.10 the last minor 2.x release
Hi folks,

Given the release of 2.10.0, and the fact that it's intended to be a bridge release to Hadoop 3.x [1], I'm proposing we make 2.10.x the last minor release line in branch-2. Currently, the main issue is that there are many fixes going into branch-2 (the theoretical 2.11.0) that are not going into branch-2.10 (which will become 2.10.1), so the fixes in branch-2 will likely never see the light of day unless they are backported to branch-2.10.

To do this, I propose we:

- Delete branch-2.10
- Rename branch-2 to branch-2.10
- Set version in the new branch-2.10 to 2.10.1-SNAPSHOT

This way we get all the current branch-2 fixes into the 2.10.x release line. Then the commit chain will look like: trunk -> branch-3.2 -> branch-3.1 -> branch-2.10 -> branch-2.9 -> branch-2.8

Thoughts?

Jonathan Hung

[1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29479.html
[jira] [Created] (YARN-9964) Queue metrics turn negative when relabeling a node with running containers to default partition
Jonathan Hung created YARN-9964:

Summary: Queue metrics turn negative when relabeling a node with running containers to default partition
Key: YARN-9964
URL: https://issues.apache.org/jira/browse/YARN-9964
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jonathan Hung

YARN-6467 changed the queue metrics logic to update certain metrics only for the default partition. But if an app runs containers on a labeled node, the node is then moved to the default partition, and the container is released afterwards, the container's resource is never added to the queue's allocated resource (the node was labeled at allocation time) but is subtracted from it on release (the node is in the default partition by then), so the metric turns negative.
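The asymmetry described in YARN-9964 can be reproduced in a few lines. The following is a minimal sketch with hypothetical class and method names, not the actual QueueMetrics code; it assumes only what the report states, namely that both the increment and the decrement are guarded by a "default partition" check evaluated against the node's label at the time of the call.

```java
public class QueueMetricsSketch {
    // In this sketch the default partition is represented by the empty label.
    static final String DEFAULT_PARTITION = "";

    private long allocatedMB = 0;

    // Only allocations on default-partition nodes are counted.
    void allocate(String nodePartition, long mb) {
        if (DEFAULT_PARTITION.equals(nodePartition)) {
            allocatedMB += mb;
        }
    }

    // Only releases on default-partition nodes are counted.
    void release(String nodePartition, long mb) {
        if (DEFAULT_PARTITION.equals(nodePartition)) {
            allocatedMB -= mb;
        }
    }

    long getAllocatedMB() {
        return allocatedMB;
    }

    // Allocate on a labeled node, relabel the node to default, then release.
    static long simulateRelabel() {
        QueueMetricsSketch metrics = new QueueMetricsSketch();
        String nodePartition = "gpu";          // node starts in a labeled partition
        metrics.allocate(nodePartition, 1024); // skipped: not the default partition
        nodePartition = DEFAULT_PARTITION;     // node is relabeled to default
        metrics.release(nodePartition, 1024);  // counted: metric goes negative
        return metrics.getAllocatedMB();
    }

    public static void main(String[] args) {
        System.out.println(simulateRelabel()); // prints -1024
    }
}
```

One possible direction, offered here only as an assumption and not as the fix adopted in the Jira, is to remember the partition a container was allocated under and use that same value on release.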
[jira] [Created] (YARN-9954) Configurable max application tags and max tag length
Jonathan Hung created YARN-9954:

Summary: Configurable max application tags and max tag length
Key: YARN-9954
URL: https://issues.apache.org/jira/browse/YARN-9954
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Jonathan Hung

Currently the max number of tags and the max tag length are hardcoded; they should be configurable.
{noformat}
@Evolving
public static final int APPLICATION_MAX_TAGS = 10;

@Evolving
public static final int APPLICATION_MAX_TAG_LENGTH = 100;
{noformat}
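A rough sketch of what "configurable" could look like follows. It uses java.util.Properties instead of Hadoop's Configuration class to stay self-contained, and the property names are hypothetical placeholders invented for this example; only the default values 10 and 100 come from the constants quoted above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;

public class AppTagLimitsSketch {
    // Hypothetical property names, chosen for this sketch only.
    static final String MAX_TAGS_KEY = "example.application.max-tags";
    static final String MAX_TAG_LENGTH_KEY = "example.application.max-tag-length";

    // Defaults match the currently hardcoded constants.
    static final int DEFAULT_MAX_TAGS = 10;
    static final int DEFAULT_MAX_TAG_LENGTH = 100;

    // Reads an int property, falling back to the default when it is unset.
    static int getInt(Properties conf, String key, int defaultValue) {
        String raw = conf.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    // Builds a configuration that overrides both limits.
    static Properties confWith(int maxTags, int maxTagLength) {
        Properties conf = new Properties();
        conf.setProperty(MAX_TAGS_KEY, Integer.toString(maxTags));
        conf.setProperty(MAX_TAG_LENGTH_KEY, Integer.toString(maxTagLength));
        return conf;
    }

    // Checks a tag set against the configured (or default) limits.
    static boolean tagsAllowed(Properties conf, Collection<String> tags) {
        int maxTags = getInt(conf, MAX_TAGS_KEY, DEFAULT_MAX_TAGS);
        int maxTagLength = getInt(conf, MAX_TAG_LENGTH_KEY, DEFAULT_MAX_TAG_LENGTH);
        if (tags.size() > maxTags) {
            return false;
        }
        for (String tag : tags) {
            if (tag.length() > maxTagLength) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // With no overrides, two short tags fit within the defaults.
        System.out.println(tagsAllowed(new Properties(), Arrays.asList("a", "b")));
        // With the tag count limited to 1, the same tags are rejected.
        System.out.println(tagsAllowed(confWith(1, 100), Arrays.asList("a", "b")));
    }
}
```

Keeping the old constants as defaults means existing deployments would see no behavior change unless they set the new properties.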
[ANNOUNCE] Apache Hadoop 2.10.0 release
Hi all,

I am happy to announce that Apache Hadoop 2.10.0 has been released.

Apache Hadoop 2.10.0 is the first release in the Apache Hadoop 2.10 line. The release details, including links to downloads, list of major features, release notes, and changelog, are on the 2.10.0 announcement page [1]. You can also download the release from the Downloads page [2].

- Major features: https://hadoop.apache.org/docs/r2.10.0/index.html
- Release notes: http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/release/2.10.0/RELEASENOTES.2.10.0.html
- Changelog: http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/release/2.10.0/CHANGES.2.10.0.html

Thanks!

[1] https://hadoop.apache.org/release/2.10.0.html
[2] https://hadoop.apache.org/releases.html

Jonathan
[jira] [Created] (YARN-9945) Fix javadoc in FederationProxyProviderUtil in branch-2
Jonathan Hung created YARN-9945:

Summary: Fix javadoc in FederationProxyProviderUtil in branch-2
Key: YARN-9945
URL: https://issues.apache.org/jira/browse/YARN-9945
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jonathan Hung
Assignee: Jonathan Hung

{noformat}
[ERROR] /home/jhung/hadoop-mp/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/failover/FederationProxyProviderUtil.java:83: error: reference not found
[ERROR] * @param configuration Configuration to generate {@link ClientRMProxy}
{noformat}
This import was removed in branch-2 but it's referenced in this file's javadocs.
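For readers unfamiliar with this class of error: a `{@link}` tag naming a class by its simple name only resolves if that class is imported or lives in the same package, and the Java 8 javadoc tool reports an unresolved reference as an error. The sketch below is a generic illustration with made-up names, not the FederationProxyProviderUtil patch; it shows the alternative of using a fully qualified name in the tag, which resolves without any import.

```java
import java.util.Map;
import java.util.TreeMap;

public class JavadocLinkSketch {
    /**
     * Renders settings as sorted "key=value" lines.
     *
     * <p>The reference in the parameter description below is fully
     * qualified, so Javadoc can resolve it even though this file does not
     * import that class. Writing the simple name instead, without an
     * import, is the kind of reference that fails with
     * "reference not found".
     *
     * @param settings values that a caller might otherwise hold in a
     *                 {@link java.util.Properties} object
     * @return one "key=value" line per entry, in key order
     */
    public static String render(Map<String, String> settings) {
        StringBuilder out = new StringBuilder();
        // Copy into a TreeMap so the output order is deterministic.
        for (Map.Entry<String, String> entry
                : new TreeMap<String, String>(settings).entrySet()) {
            out.append(entry.getKey()).append('=')
               .append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> settings = new TreeMap<String, String>();
        settings.put("b", "2");
        settings.put("a", "1");
        System.out.print(render(settings));
    }
}
```

Re-adding the import, as discussed in the RC1 vote thread, is the other option; which one is preferable depends on whether the build treats unused imports as a style violation.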
Re: [VOTE] Release Apache Hadoop 2.10.0 (RC1)
+1 from me too. The vote passed, so I'll continue with the rest of the release. Thanks everyone! Jonathan Hung On Tue, Oct 29, 2019 at 1:40 PM Giovanni Matteo Fumarola < giovanni.fumar...@gmail.com> wrote: > +1 (non-binding). > > - Built from source on Ubuntu with OpenJDK 11.0.3 > - Verified signatures > - Verified documentation > - Setup up a single node cluster and ran basic yarn commands > - Ran UTs for Yarn Router, Yarn Common, Yarn API, YARN NM and YARN RM. > > Thanks for putting this together, Jonathan. > > On Tue, Oct 29, 2019 at 8:47 AM Dinesh Chitlangia > wrote: > >> +1 (non-binding) >> >> - Verified signatures >> - Verified documentation >> - Built from sources on CentOS 7 >> - Tested with basic hdfs commands on a single node setup. >> >> Thank for organizing the release, Jonathan. >> >> -Dinesh >> >> >> >> On Tue, Oct 29, 2019 at 9:45 AM epa...@apache.org >> wrote: >> >> > Compatibility testing has gone well for me. >> > >> > - In a 4-node cluster, I ran YARN rolling upgrade tests between 2.8.5 >> and >> > 2.10.0 >> > - In a 4-node cluster, I ran YARN rolling upgrade tests between 2.10.0 >> and >> > trunk >> > - With one 4-node cluster running 2.10.0 and one 4-node cluster running >> > trunk, I ran a word count job in each cluster whose inputs and outputs >> were >> > from and to the opposite cluster. >> > - I verified that HDFS replication works as expected in a trunk cluster >> > that has one 2.10.0 datanode. >> > >> > Thanks, >> > -Eric >> > >> > >> > > On Tuesday, October 22, 2019, 4:55:29 PM CDT, Jonathan Hung < >> > jyhung2...@gmail.com> wrote: >> > > Hi folks, >> > > >> > >This is the second release candidate for the first release of Apache >> > Hadoop >> > >2.10 line. It contains 362 fixes/improvements since 2.9 [1]. 
It >> includes >> > >features such as: >> > > >> > > - User-defined resource types >> > > - Native GPU support as a schedulable resource type >> > > - Consistent reads from standby node >> > > - Namenode port based selective encryption >> > > - Improvements related to rolling upgrade support from 2.x to 3.x >> > > - Cost based fair call queue >> > > >> > > The RC1 artifacts are at: >> > http://home.apache.org/~jhung/hadoop-2.10.0-RC1/ >> > > >> > > RC tag is release-2.10.0-RC1. >> > > >> > > The maven artifacts are hosted here: >> > > >> https://repository.apache.org/content/repositories/orgapachehadoop-1243/ >> > > >> > > My public key is available here: >> > > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS >> > > >> > > The vote will run for 5 weekdays, until Tuesday, October 29 at 3:00 pm >> > PDT. >> > > >> > > Thanks, >> > > Jonathan Hung >> > >> > >> > - >> > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org >> > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org >> > >> > >> >
Re: [VOTE] Release Apache Hadoop 2.10.0 (RC0)
Thanks Eric! I sent out an RC1 earlier last week, not sure if you saw that. The only diff between RC1 and RC0 is HDFS-14667. If RC1 looks good to you then it'd be great to get your testing results on that thread. Jonathan Hung On Mon, Oct 28, 2019 at 1:06 PM epa...@apache.org wrote: > Compatibility testing has gone well for me. > > - In a 4-node cluster, I ran YARN rolling upgrade tests between 2.8.5 and > 2.10.0 > - In a 4-node cluster, I ran YARN rolling upgrade tests between 2.10.0 and > trunk > - With one 4-node cluster running 2.10.0 and one 4-node cluster running > trunk, I ran a word count job in each cluster whose inputs and outputs were > from and to the opposite cluster. > - I verified that HDFS replication works as expected in a trunk cluster > that has one 2.10.0 datanode. > > Thanks, > -Eric > > On Tuesday, October 22, 2019, 8:39:38 PM CDT, Jonathan Hung < > jyhung2...@gmail.com> wrote: > > > > > > Hi Eric, we've run some basic HDFS commands with a 3.2.1 namenode and > 2.10.0 clients and datanodes. Everything worked as expected. > > Jonathan Hung > > > On Tue, Oct 22, 2019 at 3:04 PM Eric Badger > wrote: > > > Hi Jonathan, > > > > Thanks for putting this RC together. You stated that there are > > improvements related to rolling upgrades from 2.x to 3.x and I know I > have > > seen multiple JIRAs getting committed to that effect. Could you describe > > any tests that you have done to verify rolling upgrade compatibility > > for 3.x servers talking to 2.x clients and vice versa? > > > > Thanks, > > > > Eric > > > > On Tue, Oct 22, 2019 at 1:49 PM Jonathan Hung > > wrote: > > > >> Thanks Konstantin and Zhankun. Unfortunately a feature slipped our radar > >> (HDFS-14667). Since this is the first of a minor release, we would like > to > >> get it into 2.10.0. > >> > >> HDFS-14667 has been committed to branch-2.10.0, I will be rolling an RC1 > >> shortly. 
> >> > >> Jonathan Hung > >> > >> > >> On Tue, Oct 22, 2019 at 1:39 AM Zhankun Tang wrote: > >> > >> > Thanks for the effort, Jonathan! > >> > > >> > +1 (non-binding) on RC0. > >> > - Set up a single node cluster with the binary tarball > >> > - Run Spark Pi and pySpark job > >> > > >> > BR, > >> > Zhankun > >> > > >> > On Tue, 22 Oct 2019 at 14:31, Konstantin Shvachko < > shv.had...@gmail.com > >> > > >> > wrote: > >> > > >> >> +1 on RC0. > >> >> - Verified signatures > >> >> - Built from sources > >> >> - Ran unit tests for new features > >> >> - Checked artifacts on Nexus, made sure the sources are present. > >> >> > >> >> Thanks > >> >> --Konstantin > >> >> > >> >> > >> >> On Wed, Oct 16, 2019 at 6:01 PM Jonathan Hung > >> >> wrote: > >> >> > >> >> > Hi folks, > >> >> > > >> >> > This is the first release candidate for the first release of Apache > >> >> Hadoop > >> >> > 2.10 line. It contains 361 fixes/improvements since 2.9 [1]. It > >> includes > >> >> > features such as: > >> >> > > >> >> > - User-defined resource types > >> >> > - Native GPU support as a schedulable resource type > >> >> > - Consistent reads from standby node > >> >> > - Namenode port based selective encryption > >> >> > - Improvements related to rolling upgrade support from 2.x to 3.x > >> >> > > >> >> > The RC0 artifacts are at: > >> >> http://home.apache.org/~jhung/hadoop-2.10.0-RC0/ > >> >> > > >> >> > RC tag is release-2.10.0-RC0. > >> >> > > >> >> > The maven artifacts are hosted here: > >> >> > > >> >> > >> > https://repository.apache.org/content/repositories/orgapachehadoop-1241/ > >> >> > > >> >> > My public key is available here: > >> >> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS > >> >> > > >> >> > The vote will run for 5 weekdays, until Wednesday, October 23 at > >> 6:00 pm > >> >> > PDT. 
> >> >> > > >> >> > Thanks, > >> >> > Jonathan Hung > >> >> > > >> >> > [1] > >> >> > > >> >> > > >> >> > >> > https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0) > >> >> > > >> >> > >> > > >> > > > > - > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org > >
Re: [VOTE] Release Apache Hadoop 2.10.0 (RC0)
Hi Eric, I took a quick look, are you using mapreduce.application.framework.path to run your MR jobs? If not, this seems like expected behavior if AM and tasks get launched on different NMs with different locally installed hadoop versions? Jonathan Hung On Sat, Oct 26, 2019 at 8:55 AM epa...@apache.org wrote: > I ran a few compatibility tests between 2.10.0 and 3.3.0 (trunk) > > Unfortunately, I ran into the following problem: > > Running with 2.10 RM and 3.3.0 (trunk) NM fails attempts with the > following error: > > 2019-10-26 15:44:06,885 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RPC$VersionMismatch): > Protocol org.apache.hadoop.mapred.TaskUmbilicalProtocol version mismatch. > (client = 19, server = 21) > > The AM happened to launch on the 3.3.0 node. > > Is this a protobuf issue? I thought we addressed that? > > -Eric Payne > > > > On Tuesday, October 22, 2019, 8:39:38 PM CDT, Jonathan Hung < > jyhung2...@gmail.com> wrote: > > > > > > Hi Eric, we've run some basic HDFS commands with a 3.2.1 namenode and > 2.10.0 clients and datanodes. Everything worked as expected. > > Jonathan Hung > > > On Tue, Oct 22, 2019 at 3:04 PM Eric Badger > wrote: > > > Hi Jonathan, > > > > Thanks for putting this RC together. You stated that there are > > improvements related to rolling upgrades from 2.x to 3.x and I know I > have > > seen multiple JIRAs getting committed to that effect. Could you describe > > any tests that you have done to verify rolling upgrade compatibility > > for 3.x servers talking to 2.x clients and vice versa? > > > > Thanks, > > > > Eric > > > > On Tue, Oct 22, 2019 at 1:49 PM Jonathan Hung > > wrote: > > > >> Thanks Konstantin and Zhankun. Unfortunately a feature slipped our radar > >> (HDFS-14667). Since this is the first of a minor release, we would like > to > >> get it into 2.10.0. 
> >> > >> HDFS-14667 has been committed to branch-2.10.0, I will be rolling an RC1 > >> shortly. > >> > >> Jonathan Hung > >> > >> > >> On Tue, Oct 22, 2019 at 1:39 AM Zhankun Tang wrote: > >> > >> > Thanks for the effort, Jonathan! > >> > > >> > +1 (non-binding) on RC0. > >> > - Set up a single node cluster with the binary tarball > >> > - Run Spark Pi and pySpark job > >> > > >> > BR, > >> > Zhankun > >> > > >> > On Tue, 22 Oct 2019 at 14:31, Konstantin Shvachko < > shv.had...@gmail.com > >> > > >> > wrote: > >> > > >> >> +1 on RC0. > >> >> - Verified signatures > >> >> - Built from sources > >> >> - Ran unit tests for new features > >> >> - Checked artifacts on Nexus, made sure the sources are present. > >> >> > >> >> Thanks > >> >> --Konstantin > >> >> > >> >> > >> >> On Wed, Oct 16, 2019 at 6:01 PM Jonathan Hung > >> >> wrote: > >> >> > >> >> > Hi folks, > >> >> > > >> >> > This is the first release candidate for the first release of Apache > >> >> Hadoop > >> >> > 2.10 line. It contains 361 fixes/improvements since 2.9 [1]. It > >> includes > >> >> > features such as: > >> >> > > >> >> > - User-defined resource types > >> >> > - Native GPU support as a schedulable resource type > >> >> > - Consistent reads from standby node > >> >> > - Namenode port based selective encryption > >> >> > - Improvements related to rolling upgrade support from 2.x to 3.x > >> >> > > >> >> > The RC0 artifacts are at: > >> >> http://home.apache.org/~jhung/hadoop-2.10.0-RC0/ > >> >> > > >> >> > RC tag is release-2.10.0-RC0. > >> >> > > >> >> > The maven artifacts are hosted here: > >> >> > > >> >> > >> > https://repository.apache.org/content/repositories/orgapachehadoop-1241/ > >> >> > > >> >> > My public key is available here: > >> >> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS > >> >> > > >> >> > The vote will run for 5 weekdays, until Wednesday, October 23 at > >> 6:00 pm > >> >> > PDT. 
> >> >> > > >> >> > Thanks, > >> >> > Jonathan Hung > >> >> > > >> >> > [1] > >> >> > > >> >> > > >> >> > >> > https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0) > >> >> > > >> >> > >> > > >> > > >
Re: [VOTE] Release Apache Hadoop 2.10.0 (RC1)
Some more thoughts: for the javadoc issue, I think we can just support building on java 7. For the release notes issue, I can work with the authors of the major features to come up with release notes and update them before pushing it to site. The release notes in the published artifacts won't be up to date, but I think that's fine. I'll go ahead with this plan if no objections. Jonathan Hung On Fri, Oct 25, 2019 at 12:19 PM Jonathan Hung wrote: > Thanks for looking Erik. > > For the release notes, yeah I think it's because there's no release notes > for the corresponding JIRAs. I've added details for these features to the > index.md.vm file which should show up on the homepage for 2.10.0 (e.g. > https://hadoop.apache.org/docs/r2.9.0/index.html). We could add release > notes for these JIRAs, but that would require recreating the tar.gzs since > the release notes are bundled in there. > > For the javadoc issue, I was able to repro this issue, seems it's because > the org.apache.hadoop.yarn.client.ClientRMProxy import was removed in > FederationProxyProviderUtil in YARN-7900 in branch-2 (but not in other > branches). But it's referenced in javadocs in this file so it throws this > error. Re-adding this import and building with java 8 allows it to succeed. > > I checked javadoc html for FederationProxyProviderUtil in the produced > artifacts and it appears to be correct. > > I think we could easily overwrite the current RC1 artifacts with ones > containing proper release notes. Not sure what to do about the javadoc > issue though, that would require overwriting the release-2.10.0-RC1 tag > which I don't want to do. What do others think? > > Jonathan Hung > > > On Fri, Oct 25, 2019 at 9:21 AM Erik Krogen wrote: > >> Thanks for putting this together, Jonathan! >> >> I noticed that the RELEASENOTES.md makes no mention of any of the major >> features you mentioned in your email about the RC. Is this expected? 
I >> guess it is caused by the lack of a release note on the JIRAs for those >> features. >> >> I also noticed that building a distribution package (mvn -DskipTests >> package -Pdist) fails on Java 8 (1.8.0_172) with a bunch of Javadoc errors. >> It works fine on Java 7. Is this expected? >> >> Other verifications I performed: >> >>- Verified all signatures in RC1 >>- Verified all checksums in RC1 >>- Visually inspected contents of src tarball >>- Built from source on Mac OSX 10.14.6 and RHEL7 (Java 8) >> - mvn -DskipTests package >>- Visually inspected contents of binary tarball >> >> Thanks, >> Erik >> >> -- >> *From:* Konstantin Shvachko >> *Sent:* Wednesday, October 23, 2019 6:10 PM >> *To:* Jonathan Hung >> *Cc:* Hdfs-dev ; mapreduce-dev < >> mapreduce-...@hadoop.apache.org>; yarn-dev ; >> Hadoop Common >> *Subject:* Re: [VOTE] Release Apache Hadoop 2.10.0 (RC1) >> >> +1 on RC1 >> >> - Verified signatures >> - Verified maven artifacts on Nexus for sources >> - Checked rat reports >> - Checked documentation >> - Checked packaging contents >> - Built from sources on RHEL 7 box >> - Ran unit tests for new HDFS features with Java 8 >> >> Thanks, >> --Konstantin >> >> On Tue, Oct 22, 2019 at 2:55 PM Jonathan Hung >> wrote: >> >> > Hi folks, >> > >> > This is the second release candidate for the first release of Apache >> Hadoop >> > 2.10 line. It contains 362 fixes/improvements since 2.9 [1]. 
It includes >> > features such as: >> > >> > - User-defined resource types >> > - Native GPU support as a schedulable resource type >> > - Consistent reads from standby node >> > - Namenode port based selective encryption >> > - Improvements related to rolling upgrade support from 2.x to 3.x >> > - Cost based fair call queue >> > >> > The RC1 artifacts are at: >> https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fhome.apache.org%2F~jhung%2Fhadoop-2.10.0-RC1%2F&data=02%7C01%7Cekrogen%40linkedin.com%7C1fee1e5911d8415a418b08d7581f0c7e%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C1%7C637074762694349124&sdata=ZX7lF4N3fV38ggkplLU56ybhKBZrx%2FUKMkfxm2WJ7eU%3D&reserved=0 >> > >> > RC tag is release-2.10.0-RC1. >> > >> > The maven artifacts are hosted here: >> > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Frepository.apache.org%2Fcontent%2Frepositories%2Forgapachehadoop-1243%2F&data=02%7C01%7Cekrogen%40linkedin.com%7C
Re: [VOTE] Release Apache Hadoop 2.10.0 (RC1)
Thanks for looking Erik. For the release notes, yeah I think it's because there's no release notes for the corresponding JIRAs. I've added details for these features to the index.md.vm file which should show up on the homepage for 2.10.0 (e.g. https://hadoop.apache.org/docs/r2.9.0/index.html). We could add release notes for these JIRAs, but that would require recreating the tar.gzs since the release notes are bundled in there. For the javadoc issue, I was able to repro this issue, seems it's because the org.apache.hadoop.yarn.client.ClientRMProxy import was removed in FederationProxyProviderUtil in YARN-7900 in branch-2 (but not in other branches). But it's referenced in javadocs in this file so it throws this error. Re-adding this import and building with java 8 allows it to succeed. I checked javadoc html for FederationProxyProviderUtil in the produced artifacts and it appears to be correct. I think we could easily overwrite the current RC1 artifacts with ones containing proper release notes. Not sure what to do about the javadoc issue though, that would require overwriting the release-2.10.0-RC1 tag which I don't want to do. What do others think? Jonathan Hung On Fri, Oct 25, 2019 at 9:21 AM Erik Krogen wrote: > Thanks for putting this together, Jonathan! > > I noticed that the RELEASENOTES.md makes no mention of any of the major > features you mentioned in your email about the RC. Is this expected? I > guess it is caused by the lack of a release note on the JIRAs for those > features. > > I also noticed that building a distribution package (mvn -DskipTests > package -Pdist) fails on Java 8 (1.8.0_172) with a bunch of Javadoc errors. > It works fine on Java 7. Is this expected? 
> > Other verifications I performed: > >- Verified all signatures in RC1 >- Verified all checksums in RC1 >- Visually inspected contents of src tarball >- Built from source on Mac OSX 10.14.6 and RHEL7 (Java 8) >- mvn -DskipTests package >- Visually inspected contents of binary tarball > > Thanks, > Erik > > ------ > *From:* Konstantin Shvachko > *Sent:* Wednesday, October 23, 2019 6:10 PM > *To:* Jonathan Hung > *Cc:* Hdfs-dev ; mapreduce-dev < > mapreduce-...@hadoop.apache.org>; yarn-dev ; > Hadoop Common > *Subject:* Re: [VOTE] Release Apache Hadoop 2.10.0 (RC1) > > +1 on RC1 > > - Verified signatures > - Verified maven artifacts on Nexus for sources > - Checked rat reports > - Checked documentation > - Checked packaging contents > - Built from sources on RHEL 7 box > - Ran unit tests for new HDFS features with Java 8 > > Thanks, > --Konstantin > > On Tue, Oct 22, 2019 at 2:55 PM Jonathan Hung > wrote: > > > Hi folks, > > > > This is the second release candidate for the first release of Apache > Hadoop > > 2.10 line. It contains 362 fixes/improvements since 2.9 [1]. It includes > > features such as: > > > > - User-defined resource types > > - Native GPU support as a schedulable resource type > > - Consistent reads from standby node > > - Namenode port based selective encryption > > - Improvements related to rolling upgrade support from 2.x to 3.x > > - Cost based fair call queue > > > > The RC1 artifacts are at: > http://home.apache.org/~jhung/hadoop-2.10.0-RC1/ > > > > RC tag is release-2.10.0-RC1. 
> > > > The maven artifacts are hosted here: > > > https://repository.apache.org/content/repositories/orgapachehadoop-1243/ > > > > My public key is available here: > > > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS > > > > The vote will run for 5 weekdays, until Tuesday, October 29 at 3:00 pm > PDT. > > > > Thanks, > > Jonathan Hung > > > > [1] > > > > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.a
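The checksum half of the verification steps listed in this thread can be scripted. A minimal Python sketch, assuming the artifact has already been downloaded and that the published digest is a SHA-512 hex string (an assumption, the thread does not say which digest was used); the helper names and the stand-in file are hypothetical, and signature verification would still be done separately with GPG against the KEYS file:

```python
import hashlib
import os
import tempfile


def sha512_of(path, chunk_size=1 << 20):
    """Return the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, published_hex):
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    normalized = "".join(published_hex.split()).lower()
    return sha512_of(path) == normalized


if __name__ == "__main__":
    # Stand-in for a downloaded tarball (hypothetical content, not a real artifact).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example artifact bytes")
    try:
        expected = hashlib.sha512(b"example artifact bytes").hexdigest()
        print(checksum_matches(tmp.name, expected.upper()))
    finally:
        os.remove(tmp.name)
```

Reading in chunks keeps memory use flat for tarballs of several hundred megabytes.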
Re: [VOTE] Release Apache Hadoop 2.10.0 (RC1)
Hi Eric, thanks for trying it out. We talked about this in today's YARN community sync up, summarizing here for everyone else: I don't think it's worth delaying the 2.10.0 release further, we can address this in a subsequent 2.10.x release. Wangda mentioned it might be related to changes in dominant resource calculator, but root cause remains to be seen. Jonathan Hung On Wed, Oct 23, 2019 at 9:02 AM epa...@apache.org wrote: > Hi Jonathan, > > Thanks very much for all of your work on this release. > > I have a concern about cross-queue (inter-queue) preemption in 2.10. > > In 2.8, on a 6 node pseudo-cluster, preempting from one queue to meet the > needs of another queue seems to work as expected. However, 2.10 in the same > pseudo-cluster (with the same config properties), only one container was > preempted for the AM and then nothing else. > > I don't know how the community feels about holding up the 2.10.0 release > for this issue, but we need to get to the bottom of this before we can go > to 2.10.x. I am still investigating. > > Thanks, > -Eric > > > > > On Tuesday, October 22, 2019, 4:55:29 PM CDT, Jonathan Hung < > jyhung2...@gmail.com> wrote: > > Hi folks, > > > > This is the second release candidate for the first release of Apache > Hadoop > > 2.10 line. It contains 362 fixes/improvements since 2.9 [1]. It includes > > features such as: > > > > - User-defined resource types > > - Native GPU support as a schedulable resource type > > - Consistent reads from standby node > > - Namenode port based selective encryption > > - Improvements related to rolling upgrade support from 2.x to 3.x > > - Cost based fair call queue > > > > The RC1 artifacts are at: > http://home.apache.org/~jhung/hadoop-2.10.0-RC1/ > > > > RC tag is release-2.10.0-RC1. 
> > > > The maven artifacts are hosted here: > > https://repository.apache.org/content/repositories/orgapachehadoop-1243/ > > > > My public key is available here: > > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS > > > > The vote will run for 5 weekdays, until Tuesday, October 29 at 3:00 pm > PDT. > > > > Thanks, > > Jonathan Hung >
Re: [VOTE] Release Apache Hadoop 2.10.0 (RC0)
Hi Eric, we've run some basic HDFS commands with a 3.2.1 namenode and 2.10.0 clients and datanodes. Everything worked as expected. Jonathan Hung On Tue, Oct 22, 2019 at 3:04 PM Eric Badger wrote: > Hi Jonathan, > > Thanks for putting this RC together. You stated that there are > improvements related to rolling upgrades from 2.x to 3.x and I know I have > seen multiple JIRAs getting committed to that effect. Could you describe > any tests that you have done to verify rolling upgrade compatibility > for 3.x servers talking to 2.x clients and vice versa? > > Thanks, > > Eric > > On Tue, Oct 22, 2019 at 1:49 PM Jonathan Hung > wrote: > >> Thanks Konstantin and Zhankun. Unfortunately a feature slipped our radar >> (HDFS-14667). Since this is the first of a minor release, we would like to >> get it into 2.10.0. >> >> HDFS-14667 has been committed to branch-2.10.0, I will be rolling an RC1 >> shortly. >> >> Jonathan Hung >> >> >> On Tue, Oct 22, 2019 at 1:39 AM Zhankun Tang wrote: >> >> > Thanks for the effort, Jonathan! >> > >> > +1 (non-binding) on RC0. >> > - Set up a single node cluster with the binary tarball >> > - Run Spark Pi and pySpark job >> > >> > BR, >> > Zhankun >> > >> > On Tue, 22 Oct 2019 at 14:31, Konstantin Shvachko > > >> > wrote: >> > >> >> +1 on RC0. >> >> - Verified signatures >> >> - Built from sources >> >> - Ran unit tests for new features >> >> - Checked artifacts on Nexus, made sure the sources are present. >> >> >> >> Thanks >> >> --Konstantin >> >> >> >> >> >> On Wed, Oct 16, 2019 at 6:01 PM Jonathan Hung >> >> wrote: >> >> >> >> > Hi folks, >> >> > >> >> > This is the first release candidate for the first release of Apache >> >> Hadoop >> >> > 2.10 line. It contains 361 fixes/improvements since 2.9 [1]. 
It >> includes >> >> > features such as: >> >> > >> >> > - User-defined resource types >> >> > - Native GPU support as a schedulable resource type >> >> > - Consistent reads from standby node >> >> > - Namenode port based selective encryption >> >> > - Improvements related to rolling upgrade support from 2.x to 3.x >> >> > >> >> > The RC0 artifacts are at: >> >> http://home.apache.org/~jhung/hadoop-2.10.0-RC0/ >> >> > >> >> > RC tag is release-2.10.0-RC0. >> >> > >> >> > The maven artifacts are hosted here: >> >> > >> >> >> https://repository.apache.org/content/repositories/orgapachehadoop-1241/ >> >> > >> >> > My public key is available here: >> >> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS >> >> > >> >> > The vote will run for 5 weekdays, until Wednesday, October 23 at >> 6:00 pm >> >> > PDT. >> >> > >> >> > Thanks, >> >> > Jonathan Hung >> >> > >> >> > [1] >> >> > >> >> > >> >> >> https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0) >> >> > >> >> >> > >> >
[VOTE] Release Apache Hadoop 2.10.0 (RC1)
Hi folks, This is the second release candidate for the first release of Apache Hadoop 2.10 line. It contains 362 fixes/improvements since 2.9 [1]. It includes features such as: - User-defined resource types - Native GPU support as a schedulable resource type - Consistent reads from standby node - Namenode port based selective encryption - Improvements related to rolling upgrade support from 2.x to 3.x - Cost based fair call queue The RC1 artifacts are at: http://home.apache.org/~jhung/hadoop-2.10.0-RC1/ RC tag is release-2.10.0-RC1. The maven artifacts are hosted here: https://repository.apache.org/content/repositories/orgapachehadoop-1243/ My public key is available here: https://dist.apache.org/repos/dist/release/hadoop/common/KEYS The vote will run for 5 weekdays, until Tuesday, October 29 at 3:00 pm PDT. Thanks, Jonathan Hung [1] https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0)
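The vote window above is stated as 5 weekdays. A small Python sketch of that arithmetic, assuming the count starts on the day after the announcement and skips only Saturdays and Sundays (no holiday calendar); the function name is hypothetical:

```python
from datetime import date, timedelta


def add_weekdays(start, weekdays):
    """Return the date reached after advancing `weekdays` Mon-Fri days past `start`."""
    current = start
    remaining = weekdays
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


if __name__ == "__main__":
    # RC1 was announced on October 22, 2019.
    print(add_weekdays(date(2019, 10, 22), 5).isoformat())
```

Under these assumptions the result agrees with the closing date given in the announcement.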
Re: [VOTE] Release Apache Hadoop 2.10.0 (RC0)
Thanks Konstantin and Zhankun. Unfortunately a feature slipped our radar (HDFS-14667). Since this is the first of a minor release, we would like to get it into 2.10.0. HDFS-14667 has been committed to branch-2.10.0, I will be rolling an RC1 shortly. Jonathan Hung On Tue, Oct 22, 2019 at 1:39 AM Zhankun Tang wrote: > Thanks for the effort, Jonathan! > > +1 (non-binding) on RC0. > - Set up a single node cluster with the binary tarball > - Run Spark Pi and pySpark job > > BR, > Zhankun > > On Tue, 22 Oct 2019 at 14:31, Konstantin Shvachko > wrote: > >> +1 on RC0. >> - Verified signatures >> - Built from sources >> - Ran unit tests for new features >> - Checked artifacts on Nexus, made sure the sources are present. >> >> Thanks >> --Konstantin >> >> >> On Wed, Oct 16, 2019 at 6:01 PM Jonathan Hung >> wrote: >> >> > Hi folks, >> > >> > This is the first release candidate for the first release of Apache >> Hadoop >> > 2.10 line. It contains 361 fixes/improvements since 2.9 [1]. It includes >> > features such as: >> > >> > - User-defined resource types >> > - Native GPU support as a schedulable resource type >> > - Consistent reads from standby node >> > - Namenode port based selective encryption >> > - Improvements related to rolling upgrade support from 2.x to 3.x >> > >> > The RC0 artifacts are at: >> http://home.apache.org/~jhung/hadoop-2.10.0-RC0/ >> > >> > RC tag is release-2.10.0-RC0. >> > >> > The maven artifacts are hosted here: >> > >> https://repository.apache.org/content/repositories/orgapachehadoop-1241/ >> > >> > My public key is available here: >> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS >> > >> > The vote will run for 5 weekdays, until Wednesday, October 23 at 6:00 pm >> > PDT. 
>> > >> > Thanks, >> > Jonathan Hung >> > >> > [1] >> > >> > >> https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0) >> > >> >
[VOTE] Release Apache Hadoop 2.10.0 (RC0)
Hi folks, This is the first release candidate for the first release of Apache Hadoop 2.10 line. It contains 361 fixes/improvements since 2.9 [1]. It includes features such as: - User-defined resource types - Native GPU support as a schedulable resource type - Consistent reads from standby node - Namenode port based selective encryption - Improvements related to rolling upgrade support from 2.x to 3.x The RC0 artifacts are at: http://home.apache.org/~jhung/hadoop-2.10.0-RC0/ RC tag is release-2.10.0-RC0. The maven artifacts are hosted here: https://repository.apache.org/content/repositories/orgapachehadoop-1241/ My public key is available here: https://dist.apache.org/repos/dist/release/hadoop/common/KEYS The vote will run for 5 weekdays, until Wednesday, October 23 at 6:00 pm PDT. Thanks, Jonathan Hung [1] https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0)
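The link in [1] is a percent-encoded JQL query. Decoding it with the Python standard library shows which issues are counted toward the fix total; a short sketch that uses only the URL quoted above (the function name is hypothetical):

```python
from urllib.parse import parse_qs, urlsplit

URL = (
    "https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN"
    "%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20"
    "fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%20"
    "2.9.1%2C%202.9.0)"
)


def extract_jql(url):
    """Return the decoded value of the `jql` query parameter."""
    query = urlsplit(url).query
    return parse_qs(query)["jql"][0]


if __name__ == "__main__":
    print(extract_jql(URL))
```

The decoded query selects fixed issues across the four projects whose fix version is 2.10.0 and which were not already shipped in a 2.9.x release.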
Re: [DISCUSS] Hadoop 2.10.0 release plan
I've moved all jiras with target version 2.10.0 to 2.10.1. Also I've created branch-2.10 and branch-2.10.0, please commit any 2.10.x bug fixes to branch-2.10. I'll send out a vote thread for 2.10.0-RC0 shortly. Jonathan Hung On Fri, Oct 11, 2019 at 10:32 AM Jonathan Hung wrote: > Edit: seems a 2.10.0 blocker was reopened (HDFS-14305). I'll continue > watching this jira and start the release once this is resolved. > > Jonathan Hung > > > On Thu, Oct 10, 2019 at 5:13 PM Jonathan Hung > wrote: > >> Hi folks, as of now all 2.10.0 blockers have been resolved [1]. So I'll >> start the release process soon (cutting branches, updating target versions, >> etc). >> >> [1] https://issues.apache.org/jira/issues/?filter=12346975 >> >> Jonathan Hung >> >> >> On Mon, Aug 26, 2019 at 10:19 AM Jonathan Hung >> wrote: >> >>> Hi folks, >>> >>> As discussed previously (e.g. [1], [2]) we'd like to do a 2.10.0 release >>> soon. Some features/big-items we're targeting for this release: >>> >>>- YARN resource types/GPU support (YARN-8200 >>><https://issues.apache.org/jira/browse/YARN-8200>) >>>- Selective wire encryption (HDFS-13541 >>><https://issues.apache.org/jira/browse/HDFS-13541>) >>>- Rolling upgrade support from 2.x to 3.x (e.g. HDFS-14509 >>><https://issues.apache.org/jira/browse/HDFS-14509>) >>> >>> Per [3] sounds like there's concern around upgrading dependencies as >>> well. >>> >>> We created a public jira filter here ( >>> https://issues.apache.org/jira/issues/?filter=12346975) marking all >>> blockers for 2.10.0 release. If you have other jiras that should be 2.10.0 >>> blockers, please mark "Target Version/s" as "2.10.0" and add label >>> "release-blocker" so we can track it through this filter. >>> >>> We're targeting a release at end of September. >>> >>> Please share any thoughts you have about this. Thanks! 
>>> >>> [1] >>> https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29461.html >>> [2] >>> https://www.mail-archive.com/mapreduce-dev@hadoop.apache.org/msg21293.html >>> [3] >>> https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg33440.html >>> >>> >>> Jonathan Hung >>> >>
Re: [DISCUSS] Hadoop 2.10.0 release plan
Edit: seems a 2.10.0 blocker was reopened (HDFS-14305). I'll continue watching this jira and start the release once this is resolved. Jonathan Hung On Thu, Oct 10, 2019 at 5:13 PM Jonathan Hung wrote: > Hi folks, as of now all 2.10.0 blockers have been resolved [1]. So I'll > start the release process soon (cutting branches, updating target versions, > etc). > > [1] https://issues.apache.org/jira/issues/?filter=12346975 > > Jonathan Hung > > > On Mon, Aug 26, 2019 at 10:19 AM Jonathan Hung > wrote: > >> Hi folks, >> >> As discussed previously (e.g. [1], [2]) we'd like to do a 2.10.0 release >> soon. Some features/big-items we're targeting for this release: >> >>- YARN resource types/GPU support (YARN-8200 >><https://issues.apache.org/jira/browse/YARN-8200>) >>- Selective wire encryption (HDFS-13541 >><https://issues.apache.org/jira/browse/HDFS-13541>) >>- Rolling upgrade support from 2.x to 3.x (e.g. HDFS-14509 >><https://issues.apache.org/jira/browse/HDFS-14509>) >> >> Per [3] sounds like there's concern around upgrading dependencies as well. >> >> We created a public jira filter here ( >> https://issues.apache.org/jira/issues/?filter=12346975) marking all >> blockers for 2.10.0 release. If you have other jiras that should be 2.10.0 >> blockers, please mark "Target Version/s" as "2.10.0" and add label >> "release-blocker" so we can track it through this filter. >> >> We're targeting a release at end of September. >> >> Please share any thoughts you have about this. Thanks! >> >> [1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29461.html >> [2] >> https://www.mail-archive.com/mapreduce-dev@hadoop.apache.org/msg21293.html >> [3] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg33440.html >> >> >> Jonathan Hung >> >
Re: [DISCUSS] Hadoop 2.10.0 release plan
Hi folks, as of now all 2.10.0 blockers have been resolved [1]. So I'll start the release process soon (cutting branches, updating target versions, etc). [1] https://issues.apache.org/jira/issues/?filter=12346975 Jonathan Hung On Mon, Aug 26, 2019 at 10:19 AM Jonathan Hung wrote: > Hi folks, > > As discussed previously (e.g. [1], [2]) we'd like to do a 2.10.0 release > soon. Some features/big-items we're targeting for this release: > >- YARN resource types/GPU support (YARN-8200 ><https://issues.apache.org/jira/browse/YARN-8200>) >- Selective wire encryption (HDFS-13541 ><https://issues.apache.org/jira/browse/HDFS-13541>) >- Rolling upgrade support from 2.x to 3.x (e.g. HDFS-14509 ><https://issues.apache.org/jira/browse/HDFS-14509>) > > Per [3] sounds like there's concern around upgrading dependencies as well. > > We created a public jira filter here ( > https://issues.apache.org/jira/issues/?filter=12346975) marking all > blockers for 2.10.0 release. If you have other jiras that should be 2.10.0 > blockers, please mark "Target Version/s" as "2.10.0" and add label > "release-blocker" so we can track it through this filter. > > We're targeting a release at end of September. > > Please share any thoughts you have about this. Thanks! > > [1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29461.html > [2] > https://www.mail-archive.com/mapreduce-dev@hadoop.apache.org/msg21293.html > [3] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg33440.html > > > Jonathan Hung >
[jira] [Created] (YARN-9869) Create scheduling policy to auto-adjust queue elasticity based on cluster demand
Jonathan Hung created YARN-9869: --- Summary: Create scheduling policy to auto-adjust queue elasticity based on cluster demand Key: YARN-9869 URL: https://issues.apache.org/jira/browse/YARN-9869 Project: Hadoop YARN Issue Type: New Feature Reporter: Jonathan Hung Currently LinkedIn has a policy to auto-adjust queue elasticity based on real-time queue demand. We've been running this policy in production for a long time and it has helped improve overall cluster utilization. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions
Jonathan Hung created YARN-9858: --- Summary: Optimize RMContext getExclusiveEnforcedPartitions Key: YARN-9858 URL: https://issues.apache.org/jira/browse/YARN-9858 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Hung Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a hot code path, need to optimize it.
Re: Incompatible changes between branch-2.8 and branch-2.9
- I've created YARN-9855 and uploaded patches to fix YARN-6616 in branch-2.8 and branch-2.7. - For YARN-6050, not sure either. Robert/Wangda, can you comment on YARN-6050 compatibility? - For YARN-7813, not sure why moving from 2.8.4/5 -> 2.8.6 would be incompatible with this strategy? It should be OK to remove/add optional fields (removing the field with id 12, and adding the field with id 13). The difficulties I see here are, we would have to leave id 12 blank in 2.8.6 (so we cannot have YARN-6164 in branch-2.8), and users on 2.8.4/5 would have to move to 2.8.6 before moving to 2.9+. But rolling upgrade would still work IIUC. Jonathan Hung On Tue, Sep 24, 2019 at 2:52 PM Eric Badger wrote: > * For YARN-6616, for branch-2.8 and below, it was only committed to > 2.7.8/2.8.6 which have not been released (as I understand). Perhaps we can > revert YARN-6616 from branch-2.7 and branch-2.8. > - This seems reasonable. Since we haven't released anything, it should > be no issue to change the 2.7/2.8 protobuf field to have the same value as > 2.9+ > > * For YARN-6050, there's a bit here: > https://developers.google.com/protocol-buffers/docs/proto that says > "optional is compatible with repeated", so I think we should be OK there. > - Optional is compatible with repeatable over the wire such that > protobuf won't blow up, but does that actually mean that it's compatible in > this case? If it's expecting an optional and gets a repeated, it's going to > drop everything except for the last value. I don't know enough about > YARN-6050 to say if this will be ok or not. > > * For YARN-7813, it's in 2.8.4 so it seems upgrading from 2.8.4 or 2.8.5 > to a 2.9+ version will be an issue. One option could be to move the > intraQueuePreemptionDisabled field from id 12 to id 13 in branch-2.8, then > users would upgrade from 2.8.4/2.8.5 to 2.8.6 (someone would have to > release this), then upgrade from 2.8.6 to 2.9+. 
> - I'm ok with this, but it should be noted that the upgrade from > 2.8.4/2.8.5 to 2.8.6 (or 2.9+) would not be compatible for a rolling > upgrade. So this would cause some pain to anybody with clusters on those > versions. > > Eric > > On Tue, Sep 24, 2019 at 2:42 PM Jonathan Hung > wrote: > >> Sorry, let me edit my first point. We can just create addendums for >> YARN-6616 in branch-2.7 and branch-2.8 to edit the submitTime field to the >> correct id 28. We don’t need to revert YARN-6616 from these branches >> completely. >> >> Jonathan >> >> >> From: Jonathan Hung >> Sent: Tuesday, September 24, 2019 11:38 AM >> To: Eric Badger >> Cc: Hadoop Common; yarn-dev; mapreduce-dev; Hdfs-dev >> Subject: Re: Incompatible changes between branch-2.8 and branch-2.9 >> >> Hi Eric, thanks for the investigation. >> >> * For YARN-6616, for branch-2.8 and below, it was only committed to >> 2.7.8/2.8.6 which have not been released (as I understand). Perhaps we can >> revert YARN-6616 from branch-2.7 and branch-2.8. >> * For YARN-6050, there's a bit here: >> https://developers.google.com/protocol-buffers/docs/proto that says >> "optional is compatible with repeated", so I think we should be OK there. >> * For YARN-7813, it's in 2.8.4 so it seems upgrading from 2.8.4 or >> 2.8.5 to a 2.9+ version will be an issue. One option could be to move the >> intraQueuePreemptionDisabled field from id 12 to id 13 in branch-2.8, then >> users would upgrade from 2.8.4/2.8.5 to 2.8.6 (someone would have to >> release this), then upgrade from 2.8.6 to 2.9+. >> >> Jonathan Hung >> >> >> On Tue, Sep 24, 2019 at 9:23 AM Eric Badger >> >> wrote: >> We (Verizon Media) are currently moving towards upgrading our clusters >> from >> our internal fork of branch-2.8 to an internal fork of branch-2. During >> this process, we have found multiple incompatible changes in protobufs >> between branch-2.8 and branch-2. These incompatibilities were all >> introduced between branch-2.8 and branch-2.9. 
I did a git diff over all >> .proto files across the branch-2.8 and branch-2.9 and found 3 instances of >> incompatibilities from 3 separate commits. All of the incompatibilities >> are >> in yarn_protos.proto >> >> >> I would like to discuss how to fix these incompatible changes. Otherwise, >> rolling upgrades will not be supported between branch-2.8 (or below) and >> branch-2.9 (or beyond). We could revert the incompatible changes, but then >> the new releases would be incompatible with the releases that have these &
[jira] [Created] (YARN-9855) Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7
Jonathan Hung created YARN-9855: --- Summary: Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7 Key: YARN-9855 URL: https://issues.apache.org/jira/browse/YARN-9855 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Hung Assignee: Jonathan Hung As per [http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201909.mbox/%3cCAAaVJWUKTBXEYV_-yWs2PT8aqhjQXq=garav+yzjxq0nx36...@mail.gmail.com%3e]. Update this field to use the same id as in branch-2.9 and above.
Re: Incompatible changes between branch-2.8 and branch-2.9
Sorry, let me edit my first point. We can just create addendums for YARN-6616 in branch-2.7 and branch-2.8 to edit the submitTime field to the correct id 28. We don’t need to revert YARN-6616 from these branches completely. Jonathan From: Jonathan Hung Sent: Tuesday, September 24, 2019 11:38 AM To: Eric Badger Cc: Hadoop Common; yarn-dev; mapreduce-dev; Hdfs-dev Subject: Re: Incompatible changes between branch-2.8 and branch-2.9 Hi Eric, thanks for the investigation. * For YARN-6616, for branch-2.8 and below, it was only committed to 2.7.8/2.8.6 which have not been released (as I understand). Perhaps we can revert YARN-6616 from branch-2.7 and branch-2.8. * For YARN-6050, there's a bit here: https://developers.google.com/protocol-buffers/docs/proto that says "optional is compatible with repeated", so I think we should be OK there. * For YARN-7813, it's in 2.8.4 so it seems upgrading from 2.8.4 or 2.8.5 to a 2.9+ version will be an issue. One option could be to move the intraQueuePreemptionDisabled field from id 12 to id 13 in branch-2.8, then users would upgrade from 2.8.4/2.8.5 to 2.8.6 (someone would have to release this), then upgrade from 2.8.6 to 2.9+. Jonathan Hung On Tue, Sep 24, 2019 at 9:23 AM Eric Badger wrote: We (Verizon Media) are currently moving towards upgrading our clusters from our internal fork of branch-2.8 to an internal fork of branch-2. During this process, we have found multiple incompatible changes in protobufs between branch-2.8 and branch-2. These incompatibilities were all introduced between branch-2.8 and branch-2.9. I did a git diff over all .proto files across the branch-2.8 and branch-2.9 and found 3 instances of incompatibilities from 3 separate commits. All of the incompatibilities are in yarn_protos.proto I would like to discuss how to fix these incompatible changes. Otherwise, rolling upgrades will not be supported between branch-2.8 (or below) and branch-2.9 (or beyond). 
We could revert the incompatible changes, but then the new releases would be incompatible with the releases that have these incompatible changes. If we do nothing, then rolling upgrades won't work between 2.8- and 2.9+.

Thanks,

Eric

---

git diff branch-2.8..branch-2.9 $(find . -name '*\.proto')

https://issues.apache.org/jira/browse/YARN-6616
- Trunk patch (applied through branch-2.9) differs from branch-2.8 patch

@@ -211,7 +245,20 @@ message ApplicationReportProto {
   optional PriorityProto priority = 23;
   optional string appNodeLabelExpression = 24;
   optional string amNodeLabelExpression = 25;
-  optional int64 submitTime = 26;
+  repeated AppTimeoutsMapProto appTimeouts = 26;
+  optional int64 launchTime = 27;
+  optional int64 submitTime = 28;

https://issues.apache.org/jira/browse/YARN-6050
- Trunk and branch-2 patches both change the protobuf type in the same way.

@@ -356,7 +416,22 @@ message ApplicationSubmissionContextProto {
   optional LogAggregationContextProto log_aggregation_context = 14;
   optional ReservationIdProto reservation_id = 15;
   optional string node_label_expression = 16;
-  optional ResourceRequestProto am_container_resource_request = 17;
+  repeated ResourceRequestProto am_container_resource_request = 17;
+  repeated ApplicationTimeoutMapProto application_timeouts = 18;

https://issues.apache.org/jira/browse/YARN-7813
- Trunk (applied through branch-3.1) and branch-3.0 (applied through branch-2.9) patches differ from branch-2.8 patch

@@ -425,7 +501,21 @@ message QueueInfoProto {
   optional string defaultNodeLabelExpression = 9;
   optional QueueStatisticsProto queueStatistics = 10;
   optional bool preemptionDisabled = 11;
-  optional bool intraQueuePreemptionDisabled = 12;
+  repeated QueueConfigurationsMapProto queueConfigurationsMap = 12;
+  optional bool intraQueuePreemptionDisabled = 13;
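Why the field ids in these diffs matter for rolling upgrades: on the wire, protobuf identifies a field only by a key built from its field number and wire type, never by its name. A minimal Python sketch of that key encoding, written from the published protobuf encoding rules rather than taken from Hadoop code; all helper names are hypothetical:

```python
VARINT = 0            # wire type for int64, bool, enum, ...
LENGTH_DELIMITED = 2  # wire type for strings and embedded messages


def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def field_key(field_number, wire_type):
    """Key that precedes every field: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def decode_key(data):
    """Decode the leading key of `data` into (field_number, wire_type)."""
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return value >> 3, value & 0x7


if __name__ == "__main__":
    # submitTime as written by branch-2.8 (id 26) and by branch-2.9 (id 28).
    old_submit_time = field_key(26, VARINT)
    new_submit_time = field_key(28, VARINT)
    # In branch-2.9, id 26 belongs to appTimeouts, an embedded message.
    app_timeouts = field_key(26, LENGTH_DELIMITED)
    print(old_submit_time.hex(), new_submit_time.hex(), app_timeouts.hex())
    print(decode_key(old_submit_time), decode_key(app_timeouts))
```

A branch-2.8 writer emits submitTime under the key for (26, varint), while a branch-2.9 reader looks for submitTime under (28, varint) and expects id 26 to be length-delimited, so the value would not be read as submitTime. How the mismatch is handled beyond that depends on the parser.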
Re: Incompatible changes between branch-2.8 and branch-2.9
Hi Eric, thanks for the investigation. - For YARN-6616, for branch-2.8 and below, it was only committed to 2.7.8/2.8.6 which have not been released (as I understand). Perhaps we can revert YARN-6616 from branch-2.7 and branch-2.8. - For YARN-6050, there's a bit here: https://developers.google.com/protocol-buffers/docs/proto that says "optional is compatible with repeated", so I think we should be OK there. - For YARN-7813, it's in 2.8.4 so it seems upgrading from 2.8.4 or 2.8.5 to a 2.9+ version will be an issue. One option could be to move the intraQueuePreemptionDisabled field from id 12 to id 13 in branch-2.8, then users would upgrade from 2.8.4/2.8.5 to 2.8.6 (someone would have to release this), then upgrade from 2.8.6 to 2.9+. Jonathan Hung On Tue, Sep 24, 2019 at 9:23 AM Eric Badger wrote: > We (Verizon Media) are currently moving towards upgrading our clusters from > our internal fork of branch-2.8 to an internal fork of branch-2. During > this process, we have found multiple incompatible changes in protobufs > between branch-2.8 and branch-2. These incompatibilities were all > introduced between branch-2.8 and branch-2.9. I did a git diff over all > .proto files across the branch-2.8 and branch-2.9 and found 3 instances of > incompatibilities from 3 separate commits. All of the incompatibilities are > in yarn_protos.proto > > > I would like to discuss how to fix these incompatible changes. Otherwise, > rolling upgrades will not be supported between branch-2.8 (or below) and > branch-2.9 (or beyond). We could revert the incompatible changes, but then > the new releases would be incompatible with the releases that have these > incompatible changes. If we do nothing, then rolling upgrades won't work > between 2.8- and 2.9+. > > > Thanks, > > > Eric > > > --- > > > git diff branch-2.8..branch-2.9 $(find . 
-name '*\.proto')
>
> https://issues.apache.org/jira/browse/YARN-6616
>
>    - Trunk patch (applied through branch-2.9) differs from branch-2.8 patch
>
> @@ -211,7 +245,20 @@ message ApplicationReportProto {
>    optional PriorityProto priority = 23;
>    optional string appNodeLabelExpression = 24;
>    optional string amNodeLabelExpression = 25;
> -  optional int64 submitTime = 26;
> +  repeated AppTimeoutsMapProto appTimeouts = 26;
> +  optional int64 launchTime = 27;
> +  optional int64 submitTime = 28;
>
> https://issues.apache.org/jira/browse/YARN-6050
>
>    - Trunk and branch-2 patches both change the protobuf type in the same
>    way.
>
> @@ -356,7 +416,22 @@ message ApplicationSubmissionContextProto {
>    optional LogAggregationContextProto log_aggregation_context = 14;
>    optional ReservationIdProto reservation_id = 15;
>    optional string node_label_expression = 16;
> -  optional ResourceRequestProto am_container_resource_request = 17;
> +  repeated ResourceRequestProto am_container_resource_request = 17;
> +  repeated ApplicationTimeoutMapProto application_timeouts = 18;
>
> https://issues.apache.org/jira/browse/YARN-7813
>
>    - Trunk (applied through branch-3.1) and branch-3.0 (applied through
>    branch-2.9) patches differ from branch-2.8 patch
>
> @@ -425,7 +501,21 @@ message QueueInfoProto {
>    optional string defaultNodeLabelExpression = 9;
>    optional QueueStatisticsProto queueStatistics = 10;
>    optional bool preemptionDisabled = 11;
> -  optional bool intraQueuePreemptionDisabled = 12;
> +  repeated QueueConfigurationsMapProto queueConfigurationsMap = 12;
> +  optional bool intraQueuePreemptionDisabled = 13;
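On the YARN-6050 point, the protobuf language guide states that a reader expecting an optional field, given the serialized form of a repeated field, takes the last value for a primitive type and merges all elements for a message type. A toy Python model of that rule, with messages represented as plain dicts of scalar fields; this is a simulation of the documented behavior under that simplification, not protobuf library code, and the field values are made up for illustration:

```python
def read_optional_scalar(occurrences):
    """Optional scalar field: the last occurrence on the wire wins."""
    return occurrences[-1] if occurrences else None


def read_optional_message(occurrences):
    """Optional message field: occurrences are merged in wire order.

    Messages are modelled as dicts of scalar fields, so merging means a
    later occurrence overwrites earlier values field by field.
    """
    if not occurrences:
        return None
    merged = {}
    for message in occurrences:
        merged.update(message)
    return merged


if __name__ == "__main__":
    # Two hypothetical AM resource requests sent by a writer that treats
    # field 17 as repeated, as seen by a reader that treats it as optional.
    sent = [
        {"resource_name": "*", "num_containers": 1},
        {"resource_name": "/rack1", "relax_locality": False},
    ]
    print(read_optional_message(sent))
    print(read_optional_scalar([3, 7]))
```

Either way the old reader ends up with a single value, so whether that is acceptable for am_container_resource_request is the compatibility question raised in the thread.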
[jira] [Resolved] (YARN-6684) TestAMRMClient tests fail on branch-2.7
[ https://issues.apache.org/jira/browse/YARN-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung resolved YARN-6684. - Resolution: Won't Fix branch-2.7 EOL, closing as won't fix > TestAMRMClient tests fail on branch-2.7 > --- > > Key: YARN-6684 > URL: https://issues.apache.org/jira/browse/YARN-6684 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Jonathan Hung >Priority: Major > > {noformat}2017-06-01 19:10:44,362 INFO capacity.CapacityScheduler > (CapacityScheduler.java:addNode(1335)) - Added node > jhung-ld2.linkedin.biz:58205 clusterResource: > 2017-06-01 19:10:44,370 INFO server.MiniYARNCluster > (MiniYARNCluster.java:waitForNodeManagersToConnect(657)) - All Node Managers > connected in MiniYARNCluster > 2017-06-01 19:10:44,376 INFO client.RMProxy (RMProxy.java:createRMProxy(98)) > - Connecting to ResourceManager at jhung-ld2.linkedin.biz/ipaddr:36167 > 2017-06-01 19:10:45,501 INFO ipc.Client > (Client.java:handleConnectionFailure(872)) - Retrying connect to server: > jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 0 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 > MILLISECONDS) > 2017-06-01 19:10:46,502 INFO ipc.Client > (Client.java:handleConnectionFailure(872)) - Retrying connect to server: > jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 1 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 > MILLISECONDS) > 2017-06-01 19:10:47,503 INFO ipc.Client > (Client.java:handleConnectionFailure(872)) - Retrying connect to server: > jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 2 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 > MILLISECONDS) > 2017-06-01 19:10:48,504 INFO ipc.Client > (Client.java:handleConnectionFailure(872)) - Retrying connect to server: > jhung-ld2.linkedin.biz/ipaddr:36167. 
Already tried 3 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 > MILLISECONDS){noformat} > After some investigation, seems it is the same issue as described here: > HDFS-11893
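The log above shows the client retrying once per second under a policy named RetryUpToMaximumCountWithFixedSleep with maxRetries=10. A generic Python sketch of a fixed-sleep, bounded-retry loop of that kind; it is not Hadoop's implementation, and the function name, exception type, and injectable `sleep` parameter are assumptions made for illustration:

```python
import time


def retry_with_fixed_sleep(operation, max_retries=10, sleep_seconds=1.0,
                           sleep=time.sleep):
    """Call `operation` until it succeeds or `max_retries` retries are used up.

    The wait between attempts is constant. `sleep` is injectable so the
    loop can be exercised without real delays.
    """
    retries_done = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            if retries_done >= max_retries:
                raise  # give up and surface the last failure
            retries_done += 1
            sleep(sleep_seconds)


if __name__ == "__main__":
    attempts = []

    def flaky_connect():
        # Hypothetical operation that fails three times, then succeeds.
        attempts.append(1)
        if len(attempts) < 4:
            raise ConnectionError("connection refused")
        return "connected"

    print(retry_with_fixed_sleep(flaky_connect, sleep=lambda seconds: None))
```

In the failing test the server never becomes reachable, so a loop like this would exhaust its retries instead of returning.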
[jira] [Resolved] (YARN-8825) Print application tags in ApplicationSummary
[ https://issues.apache.org/jira/browse/YARN-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung resolved YARN-8825. - Resolution: Duplicate > Print application tags in ApplicationSummary > > > Key: YARN-8825 > URL: https://issues.apache.org/jira/browse/YARN-8825 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Jonathan Hung > Assignee: Jonathan Hung >Priority: Major > > Useful for tracking application tag metadata.
[jira] [Resolved] (YARN-9844) TestCapacitySchedulerPerf test errors in branch-2
[ https://issues.apache.org/jira/browse/YARN-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung resolved YARN-9844. - Resolution: Fixed > TestCapacitySchedulerPerf test errors in branch-2 > - > > Key: YARN-9844 > URL: https://issues.apache.org/jira/browse/YARN-9844 > Project: Hadoop YARN > Issue Type: Bug > Components: test, yarn >Affects Versions: 2.10.0 >Reporter: Jim Brennan > Assignee: Jonathan Hung >Priority: Major > > These TestCapacitySchedulerPerf throughput tests are failing in branch-2: > {{[ERROR] > TestCapacitySchedulerPerf.testUserLimitThroughputForFiveResources:263->testUserLimitThroughputWithNumberOfResourceTypes:114 > » ArrayIndexOutOfBounds}} > {{[ERROR] > TestCapacitySchedulerPerf.testUserLimitThroughputForFourResources:258->testUserLimitThroughputWithNumberOfResourceTypes:114 > » ArrayIndexOutOfBounds}} > {{[ERROR] > TestCapacitySchedulerPerf.testUserLimitThroughputForThreeResources:253->testUserLimitThroughputWithNumberOfResourceTypes:114 > » ArrayIndexOutOfBounds}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-9825) Changes for initializing placement rules with ResourceScheduler in branch-2
Jonathan Hung created YARN-9825: --- Summary: Changes for initializing placement rules with ResourceScheduler in branch-2 Key: YARN-9825 URL: https://issues.apache.org/jira/browse/YARN-9825 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Hung Assignee: Jonathan Hung YARN-8016 and YARN-8948 add functionality to initialize placement rules with ResourceScheduler. We need this in branch-2.
[jira] [Created] (YARN-9824) Fall back to configured queue ordering policy class name
Jonathan Hung created YARN-9824: --- Summary: Fall back to configured queue ordering policy class name Key: YARN-9824 URL: https://issues.apache.org/jira/browse/YARN-9824 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Hung

Currently this is how the configured queue ordering policy is determined:
{noformat}
if (policyType.trim().equals(QUEUE_UTILIZATION_ORDERING_POLICY)) {
  // Doesn't respect priority
  qop = new PriorityUtilizationQueueOrderingPolicy(false);
} else if (policyType.trim().equals(
    QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY)) {
  qop = new PriorityUtilizationQueueOrderingPolicy(true);
} else {
  String message =
      "Unable to construct queue ordering policy=" + policyType + " queue="
          + queue;
  throw new YarnRuntimeException(message);
}
{noformat}
If we want to enable a policy which is not QUEUE_UTILIZATION_ORDERING_POLICY or QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY, it requires a code change here to add a keyword for this policy. It'd be easier if the admin could configure a class name here instead of requiring a keyword.
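One way the proposed fallback could look, as a minimal sketch rather than the actual patch: keep the two keyword branches, and treat any other value as a class name to load reflectively. The `OrderingPolicy` interface, the policy classes, and `OrderingPolicyResolver` are hypothetical stand-ins; the keyword strings are assumed values for the two constants; `IllegalArgumentException` stands in for `YarnRuntimeException` so the example has no Hadoop dependency.

```java
/** Hypothetical stand-in for the scheduler's queue ordering policy type. */
interface OrderingPolicy {
  String describe();
}

/** Hypothetical keyword-selected policy; the flag mirrors the boolean in the snippet above. */
class UtilizationPolicy implements OrderingPolicy {
  private final boolean respectPriority;

  UtilizationPolicy(boolean respectPriority) {
    this.respectPriority = respectPriority;
  }

  @Override
  public String describe() {
    return "utilization(respectPriority=" + respectPriority + ")";
  }
}

/** Hypothetical custom policy that has no keyword and is configured by class name. */
class CustomPolicy implements OrderingPolicy {
  public CustomPolicy() {}

  @Override
  public String describe() {
    return "custom";
  }
}

final class OrderingPolicyResolver {
  // Assumed keyword values, for illustration only.
  static final String UTILIZATION = "utilization";
  static final String PRIORITY_UTILIZATION = "priority-utilization";

  /** Resolves a keyword first, then falls back to treating the value as a class name. */
  static OrderingPolicy resolve(String policyType, String queue) {
    String name = policyType.trim();
    if (name.equals(UTILIZATION)) {
      return new UtilizationPolicy(false);
    } else if (name.equals(PRIORITY_UTILIZATION)) {
      return new UtilizationPolicy(true);
    }
    try {
      // Fallback: load the class and call its no-argument constructor.
      return Class.forName(name)
          .asSubclass(OrderingPolicy.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalArgumentException(
          "Unable to construct queue ordering policy=" + policyType
              + " queue=" + queue, e);
    }
  }
}
```

With this shape, an unknown keyword and a misspelled class name fail the same way, so the existing error message still covers both cases.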
[jira] [Created] (YARN-9810) Add queue capacity/maxcapacity percentage metrics
Jonathan Hung created YARN-9810: --- Summary: Add queue capacity/maxcapacity percentage metrics Key: YARN-9810 URL: https://issues.apache.org/jira/browse/YARN-9810 Project: Hadoop YARN Issue Type: Improvement Environment: Similar to YARN-9085, it'd be good to have queue (absolute) capacity / (absolute) max capacity metrics in CSQueueMetrics. Reporter: Jonathan Hung
[jira] [Created] (YARN-9806) TestNMSimulator#testNMSimulator fails in branch-2
Jonathan Hung created YARN-9806: --- Summary: TestNMSimulator#testNMSimulator fails in branch-2 Key: YARN-9806 URL: https://issues.apache.org/jira/browse/YARN-9806 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Hung {noformat}java.lang.AssertionError: expected:<10240> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.sls.nodemanager.TestNMSimulator.testNMSimulator(TestNMSimulator.java:92) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at 
org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runners.Suite.runChild(Suite.java:127) at org.junit.runners.Suite.runChild(Suite.java:26) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413){noformat} This appears fixed in YARN-7929. We only need the bit in TestNMSimulator though. This jira is to track getting this bit in branch-2. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Resolved] (YARN-7585) NodeManager should go unhealthy when state store throws DBException
[ https://issues.apache.org/jira/browse/YARN-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung resolved YARN-7585. - Fix Version/s: 2.10.0 Resolution: Fixed Committed to branch-2. > NodeManager should go unhealthy when state store throws DBException > > > Key: YARN-7585 > URL: https://issues.apache.org/jira/browse/YARN-7585 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: release-blocker > Fix For: 2.10.0, 3.1.0 > > Attachments: YARN-7585.001.patch, YARN-7585.002.patch, > YARN-7585.003.patch > > > If work preserving recover is enabled the NM will not start up if the state > store does not initialise. However if the state store becomes unavailable > after that for any reason the NM will not go unhealthy. > Since the state store is not available new containers can not be started any > more and the NM should become unhealthy: > {code} > AMLauncher: Error launching appattempt_1508806289867_268617_01. Got > exception: org.apache.hadoop.yarn.exceptions.YarnException: > java.io.IOException: org.iq80.leveldb.DBException: IO error: > /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: > Read-only file system > at o.a.h.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38) > at > o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:721) > ... 
> Caused by: java.io.IOException: org.iq80.leveldb.DBException: IO error: > /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: > Read-only file system > at > o.a.h.y.s.n.r.NMLeveldbStateStoreService.storeApplication(NMLeveldbStateStoreService.java:374) > at > o.a.h.y.s.n.cm.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:848) > at > o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:712) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
Re: [VOTE] Merge YARN-8200 to branch-2 and branch-3.0
Thanks all, +1 from me too. There's three binding +1, two non-binding +1, and no -1 so I'll merge YARN-8200 to branch-2 shortly. I'll skip branch-3.0 since it's EOL as others have mentioned. Jonathan Hung On Tue, Aug 27, 2019 at 11:49 AM Konstantin Shvachko wrote: > +1 for the merge. > > We probably should not bother with branch-3.0 merge since it's been voted > EOL. > > Thanks, > --Konstantin > > On Thu, Aug 22, 2019 at 4:43 PM Jonathan Hung > wrote: > >> Hi folks, >> >> As per [1], starting a vote to merge YARN-8200 (and YARN-8200.branch3) >> feature branch to branch-2 (and branch-3.0). >> >> Vote runs for 7 days, to Thursday, Aug 29 5:00PM PDT. >> >> Thanks. >> >> [1] >> >> http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201908.mbox/%3cCAHzWLgcX7f5Tr3q=csrqgysvpdf7mh-iu17femgx89dhr+1...@mail.gmail.com%3e >> >> Jonathan Hung >> >
[DISCUSS] Hadoop 2.10.0 release plan
Hi folks, As discussed previously (e.g. [1], [2]) we'd like to do a 2.10.0 release soon. Some features/big-items we're targeting for this release: - YARN resource types/GPU support (YARN-8200 <https://issues.apache.org/jira/browse/YARN-8200>) - Selective wire encryption (HDFS-13541 <https://issues.apache.org/jira/browse/HDFS-13541>) - Rolling upgrade support from 2.x to 3.x (e.g. HDFS-14509 <https://issues.apache.org/jira/browse/HDFS-14509>) Per [3] sounds like there's concern around upgrading dependencies as well. We created a public jira filter here ( https://issues.apache.org/jira/issues/?filter=12346975) marking all blockers for 2.10.0 release. If you have other jiras that should be 2.10.0 blockers, please mark "Target Version/s" as "2.10.0" and add label "release-blocker" so we can track it through this filter. We're targeting a release at end of September. Please share any thoughts you have about this. Thanks! [1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29461.html [2] https://www.mail-archive.com/mapreduce-dev@hadoop.apache.org/msg21293.html [3] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg33440.html Jonathan Hung
[VOTE] Merge YARN-8200 to branch-2 and branch-3.0
Hi folks, As per [1], starting a vote to merge YARN-8200 (and YARN-8200.branch3) feature branch to branch-2 (and branch-3.0). Vote runs for 7 days, to Thursday, Aug 29 5:00PM PDT. Thanks. [1] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201908.mbox/%3cCAHzWLgcX7f5Tr3q=csrqgysvpdf7mh-iu17femgx89dhr+1...@mail.gmail.com%3e Jonathan Hung
[jira] [Created] (YARN-9770) Create a queue ordering policy which picks child queues with equal probability
Jonathan Hung created YARN-9770: --- Summary: Create a queue ordering policy which picks child queues with equal probability Key: YARN-9770 URL: https://issues.apache.org/jira/browse/YARN-9770 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Hung Assignee: Jonathan Hung Ran some simulations with the default queue_utilization_ordering_policy: an underutilized queue which receives an application with many (thousands of) resource requests will hog scheduler allocations for a long time (on the order of a minute). In the meantime apps are submitted to all other queues, which increases activeUsers in these queues, which in turn drops the user limit in these queues to small values if minimum-user-limit-percent is configured to a small value (e.g. 10%). To avoid this, we assign to queues with equal probability, so that no queue goes without allocations for a long time.
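A minimal sketch of the idea, assuming the policy only needs to hand the scheduler an iteration order over child queues: shuffling a copy of the list uniformly gives every queue the same chance of being offered resources first. `RandomQueueOrdering` and its method name are hypothetical and not taken from the YARN code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical ordering helper: visits child queues in uniformly random order. */
final class RandomQueueOrdering {
  private final Random random;

  RandomQueueOrdering(Random random) {
    this.random = random;
  }

  /** Returns an iterator over a shuffled copy; the caller's list is left untouched. */
  <Q> Iterator<Q> assignmentIterator(List<Q> childQueues) {
    List<Q> shuffled = new ArrayList<>(childQueues);
    Collections.shuffle(shuffled, random);
    return shuffled.iterator();
  }
}
```

Because the order is redrawn on every call, a queue with a large backlog of requests is no more likely to be visited first than any of its siblings.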
Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2
Reviving this thread: we tested YARN RU starting with a cluster running 2.7.4, to running branch-2 + YARN-8200. Ran some simple MR/Spark jobs concurrently with the RM/NM upgrades and did not see any issues. If no other concerns I'll continue with a vote. Jonathan Hung On Thu, Apr 18, 2019 at 5:12 PM Jonathan Hung wrote: > Sorry for the delay, had to deprioritize this. Hoping to get to this next > week. > > Jonathan > > -- > *From:* Jim Brennan > *Sent:* Thursday, April 18, 2019 7:28 AM > *To:* Jonathan Hung > *Cc:* yarn-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org > *Subject:* Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2 > > Hi Jonathan, > > Hi Jim, we have not tested rolling upgrade. I don’t foresee this being an >> issue, but we’ll try it out and report back. > > > Any update on this? > Jim > > > On Wed, Apr 3, 2019 at 2:16 AM Jonathan Hung wrote: > >> Hi Jim, we have not tested rolling upgrade. I don’t foresee this being an >> issue, but we’ll try it out and report back. >> >> Jonathan >> >> ------ >> *From:* Jim Brennan >> *Sent:* Tuesday, April 2, 2019 9:17 AM >> *To:* Jonathan Hung >> *Cc:* yarn-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org >> *Subject:* Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2 >> >> Thanks for working on this! >> One concern for us is support for a rolling upgrade. If we are running a >> cluster based on branch-2.8, will we be able to do a rolling upgrade (no >> cluster down-time) to a branch containing these changes? Have you tested >> rolling upgrades? >> >> Thanks. >> Jim >> >> On Fri, Mar 29, 2019 at 2:14 PM Jonathan Hung >> wrote: >> >>> Hello devs, >>> >>> Starting a discuss thread to merge resource types/native GPU scheduling >>> support to branch-3.0 and branch-2. The resource types work was done in >>> trunk~branch-3.0 and GPU support done in trunk~branch-3.1, so the >>> proposal >>> is to merge GPU support into branch-3.0 and both resource types/GPU >>> support >>> to branch-2. 
>>> >>> Internally we've been running resource types/GPU support off a fork of >>> branch-2.9.0 in a > 300 node GPU cluster for a few months which has >>> worked >>> well. Also for completeness we verified that everything going into >>> branch-2 >>> also exists in branch-3.0. >>> >>> The specific list of patches to merge is in feature branch >>> YARN-8200.branch3 (for branch-3.0) and feature branch YARN-8200 (for >>> branch-2). Full patches containing the YARN-8200.branch3 -> branch-3.0 >>> diff >>> and YARN-8200 -> branch-2 diff have been posted to YARN-8200 jira. >>> >>> If there's no issues from the community I'll start a merge vote next >>> week. >>> Thanks. >>> >>> Jonathan Hung >>> >>
Re: [VOTE] Mark 2.6, 2.7, 3.0 release lines EOL
+1. Thanks! Jonathan Hung On Tue, Aug 20, 2019 at 8:03 PM Wangda Tan wrote: > Hi all, > > This is a vote thread to mark any versions smaller than 2.7 (inclusive), > and 3.0 EOL. This is based on discussions of [1] > > This discussion runs for 7 days and will conclude on Aug 28 Wed. > > Please feel free to share your thoughts. > > Thanks, > Wangda > > [1] > > http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201908.mbox/%3cCAD++eC=ou-tit1faob-dbecqe6ht7ede7t1dyra2p1yinpe...@mail.gmail.com%3e > , >
[jira] [Created] (YARN-9764) Print application submission context label in application summary
Jonathan Hung created YARN-9764: --- Summary: Print application submission context label in application summary Key: YARN-9764 URL: https://issues.apache.org/jira/browse/YARN-9764 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Hung
[jira] [Created] (YARN-9763) Print application tags in application summary
Jonathan Hung created YARN-9763: --- Summary: Print application tags in application summary Key: YARN-9763 URL: https://issues.apache.org/jira/browse/YARN-9763 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Hung
[jira] [Created] (YARN-9762) Add submission context label to audit logs
Jonathan Hung created YARN-9762: --- Summary: Add submission context label to audit logs Key: YARN-9762 URL: https://issues.apache.org/jira/browse/YARN-9762 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Hung Currently we log NODELABEL in container allocation/release audit logs; we should also log the NODELABEL of the application submission context on app submission.
[jira] [Created] (YARN-9761) Allow overriding application submissions based on server side configs
Jonathan Hung created YARN-9761: --- Summary: Allow overriding application submissions based on server side configs Key: YARN-9761 URL: https://issues.apache.org/jira/browse/YARN-9761 Project: Hadoop YARN Issue Type: New Feature Reporter: Jonathan Hung Create a preprocessor/interceptor which takes each app submitted to RM and overrides the submission context based on server side configs.
[jira] [Created] (YARN-9760) Support configuring application priorities on a workflow level
Jonathan Hung created YARN-9760: --- Summary: Support configuring application priorities on a workflow level Key: YARN-9760 URL: https://issues.apache.org/jira/browse/YARN-9760 Project: Hadoop YARN Issue Type: New Feature Reporter: Jonathan Hung Currently priorities are submitted at the application level, but end users commonly submit workloads to YARN at the workflow level. This jira proposes a feature to store workflow id + priority mappings on the RM (similar to queue mappings). If an app is submitted with a certain workflow id (as set in the application submission context), the RM will override the app's priority with the one defined in the mapping.
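The override described here amounts to a lookup keyed by workflow id. The sketch below assumes priorities are plain integers and that an app without a mapped workflow id keeps the priority it was submitted with; `WorkflowPriorityMappings` and its methods are hypothetical names, not an existing RM API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical RM-side table of workflow id to priority mappings. */
final class WorkflowPriorityMappings {
  private final Map<String, Integer> priorityByWorkflow = new HashMap<>();

  /** Registers or replaces the priority for a workflow id. */
  void put(String workflowId, int priority) {
    priorityByWorkflow.put(workflowId, priority);
  }

  /**
   * Returns the mapped priority when the app carries a known workflow id,
   * otherwise the priority the app was submitted with.
   */
  int resolvePriority(String workflowId, int submittedPriority) {
    if (workflowId == null) {
      return submittedPriority;
    }
    return priorityByWorkflow.getOrDefault(workflowId, submittedPriority);
  }
}
```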
[jira] [Created] (YARN-9751) Separate queue and app ordering policy capacity scheduler configs
Jonathan Hung created YARN-9751: --- Summary: Separate queue and app ordering policy capacity scheduler configs Key: YARN-9751 URL: https://issues.apache.org/jira/browse/YARN-9751 Project: Hadoop YARN Issue Type: Task Reporter: Jonathan Hung

Right now it's not possible to specify distinct app and queue ordering policies since they share the same {{ordering-policy}} suffix. There's already a TODO in CapacitySchedulerConfiguration for this. This Jira intends to fix it.
{noformat}
// TODO (wangda): We need to better distinguish app ordering policy and queue
// ordering policy's classname / configuration options, etc. And dedup code
// if possible.
{noformat}
Re: [DISCUSS] Hadoop 2019 Release Planning
Hi Wangda, Thanks for starting the discussion. We would also like to release 2.10.0 which was discussed previously <https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29479.html> and at various contributor meetups. I'm interested in being release manager for that. Thanks, Jonathan Hung On Fri, Aug 9, 2019 at 7:59 PM Wangda Tan wrote: > Hi all, > > Hope this email finds you well > > I want to hear your thoughts about what should be the release plan for > 2019. > > In 2018, we released: > - 1 maintenance release of 2.6 > - 3 maintenance releases of 2.7 > - 3 maintenance releases of 2.8 > - 3 releases of 2.9 > - 4 releases of 3.0 > - 2 releases of 3.1 > > Total 16 releases in 2018. > > In 2019, by far we only have two releases: > - 1 maintenance release of 3.1 > - 1 minor release of 3.2. > > However, the community put a lot of efforts to stabilize features of > various release branches. > There're: > - 217 fixed patches in 3.1.3 [1] > - 388 fixed patches in 3.2.1 [2] > - 1172 fixed patches in 3.3.0 [3] (OMG!) > > I think it is the time to do maintenance releases of 3.1/3.2 and do a minor > release for 3.3.0. > > In addition, I saw community discussion to do a 2.8.6 release for security > fixes. > > Any other releases? I think there're release plans for Ozone as well. And > please add your thoughts. > > Volunteers welcome! If you have interests to run a release as Release > Manager (or co-Resource Manager), please respond to this email thread so we > can coordinate. > > Thanks, > Wangda Tan > > [1] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution = Fixed AND > fixVersion = 3.1.3 > [2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution = Fixed AND > fixVersion = 3.2.1 > [3] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution = Fixed AND > fixVersion = 3.3.0 >
[jira] [Created] (YARN-9736) Recursively configure app ordering policies
Jonathan Hung created YARN-9736: --- Summary: Recursively configure app ordering policies Key: YARN-9736 URL: https://issues.apache.org/jira/browse/YARN-9736 Project: Hadoop YARN Issue Type: Task Reporter: Jonathan Hung Currently app ordering policy will find confs with prefix {{.ordering-policy}}. For queues with same ordering policy configurations it's easier to have a queue inherit confs from its parent.
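The inheritance described here could be resolved by walking up the queue path until a value is found. This is a sketch only: `InheritedQueueConf`, the flat `Map` standing in for the configuration, and the key layout `<queue path>.<suffix>` (without the real property prefix) are assumptions for illustration.

```java
import java.util.Map;

/** Hypothetical helper that resolves a per-queue setting with parent fallback. */
final class InheritedQueueConf {
  /**
   * Looks up "<queuePath>.<suffix>", then the same suffix on each ancestor
   * queue in turn. Returns null when no queue on the path sets it.
   */
  static String lookup(Map<String, String> conf, String queuePath, String suffix) {
    String path = queuePath;
    while (path != null) {
      String value = conf.get(path + "." + suffix);
      if (value != null) {
        return value;
      }
      // Drop the last path component to move to the parent queue.
      int dot = path.lastIndexOf('.');
      path = dot < 0 ? null : path.substring(0, dot);
    }
    return null;
  }
}
```

A value set directly on a queue still wins over an inherited one, so existing per-queue settings would keep their meaning under this scheme.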
[jira] [Created] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label
Jonathan Hung created YARN-9730: --- Summary: Support forcing configured partitions to be exclusive based on app node label Key: YARN-9730 URL: https://issues.apache.org/jira/browse/YARN-9730 Project: Hadoop YARN Issue Type: Task Reporter: Jonathan Hung Assignee: Jonathan Hung

Use case: queue X has all of its workload in non-default (exclusive) partition P (by setting the app submission context's node label to P). A node in partition Q != P heartbeats to the RM. The capacity scheduler loops through every application in X, and every scheduler key in each application, and fails to allocate each time since the app's requested label and the node's label don't match. This causes huge performance degradation when the number of apps in X is large.

To fix the issue, allow the RM to configure partitions as "forced-exclusive". If partition P is "forced-exclusive", then:
* If an app sets its submission context's node label to P, all its resource requests will be overridden to P
* If an app sets its submission context's node label to Q, any of its resource requests whose labels are P will be overridden to Q
* In the scheduler, we add apps with node label expression P to a separate data structure. When a node in partition P heartbeats to the scheduler, we only try to schedule apps in this data structure. When a node in partition Q heartbeats to the scheduler, we schedule the rest of the apps as normal.
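The first two override rules can be sketched as a pure function over label strings. This assumes each resource request carries a single label and that the empty string denotes the default partition; `ForcedExclusiveLabels` and its method are hypothetical, and the third rule (the separate scheduler data structure) is not covered here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper applying the forced-exclusive label override rules. */
final class ForcedExclusiveLabels {
  /**
   * Returns the labels the app's resource requests end up with.
   * Rule 1: an app whose own label is forced-exclusive has every request moved to it.
   * Rule 2: any other app has requests naming a forced-exclusive label moved to the app's label.
   */
  static List<String> overrideRequestLabels(
      String appLabel, List<String> requestLabels, Set<String> forcedExclusive) {
    boolean appIsForced = forcedExclusive.contains(appLabel);
    List<String> result = new ArrayList<>(requestLabels.size());
    for (String label : requestLabels) {
      if (appIsForced || forcedExclusive.contains(label)) {
        result.add(appLabel);
      } else {
        result.add(label);
      }
    }
    return result;
  }
}
```

After the override, an app's requests never mix a forced-exclusive label with other labels, which is what would let the scheduler keep such apps in their own structure.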
[jira] [Created] (YARN-9668) UGI conf doesn't read user overridden configurations on RM and NM startup
Jonathan Hung created YARN-9668: --- Summary: UGI conf doesn't read user overridden configurations on RM and NM startup Key: YARN-9668 URL: https://issues.apache.org/jira/browse/YARN-9668 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Hung Assignee: Jonathan Hung Similar to HADOOP-15150. Configs overridden through e.g. -D or -conf are not passed to the UGI conf on RM or NM startup.
[jira] [Created] (YARN-9615) Add dispatcher metrics to RM
Jonathan Hung created YARN-9615: --- Summary: Add dispatcher metrics to RM Key: YARN-9615 URL: https://issues.apache.org/jira/browse/YARN-9615 Project: Hadoop YARN Issue Type: Task Reporter: Jonathan Hung Assignee: Jonathan Hung It'd be good to have counts/processing times for each event type in RM async dispatcher and scheduler async dispatcher.
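As a rough illustration of per-event-type counts and processing times, the sketch below wraps each handler call with a timer and accumulates into concurrent counters. `EventTypeMetrics` is a hypothetical class, event types are plain strings here, and nothing in it reflects how the RM dispatchers or the Hadoop metrics system are actually wired.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-event-type counters for a dispatcher thread. */
final class EventTypeMetrics {
  private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

  /** Runs the handler and records one event plus its elapsed time, even if it throws. */
  void dispatch(String eventType, Runnable handler) {
    long start = System.nanoTime();
    try {
      handler.run();
    } finally {
      long elapsed = System.nanoTime() - start;
      counts.computeIfAbsent(eventType, k -> new LongAdder()).increment();
      totalNanos.computeIfAbsent(eventType, k -> new LongAdder()).add(elapsed);
    }
  }

  /** Number of events of this type handled so far. */
  long count(String eventType) {
    LongAdder adder = counts.get(eventType);
    return adder == null ? 0L : adder.sum();
  }

  /** Total handler time for this type, in nanoseconds. */
  long totalNanos(String eventType) {
    LongAdder adder = totalNanos.get(eventType);
    return adder == null ? 0L : adder.sum();
  }
}
```

Dividing `totalNanos` by `count` gives a mean processing time per event type, which is usually enough to spot the handler that is backing up the queue.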
[jira] [Created] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
Jonathan Hung created YARN-9559: --- Summary: Create AbstractContainersLauncher for pluggable ContainersLauncher logic Key: YARN-9559 URL: https://issues.apache.org/jira/browse/YARN-9559 Project: Hadoop YARN Issue Type: Task Reporter: Jonathan Hung Assignee: Jonathan Hung
[jira] [Created] (YARN-9529) Log correct cpu controller path on error
Jonathan Hung created YARN-9529: --- Summary: Log correct cpu controller path on error Key: YARN-9529 URL: https://issues.apache.org/jira/browse/YARN-9529 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Hung Assignee: Jonathan Hung Attachments: YARN-9529.001.patch The base cpu controller path is logged instead of the hadoop cgroup path.
Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2
Sorry for the delay, had to deprioritize this. Hoping to get to this next week.

Jonathan

From: Jim Brennan
Sent: Thursday, April 18, 2019 7:28 AM
To: Jonathan Hung
Cc: yarn-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2

Hi Jonathan,

> Hi Jim, we have not tested rolling upgrade. I don’t foresee this being an issue, but we’ll try it out and report back.

Any update on this?
Jim

On Wed, Apr 3, 2019 at 2:16 AM Jonathan Hung <jyhung2...@gmail.com> wrote:

Hi Jim, we have not tested rolling upgrade. I don’t foresee this being an issue, but we’ll try it out and report back.

Jonathan

From: Jim Brennan <james.bren...@verizonmedia.com>
Sent: Tuesday, April 2, 2019 9:17 AM
To: Jonathan Hung
Cc: yarn-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2

Thanks for working on this! One concern for us is support for a rolling upgrade. If we are running a cluster based on branch-2.8, will we be able to do a rolling upgrade (no cluster down-time) to a branch containing these changes? Have you tested rolling upgrades?

Thanks.
Jim

On Fri, Mar 29, 2019 at 2:14 PM Jonathan Hung <jyhung2...@gmail.com> wrote:

Hello devs,

Starting a discuss thread to merge resource types/native GPU scheduling support to branch-3.0 and branch-2. The resource types work was done in trunk~branch-3.0 and GPU support done in trunk~branch-3.1, so the proposal is to merge GPU support into branch-3.0 and both resource types/GPU support to branch-2.

Internally we've been running resource types/GPU support off a fork of branch-2.9.0 in a > 300 node GPU cluster for a few months which has worked well. Also for completeness we verified that everything going into branch-2 also exists in branch-3.0.

The specific list of patches to merge is in feature branch YARN-8200.branch3 (for branch-3.0) and feature branch YARN-8200 (for branch-2). Full patches containing the YARN-8200.branch3 -> branch-3.0 diff and YARN-8200 -> branch-2 diff have been posted to YARN-8200 jira.

If there's no issues from the community I'll start a merge vote next week. Thanks.

Jonathan Hung
[jira] [Created] (YARN-9438) launchTime not written to state store for running applications
Jonathan Hung created YARN-9438: --- Summary: launchTime not written to state store for running applications Key: YARN-9438 URL: https://issues.apache.org/jira/browse/YARN-9438 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Hung Assignee: Jonathan Hung launchTime is only saved to state store after application finishes, so if restart happens, any running applications will have launchTime set as -1 (since this is the default timestamp of the recovery event).
Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2
Hi Jim, we have not tested rolling upgrade. I don’t foresee this being an issue, but we’ll try it out and report back.

Jonathan

From: Jim Brennan
Sent: Tuesday, April 2, 2019 9:17 AM
To: Jonathan Hung
Cc: yarn-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2

Thanks for working on this! One concern for us is support for a rolling upgrade. If we are running a cluster based on branch-2.8, will we be able to do a rolling upgrade (no cluster down-time) to a branch containing these changes? Have you tested rolling upgrades?

Thanks.
Jim

On Fri, Mar 29, 2019 at 2:14 PM Jonathan Hung <jyhung2...@gmail.com> wrote:

Hello devs,

Starting a discuss thread to merge resource types/native GPU scheduling support to branch-3.0 and branch-2. The resource types work was done in trunk~branch-3.0 and GPU support done in trunk~branch-3.1, so the proposal is to merge GPU support into branch-3.0 and both resource types/GPU support to branch-2.

Internally we've been running resource types/GPU support off a fork of branch-2.9.0 in a > 300 node GPU cluster for a few months which has worked well. Also for completeness we verified that everything going into branch-2 also exists in branch-3.0.

The specific list of patches to merge is in feature branch YARN-8200.branch3 (for branch-3.0) and feature branch YARN-8200 (for branch-2). Full patches containing the YARN-8200.branch3 -> branch-3.0 diff and YARN-8200 -> branch-2 diff have been posted to YARN-8200 jira.

If there's no issues from the community I'll start a merge vote next week. Thanks.

Jonathan Hung
[DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2
Hello devs, Starting a discuss thread to merge resource types/native GPU scheduling support to branch-3.0 and branch-2. The resource types work was done in trunk~branch-3.0 and GPU support done in trunk~branch-3.1, so the proposal is to merge GPU support into branch-3.0 and both resource types/GPU support to branch-2. Internally we've been running resource types/GPU support off a fork of branch-2.9.0 in a > 300 node GPU cluster for a few months which has worked well. Also for completeness we verified that everything going into branch-2 also exists in branch-3.0. The specific list of patches to merge is in feature branch YARN-8200.branch3 (for branch-3.0) and feature branch YARN-8200 (for branch-2). Full patches containing the YARN-8200.branch3 -> branch-3.0 diff and YARN-8200 -> branch-2 diff have been posted to YARN-8200 jira. If there's no issues from the community I'll start a merge vote next week. Thanks. Jonathan Hung
[jira] [Resolved] (YARN-9412) Backport YARN-6909 to branch-2
[ https://issues.apache.org/jira/browse/YARN-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung resolved YARN-9412. - Resolution: Fixed This ended up being a clean port. Closing. > Backport YARN-6909 to branch-2 > -- > > Key: YARN-9412 > URL: https://issues.apache.org/jira/browse/YARN-9412 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Jonathan Hung > Assignee: Jonathan Hung >Priority: Major
[jira] [Created] (YARN-9412) Backport YARN-6909 to branch-2
Jonathan Hung created YARN-9412: --- Summary: Backport YARN-6909 to branch-2 Key: YARN-9412 URL: https://issues.apache.org/jira/browse/YARN-9412 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Hung Assignee: Jonathan Hung
[jira] [Created] (YARN-9409) Port resource type changes from YARN-7237 to branch-3.0/branch-2
Jonathan Hung created YARN-9409: --- Summary: Port resource type changes from YARN-7237 to branch-3.0/branch-2 Key: YARN-9409 URL: https://issues.apache.org/jira/browse/YARN-9409 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Hung Assignee: Jonathan Hung
[jira] [Created] (YARN-9397) Fix empty NMResourceInfo object test failures in branch-2
Jonathan Hung created YARN-9397: --- Summary: Fix empty NMResourceInfo object test failures in branch-2 Key: YARN-9397 URL: https://issues.apache.org/jira/browse/YARN-9397 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Hung Assignee: Jonathan Hung Attachments: YARN-9397-YARN-8200.001.patch It appears that the handling of empty objects differs between Jersey versions (branch-2 is on Jersey 1.9, branch-3 on 1.19).
[jira] [Created] (YARN-9291) Backport YARN-7637 to branch-2
Jonathan Hung created YARN-9291: --- Summary: Backport YARN-7637 to branch-2 Key: YARN-9291 URL: https://issues.apache.org/jira/browse/YARN-9291 Project: Hadoop YARN Issue Type: Task Reporter: Jonathan Hung Assignee: Jonathan Hung
Re: [VOTE] Moving branch-2 precommit/nightly test builds to java 8
My non-binding +1 to finish. This vote passes with 6 binding +1, 3 non-binding +1, and no vetoes. We will make the changes as part of HADOOP-15711, please follow there. Thanks all! Jonathan Hung On Tue, Feb 5, 2019 at 11:38 PM Akira Ajisaka wrote: > +1 > > -Akira > > On Wed, Feb 6, 2019 at 9:13 AM Wangda Tan wrote: > > > > +1, make sense to me. > > > > On Tue, Feb 5, 2019 at 3:29 PM Konstantin Shvachko > > > wrote: > > > > > +1 Makes sense to me. > > > > > > Thanks, > > > --Konst > > > > > > On Mon, Feb 4, 2019 at 6:14 PM Jonathan Hung > wrote: > > > > > > > Hello, > > > > > > > > Starting a vote based on the discuss thread [1] for moving branch-2 > > > > precommit/nightly test builds to openjdk8. After this change, the > test > > > > phase for precommit builds [2] and branch-2 nightly build [3] will > run on > > > > openjdk8. To maintain source compatibility, these builds will still > run > > > > their compile phase for branch-2 on openjdk7 as they do now (in > addition > > > to > > > > compiling on openjdk8). > > > > > > > > Vote will run for three business days until Thursday Feb 7 6:00PM > PDT. > > > > > > > > [1] > > > > > > > > > > > > https://lists.apache.org/thread.html/7e6fb28fc67560f83a2eb62752df35a8d58d86b2a3df4cacb5d738ca@%3Ccommon-dev.hadoop.apache.org%3E > > > > > > > > [2] > > > > > > > > https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-HADOOP-Build/ > > > > > https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-HDFS-Build/ > > > > > https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/ > > > > > > > > > > > > https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-MAPREDUCE-Build/ > > > > > > > > [3] > > > > > > > > > > > > https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86/ > > > > > > > > Jonathan Hung > > > > > > > >
[jira] [Created] (YARN-9289) Backport YARN-7330 for GPU in UI to branch-2
Jonathan Hung created YARN-9289: --- Summary: Backport YARN-7330 for GPU in UI to branch-2 Key: YARN-9289 URL: https://issues.apache.org/jira/browse/YARN-9289 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Hung Assignee: Jonathan Hung
[VOTE] Moving branch-2 precommit/nightly test builds to java 8
Hello, Starting a vote based on the discuss thread [1] for moving branch-2 precommit/nightly test builds to openjdk8. After this change, the test phase for precommit builds [2] and branch-2 nightly build [3] will run on openjdk8. To maintain source compatibility, these builds will still run their compile phase for branch-2 on openjdk7 as they do now (in addition to compiling on openjdk8). Vote will run for three business days until Thursday Feb 7 6:00PM PDT. [1] https://lists.apache.org/thread.html/7e6fb28fc67560f83a2eb62752df35a8d58d86b2a3df4cacb5d738ca@%3Ccommon-dev.hadoop.apache.org%3E [2] https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-HADOOP-Build/ https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-HDFS-Build/ https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/ https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-MAPREDUCE-Build/ [3] https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86/ Jonathan Hung
[jira] [Created] (YARN-9280) Backport YARN-6620 to YARN-8200/branch-2
Jonathan Hung created YARN-9280: --- Summary: Backport YARN-6620 to YARN-8200/branch-2 Key: YARN-9280 URL: https://issues.apache.org/jira/browse/YARN-9280 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Hung
Re: [DISCUSS] Moving branch-2 to java 8
Hi Anu, we will configure precommit jobs to continue compiling on openjdk7. If there's incompatible source changes then the precommit job will catch this. The change proposed here is only for the *test* phase of branch-2 precommit executions (and branch-2 nightly job) to run on openjdk8 only. Jonathan Hung On Mon, Feb 4, 2019 at 10:45 AM Anu Engineer wrote: > Konstantin, > > Just a nitpicky thought, if we move this branch to Java-8 on Jenkins, but > still hope to release code that can run on Java 7, how will we detect > Java 8 only changes? I am asking because till now whenever I checked in > Java 8 features in branch-2 Jenkins would catch that issue. > > With this approach, we might not find it out the issues till the release > time when the release manager decides to compile with Java 7. > It might be more pragmatic to say that your Java 7 mileage may vary once > this goes in, since we will have no visibility to Java 7 compatibility > until it is too late. > > Another approach could be that we create a read-only 2.x branch, then we > know that code will work with Java 7 since the last snapshot was known to > work with Java 7. > > > Thanks > Anu > > > > On 2/1/19, 5:04 PM, "Konstantin Shvachko" wrote: > > Just to make sure we are on the same page, as the subject of this > thread is > too generic and confusing. > *The proposal is to move branch-2 Jenkins builds such as precommit to > run > tests on openJDK-8.* > We do not want to break Java 7 source compatibility. The sources and > releases will still depend on Java 7. > We don't see test failures discussed in HADOOP-15711 when we run them > locally with Oracle Java 7. > > Thanks, > --Konst > > On Fri, Feb 1, 2019 at 12:44 PM Jonathan Hung > wrote: > > > Thanks Vinod and Steve, agreed about java7 compile compatibility. At > least > > for now, we should be able to maintain java7 source compatibility > and run > > tests on java8. 
There's a test run here: > > > https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86-jhung/46/ > > which calls a java8 specific API, installs both openjdk7/openjdk8 in > the > > dockerfile, compiles on both versions, and tests on just java8 (via > > > > > --multijdkdirs=/usr/lib/jvm/java-7-openjdk-amd64,/usr/lib/jvm/java-8-openjdk-amd64 > > and --multijdktests=compile). If we eventually decide it's too much > of a > > pain to maintain java7 source compatibility we can do that at a later > > point. > > > > Also based on discussion with others in the community at the > contributors > > meetup this past Wednesday, seems we are generally in favor of > testing > > against java8. I'll start a vote soon. > > > > Jonathan Hung > > > > > > On Tue, Jan 29, 2019 at 4:11 AM Steve Loughran < > ste...@hortonworks.com> > > wrote: > > > > > branch-2 is the JDK 7 branch, but for a long time I (and presumably > > > others) have relied on jenkins to keep us honest by doing that > build and > > > test > > > > > > right now, we can't do that any more, due to jdk7 bugs which will > never > > be > > > fixed by oracle, or at least, not in a public release. > > > > > > If we can still do the compile in java 7 language and link to java > 7 JDK, > > > then that bit of the release is good -then java 8 can be used for > that > > test > > > > > > Ultimately, we're going to be forced onto java 8 just because all > our > > > dependencies have moved onto it, and some CVE will force us to > move. > > > > > > At which point, I think its time to declare branch-2 dead. It's > had a > > > great life, but trying to keep java 7 support alive isn't > sustainable. > > Not > > > just in this testing, but > > > cherrypicking patches back gets more and more difficult -branch-3 > has > > > moved on in both use of java 8 language, and in the codebase in > general. 
> > > > > > > On 28 Jan 2019, at 20:18, Vinod Kumar Vavilapalli < > vino...@apache.org> > > > wrote: > > > > > > > > The community made a decision long time ago that we'd like to > keep the > > > compatibility & so tie branch-2 to Java 7, but do Java 8+ only > work on > > 3.x. > > > > >
Re: [DISCUSS] Moving branch-2 to java 8
Yeah, it's possible with yetus, there's one example here <https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86-jhung/60/console> which runs compilation on openjdk7 (and openjdk8), and runs tests on openjdk8 only. Jonathan Hung On Mon, Feb 4, 2019 at 10:11 AM Steve Loughran wrote: > > > On 2 Feb 2019, at 00:57, Konstantin Shvachko wrote: > > Just to make sure we are on the same page, as the subject of this thread > is too generic and confusing. > *The proposal is to move branch-2 Jenkins builds such as precommit to run > tests on openJDK-8.* > We do not want to break Java 7 source compatibility. The sources and > releases will still depend on Java 7. > We don't see test failures discussed in HADOOP-15711 when we run them > locally with Oracle Java 7. > > Thanks, > --Konst > > > Given the tests aren't working today, the risk that an openjdk 8 test run > hides a problem which would show up on openjdk 7 has to consider that at > least openjdk8 will run the tests. > > One thing I would like to be confident is that at least the compile phase > of all the source (including generated source) is on jdk7, and its only the > test run which switches JVM. Can we do that? >
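To make concrete what the openjdk7 compile phase is meant to catch, here is a small Java sketch. The class and method names are hypothetical and not taken from the Hadoop source tree. It relies on a lambda, the Stream API and `java.util.Optional`, all of which were introduced in Java 8, so a change like this would compile and pass tests on openjdk8 but fail the openjdk7 compile step in a setup like the one described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class Java8OnlySketch {
    /**
     * Returns the first branch name starting with "branch-2".
     * The lambda, List.stream() and Optional are Java 8 additions,
     * so this method cannot be compiled by a Java 7 toolchain.
     */
    static Optional<String> firstBranch2(List<String> branches) {
        return branches.stream()
                .filter(b -> b.startsWith("branch-2"))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> branches = Arrays.asList("trunk", "branch-3.0", "branch-2", "branch-2.9");
        System.out.println(firstBranch2(branches).orElse("none"));
    }
}
```

Compiling on a real JDK 7 rather than only passing a Java 7 source level to a newer compiler matters here, because the latter does not by itself detect use of classes that exist only in the Java 8 class library.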
Re: [VOTE] Propose to start new Hadoop sub project "submarine"
+1. Thanks Wangda. Jonathan Hung On Fri, Feb 1, 2019 at 2:25 PM Dinesh Chitlangia < dchitlan...@hortonworks.com> wrote: > +1 (non binding), thanks Wangda for organizing this. > > Regards, > Dinesh > > > > On 2/1/19, 5:24 PM, "Wangda Tan" wrote: > > Hi all, > > According to positive feedbacks from the thread [1] > > This is vote thread to start a new subproject named "hadoop-submarine" > which follows the release process already established for ozone. > > The vote runs for usual 7 days, which ends at Feb 8th 5 PM PDT. > > Thanks, > Wangda Tan > > [1] > > https://lists.apache.org/thread.html/f864461eb188bd12859d51b0098ec38942c4429aae7e4d001a633d96@%3Cyarn-dev.hadoop.apache.org%3E > > >
[jira] [Created] (YARN-9272) Backport YARN-7738 for refreshing max allocation for multiple resource types
Jonathan Hung created YARN-9272: --- Summary: Backport YARN-7738 for refreshing max allocation for multiple resource types Key: YARN-9272 URL: https://issues.apache.org/jira/browse/YARN-9272 Project: Hadoop YARN Issue Type: Sub-task Environment: Backport to YARN-8200 feature branch (for branch-2). Reporter: Jonathan Hung Assignee: Jonathan Hung
[jira] [Created] (YARN-9271) Backport YARN-6927 for resource type support in MapReduce
Jonathan Hung created YARN-9271: --- Summary: Backport YARN-6927 for resource type support in MapReduce Key: YARN-9271 URL: https://issues.apache.org/jira/browse/YARN-9271 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Hung
Re: [DISCUSS] Moving branch-2 to java 8
Thanks Vinod and Steve, agreed about java7 compile compatibility. At least for now, we should be able to maintain java7 source compatibility and run tests on java8. There's a test run here: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86-jhung/46/ which calls a java8 specific API, installs both openjdk7/openjdk8 in the dockerfile, compiles on both versions, and tests on just java8 (via --multijdkdirs=/usr/lib/jvm/java-7-openjdk-amd64,/usr/lib/jvm/java-8-openjdk-amd64 and --multijdktests=compile). If we eventually decide it's too much of a pain to maintain java7 source compatibility we can do that at a later point. Also based on discussion with others in the community at the contributors meetup this past Wednesday, seems we are generally in favor of testing against java8. I'll start a vote soon. Jonathan Hung On Tue, Jan 29, 2019 at 4:11 AM Steve Loughran wrote: > branch-2 is the JDK 7 branch, but for a long time I (and presumably > others) have relied on jenkins to keep us honest by doing that build and > test > > right now, we can't do that any more, due to jdk7 bugs which will never be > fixed by oracle, or at least, not in a public release. > > If we can still do the compile in java 7 language and link to java 7 JDK, > then that bit of the release is good -then java 8 can be used for that test > > Ultimately, we're going to be forced onto java 8 just because all our > dependencies have moved onto it, and some CVE will force us to move. > > At which point, I think its time to declare branch-2 dead. It's had a > great life, but trying to keep java 7 support alive isn't sustainable. Not > just in this testing, but > cherrypicking patches back gets more and more difficult -branch-3 has > moved on in both use of java 8 language, and in the codebase in general. 
> > > On 28 Jan 2019, at 20:18, Vinod Kumar Vavilapalli > wrote: > > > > The community made a decision long time ago that we'd like to keep the > compatibility & so tie branch-2 to Java 7, but do Java 8+ only work on 3.x. > > > > I always assumed that most (all?) downstream users build branch-2 on JDK > 7 only, can anyone confirm? If so, there may be an easier way to address > these test issues. > > > > +Vinod > > > >> On Jan 28, 2019, at 11:24 AM, Jonathan Hung > wrote: > >> > >> Hi folks, > >> > >> Forking a discussion based on HADOOP-15711. To summarize, there are > issues > >> with branch-2 tests running on java 7 (openjdk) which don't exist on > java > >> 8. From our testing, the build can pass with openjdk 8. > >> > >> For branch-3, the work to move the build to use java 8 was done in > >> HADOOP-14816 as part of the Dockerfile OS version change. HADOOP-16053 > was > >> filed to backport this OS version change to branch-2 (but without the > java > >> 7 -> java 8 change). So my proposal is to also make the java 7 -> java 8 > >> version change in branch-2. > >> > >> As mentioned in HADOOP-15711, the main issue is around source and binary > >> compatibility. I don't currently have a great answer, but one initial > >> thought is to build source/binary against java 7 to ensure compatibility > >> and run the rest of the build as java 8. > >> > >> Thoughts? > >> > >> Jonathan Hung > > > > > > - > > To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org > > For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org > > > >
[jira] [Resolved] (YARN-9261) Backport YARN-7270 addendum to YARN-8200
[ https://issues.apache.org/jira/browse/YARN-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung resolved YARN-9261. - Resolution: Fixed Clean backport. Pushed to YARN-8200 > Backport YARN-7270 addendum to YARN-8200 > > > Key: YARN-9261 > URL: https://issues.apache.org/jira/browse/YARN-9261 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Jonathan Hung > Assignee: Jonathan Hung >Priority: Major > > There was an addendum to YARN-7270 added to branch-3.0 for changes after > resource-type feature was added. We need it in YARN-8200 feature branch too.
[jira] [Created] (YARN-9261) Backport YARN-7270 addendum to YARN-8200
Jonathan Hung created YARN-9261: --- Summary: Backport YARN-7270 addendum to YARN-8200 Key: YARN-9261 URL: https://issues.apache.org/jira/browse/YARN-9261 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Hung Assignee: Jonathan Hung There was an addendum to YARN-7270 added to branch-3.0 for changes after resource-type feature was added. We need it in YARN-8200 feature branch too.
Re: [DISCUSS] Making submarine to different release model like Ozone
+1. This is important for improving the deep learning on hadoop story. There's recently a lot of momentum for this, and decoupling submarine/hadoop will help it continue. Jonathan Hung On Thu, Jan 31, 2019 at 11:04 AM Wangda Tan wrote: > Hi devs, > > Since we started submarine-related effort last year, we received a lot of > feedbacks, several companies (such as Netease, China Mobile, etc.) are > trying to deploy Submarine to their Hadoop cluster along with big data > workloads. Linkedin also has big interests to contribute a Submarine TonY ( > https://github.com/linkedin/TonY) runtime to allow users to use the same > interface. > > From what I can see, there're several issues of putting Submarine under > yarn-applications directory and have same release cycle with Hadoop: > > 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan > 2019. Because of non-predictable blockers and security issues, it got > delayed a lot. We need to iterate submarine fast at this point. > > 2) We also see a lot of requirements to use Submarine on older Hadoop > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a > short time, but the requirement to run deep learning is urgent to them. We > should decouple Submarine from Hadoop version. > > And why we wanna to keep it within Hadoop? First, Submarine included some > innovation parts such as enhancements of user experiences for YARN > services/containerization support which we can add it back to Hadoop later > to address common requirements. In addition to that, we have a big overlap > in the community developing and using it. > > There're several proposals we have went through during Ozone merge to trunk > discussion: > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3ccahfhakh6_m3yldf5a2kq8+w-5fbvx5ahfgs-x1vajw8gmnz...@mail.gmail.com%3E > > I propose to adopt Ozone model: which is the same master branch, different > release cycle, and different release branch. 
It is a great example to show > agile release we can do (2 Ozone releases after Oct 2018) with less > overhead to setup CI, projects, etc. > > *Links:* > - JIRA: https://issues.apache.org/jira/browse/YARN-8135 > - Design doc > < > https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit > > > - User doc > < > https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html > > > (3.2.0 > release) > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop > < > https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/ > >, > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>) > - Talks: Strata Data Conf NY > < > https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289 > > > > Thoughts? > > Thanks, > Wangda Tan >
[DISCUSS] Moving branch-2 to java 8
Hi folks, Forking a discussion based on HADOOP-15711. To summarize, there are issues with branch-2 tests running on java 7 (openjdk) which don't exist on java 8. From our testing, the build can pass with openjdk 8. For branch-3, the work to move the build to use java 8 was done in HADOOP-14816 as part of the Dockerfile OS version change. HADOOP-16053 was filed to backport this OS version change to branch-2 (but without the java 7 -> java 8 change). So my proposal is to also make the java 7 -> java 8 version change in branch-2. As mentioned in HADOOP-15711, the main issue is around source and binary compatibility. I don't currently have a great answer, but one initial thought is to build source/binary against java 7 to ensure compatibility and run the rest of the build as java 8. Thoughts? Jonathan Hung
[jira] [Resolved] (YARN-9222) Change startTime semantics for RMApp
[ https://issues.apache.org/jira/browse/YARN-9222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung resolved YARN-9222. - Resolution: Fixed darn it, seems this is a dupe of YARN-7088 > Change startTime semantics for RMApp > > > Key: YARN-9222 > URL: https://issues.apache.org/jira/browse/YARN-9222 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Jonathan Hung >Priority: Major > > Currently submitTime for rmApp is based on when app is submitted to > RMAppManager: > {noformat} > rmAppManager.submitApplication(submissionContext, > System.currentTimeMillis(), user);{noformat} > Then RMAppManager#createAndPopulateNewRMApp does some validation (queue > routing, app priority, etc), then the RMAppImpl object is created, at which > point the startTime is populated: > {noformat} > if (startTime <= 0) { > this.startTime = this.systemClock.getTime(); > } else { > this.startTime = startTime; > }{noformat} > In general it seems there shouldn't be much difference between submitTime and > startTime. It makes more sense to change startTime to when the app actually > started. One possible solution is to (re)populate startTime when application > master registers with RM. > One issue may be compatibility, especially if there are large scheduling > delays.
[jira] [Created] (YARN-9222) Change startTime semantics for RMApp
Jonathan Hung created YARN-9222: --- Summary: Change startTime semantics for RMApp Key: YARN-9222 URL: https://issues.apache.org/jira/browse/YARN-9222 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Hung
Currently submitTime for rmApp is based on when app is submitted to RMAppManager:
{noformat}
rmAppManager.submitApplication(submissionContext,
    System.currentTimeMillis(), user);
{noformat}
Then RMAppManager#createAndPopulateNewRMApp does some validation (queue routing, app priority, etc), then the RMAppImpl object is created, at which point the startTime is populated:
{noformat}
if (startTime <= 0) {
  this.startTime = this.systemClock.getTime();
} else {
  this.startTime = startTime;
}
{noformat}
In general it seems there shouldn't be much difference between submitTime and startTime. It makes more sense to change startTime to when the app actually started. One possible solution is to (re)populate startTime when application master registers with RM. One issue may be compatibility, especially if there are large scheduling delays.
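The proposal in YARN-9222 can be illustrated with a short Java sketch. Everything here is an assumption made for illustration: `AppTimes`, `Clock` and `onAmRegistered` are hypothetical names, not RMAppImpl's real members. The constructor mirrors the defaulting logic quoted in the ticket, and `onAmRegistered` shows the suggested (re)population of startTime when the application master registers.

```java
public class StartTimeSketch {
    /** Hypothetical stand-in for the clock read via systemClock.getTime(). */
    interface Clock {
        long getTime();
    }

    /** Hypothetical holder for the two timestamps discussed in the ticket. */
    static final class AppTimes {
        final long submitTime;
        long startTime;

        AppTimes(long submitTime, long startTime, Clock clock) {
            this.submitTime = submitTime;
            // Same rule as the quoted snippet: non-positive means "use the clock".
            this.startTime = startTime <= 0 ? clock.getTime() : startTime;
        }

        // Proposed change: overwrite startTime once the AM actually registers.
        void onAmRegistered(long now) {
            this.startTime = now;
        }
    }

    public static void main(String[] args) {
        Clock fixed = () -> 1_000L;                    // deterministic clock for the demo
        AppTimes app = new AppTimes(990L, -1L, fixed);
        System.out.println("at creation: start - submit = " + (app.startTime - app.submitTime));

        app.onAmRegistered(5_000L);                    // e.g. after a long scheduling delay
        System.out.println("after AM registers: start - submit = " + (app.startTime - app.submitTime));
    }
}
```

The gap between the two printed differences is the scheduling delay that the current semantics hide, which is also where the compatibility concern raised in the ticket comes from.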
[jira] [Created] (YARN-9188) Port YARN-7136 to branch-2
Jonathan Hung created YARN-9188: --- Summary: Port YARN-7136 to branch-2 Key: YARN-9188 URL: https://issues.apache.org/jira/browse/YARN-9188 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Hung Assignee: Jonathan Hung
[jira] [Created] (YARN-9187) Backport YARN-6852 for GPU-specific native changes to branch-2
Jonathan Hung created YARN-9187: --- Summary: Backport YARN-6852 for GPU-specific native changes to branch-2 Key: YARN-9187 URL: https://issues.apache.org/jira/browse/YARN-9187 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Hung Assignee: Jonathan Hung Attachments: YARN-9187-YARN-8200.001.patch YARN-6852 adds native GPU support, including (1) general native code refactoring and (2) GPU-specific native code. Item 1 is handled by YARN-7321 in branch-2. This ticket is for handling item 2 in branch-2.
[jira] [Created] (YARN-9182) Backport YARN-6445 resource profile performance improvements to branch-2
Jonathan Hung created YARN-9182: --- Summary: Backport YARN-6445 resource profile performance improvements to branch-2 Key: YARN-9182 URL: https://issues.apache.org/jira/browse/YARN-9182 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Hung Attachments: YARN-9182-YARN-8200.001.patch
[jira] [Created] (YARN-9181) Backport YARN-6232 for generic resource type usage to branch-2
Jonathan Hung created YARN-9181: --- Summary: Backport YARN-6232 for generic resource type usage to branch-2 Key: YARN-9181 URL: https://issues.apache.org/jira/browse/YARN-9181 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Hung
[jira] [Created] (YARN-9180) Port YARN-7033 NM recovery of assigned resources to branch-3.0/branch-2
Jonathan Hung created YARN-9180: --- Summary: Port YARN-7033 NM recovery of assigned resources to branch-3.0/branch-2 Key: YARN-9180 URL: https://issues.apache.org/jira/browse/YARN-9180 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Hung Assignee: Jonathan Hung