[jira] [Resolved] (YARN-8849) DynoYARN: A simulation and testing infrastructure for YARN clusters

2021-09-20 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung resolved YARN-8849.
-
Resolution: Fixed

FYI, we have open-sourced DynoYARN on GitHub: https://github.com/linkedin/dynoyarn

> DynoYARN: A simulation and testing infrastructure for YARN clusters
> ---
>
> Key: YARN-8849
> URL: https://issues.apache.org/jira/browse/YARN-8849
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun Suresh
>    Assignee: Jonathan Hung
>Priority: Major
>
> Traditionally, YARN workload simulation is performed using the SLS (Scheduler 
> Load Simulator), which is packaged with YARN. It essentially starts a 
> full-fledged *ResourceManager*, but runs simulators for the *NodeManager* and 
> the *ApplicationMaster* containers. These simulators are lightweight and run 
> in a threadpool. The NM simulators do not open any external ports and send 
> (in-process) heartbeats to the ResourceManager.
> There are a couple of drawbacks to using the SLS:
>  * It might be difficult to simulate really large clusters without having 
> access to a very beefy box - since the NMs are launched as tasks in a 
> threadpool, each NM has to send periodic heartbeats to the RM.
>  * Certain features (like YARN-1011) require changes to the NodeManager - 
> aspects such as queuing and selectively killing containers have to be 
> incorporated into the existing NM simulator, which might make the simulator 
> heavyweight - there is a need for locking and synchronization.
>  * Since the NM and AM are simulations, only the Scheduler is faithfully 
> tested - it does not really perform an end-to-end test of a cluster.
> Therefore, drawing inspiration from 
> [Dynamometer|https://github.com/linkedin/dynamometer], we propose a framework 
> for a YARN-deployable YARN cluster - *DynoYARN* - for testing, with the 
> following features:
>  * The NM already has hooks to plug in a custom *ContainerExecutor* and 
> *NodeResourceMonitor*. If we can also plug in a custom *ContainersMonitorImpl* 
> monitoring thread (and other modules like the LocalizationService), we can 
> inject an executor that does not actually launch containers, and node and 
> container resource monitors that report synthetic, pre-specified utilization 
> metrics back to the RM.
>  * Since we are launching fake containers, we cannot run normal AM 
> containers. We can therefore use *Unmanaged AMs* to launch synthetic jobs.
> Essentially, a test workflow would look like this:
>  * Launch a DynoYARN cluster.
>  * Use the Unmanaged AM feature to directly negotiate with the DynoYARN 
> Resource Manager for container tokens.
>  * Use the container tokens from the RM to directly ask the DynoYARN Node 
> Managers to start fake containers.
>  * The DynoYARN NodeManagers will start the fake containers and report to the 
> DynoYARN Resource Manager synthetically generated resource utilization for 
> the containers (which will be injected via the *ContainerLaunchContext* and 
> parsed by the plugged-in Container Executor).
>  * The Scheduler will use the utilization report to schedule containers - we 
> will be able to test allocation of *Opportunistic* containers based on 
> resource utilization.
>  * Since the DynoYARN Node Managers run the actual code paths, all preemption 
> and queuing logic will be faithfully executed.
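
A minimal sketch of the executor/monitor injection described in the proposal above, in Java. This is illustrative only: the real NM plug-in points are *ContainerExecutor* and *NodeResourceMonitor*, whose exact method signatures vary across Hadoop versions, so the sketch models the idea with simplified standalone classes rather than the real NM interfaces.
{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** A "container executor" that records launches but never forks a process. */
class NoOpContainerExecutor {
  private final Map<String, Integer> exitCodes = new ConcurrentHashMap<>();

  /** Pretend to launch the container: succeed immediately with exit code 0. */
  int launchContainer(String containerId) {
    exitCodes.put(containerId, 0);
    return 0;
  }
}

/** Reports pre-specified utilization instead of measuring the real node. */
class SyntheticNodeResourceMonitor {
  private final long fakeUsedMemMB;
  private final float fakeCpuUsage;

  /** In DynoYARN these values would be parsed from the ContainerLaunchContext. */
  SyntheticNodeResourceMonitor(long fakeUsedMemMB, float fakeCpuUsage) {
    this.fakeUsedMemMB = fakeUsedMemMB;
    this.fakeCpuUsage = fakeCpuUsage;
  }

  long getUsedMemoryMB() { return fakeUsedMemMB; }
  float getCpuUsage() { return fakeCpuUsage; }
}
{noformat}
The point of the sketch is the shape of the injection: the NM keeps running its real code paths, while the launch and measurement steps are replaced with no-ops and canned numbers.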



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-05-29 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-10297:


 Summary: 
TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails 
intermittently
 Key: YARN-10297
 URL: https://issues.apache.org/jira/browse/YARN-10297
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung


After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails 
intermittently.
{noformat}[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
[ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 s 
<<< FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
[ERROR] 
testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
  Time elapsed: 0.194 s  <<< ERROR!
org.apache.hadoop.metrics2.MetricsException: Metrics source 
PartitionQueueMetrics,partition= already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-10263) Application summary is logged multiple times due to RM recovery

2020-05-11 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-10263:


 Summary: Application summary is logged multiple times due to RM 
recovery
 Key: YARN-10263
 URL: https://issues.apache.org/jira/browse/YARN-10263
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung


An app finishes and is logged to the RM app summary. Restart the RM. Then this 
app is logged to the RM app summary again.

We would need some way of knowing, across restarts, whether an app's summary has 
already been logged.
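
One possible shape of a fix, sketched below. This is not from the ticket: the idea of de-duplicating on a "summary already logged" marker (shown here as an in-memory set standing in for a hypothetical flag persisted with the app state) is an assumption for illustration.
{noformat}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: de-duplicate app-summary logging across RM restarts. */
class AppSummaryLogger {
  // In a real fix this set would need to be backed by persisted state that is
  // recovered on RM restart, not kept purely in memory as it is here.
  private final Set<String> loggedAppIds = ConcurrentHashMap.newKeySet();

  void logSummaryOnce(String appId, String summaryLine) {
    if (loggedAppIds.add(appId)) {
      System.out.println(summaryLine); // stand-in for the RM summary appender
    }
  }
}
{noformat}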



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-10260) Allow transitioning queue from DRAINING to RUNNING state

2020-05-06 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-10260:


 Summary: Allow transitioning queue from DRAINING to RUNNING state
 Key: YARN-10260
 URL: https://issues.apache.org/jira/browse/YARN-10260
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung


We found in our cluster that a queue was erroneously stopped. The queue then 
goes internally into DRAINING state and cannot be moved back to RUNNING state 
until it has finished draining. For queues with large workloads, this can block 
other apps from submitting to the queue for a long time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [DISCUSS] Making 2.10 the last minor 2.x release

2020-04-16 Thread Jonathan Hung
Source code has been deleted from branch-2. Thanks Akira for taking this up!

Jonathan Hung


On Thu, Apr 16, 2020 at 11:40 AM Jonathan Hung  wrote:

> Makes sense. I've cherry-picked the commits in branch-2 that were missed
> in branch-2.10.
>
> Jonathan Hung
>
>
> On Wed, Apr 15, 2020 at 2:25 AM Akira Ajisaka  wrote:
>
>> Hi folks,
>>
>> I am still seeing some changes are being committed to branch-2.
>> I'd like to delete the source code from branch-2 to avoid mistakes.
>> https://issues.apache.org/jira/browse/HADOOP-16988
>>
>> -Akira
>>
>> On Wed, Jan 1, 2020 at 2:38 AM Ayush Saxena  wrote:
>>
>>> Hi Jim,
>>> Thanx for catching, I have configured the build to run on branch-2.10.
>>>
>>> -Ayush
>>>
>>> On Tue, 31 Dec 2019 at 22:50, Jim Brennan <
>>> james.bren...@verizonmedia.com> wrote:
>>>
>>>> It looks like QBT tests are still being run on branch-2 (
>>>> https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86/),
>>>> and they are not very helpful at this point.
>>>> Can we change the QBT tests to run against branch-2.10 instead?
>>>>
>>>> Jim
>>>>
>>>> On Mon, Dec 23, 2019 at 7:44 PM Akira Ajisaka 
>>>> wrote:
>>>>
>>>>> Thank you, Ayush.
>>>>>
>>>>> I understand we should keep branch-2 as is, as well as master.
>>>>>
>>>>> -Akira
>>>>>
>>>>>
>>>>> On Mon, Dec 23, 2019 at 9:14 PM Ayush Saxena 
>>>>> wrote:
>>>>>
>>>>> > Hi Akira
>>>>> > Seems there was an INFRA ticket for that. INFRA-19581,
>>>>> > But the INFRA people closed as wont do and yes, the branch is
>>>>> protected,
>>>>> > we can’t delete it directly.
>>>>> >
>>>>> > Ref: https://issues.apache.org/jira/browse/INFRA-19581
>>>>> >
>>>>> > -Ayush
>>>>> >
>>>>> > On 23-Dec-2019, at 5:03 PM, Akira Ajisaka 
>>>>> wrote:
>>>>> >
>>>>> > Thank you for your work, Jonathan.
>>>>> >
>>>>> > I found branch-2 has been unintentionally pushed again. Would you
>>>>> remove
>>>>> > it?
>>>>> > I think the branch should be protected if possible.
>>>>> >
>>>>> > -Akira
>>>>> >
>>>>> > On Tue, Dec 10, 2019 at 5:17 AM Jonathan Hung 
>>>>> > wrote:
>>>>> >
>>>>> > It's done. The new commit chain is: trunk -> branch-3.2 ->
>>>>> branch-3.1 ->
>>>>> >
>>>>> > branch-2.10 -> branch-2.9 -> branch-2.8 (branch-2 no longer exists,
>>>>> please
>>>>> >
>>>>> > don't try to commit to it)
>>>>> >
>>>>> >
>>>>> > Completed procedure:
>>>>> >
>>>>> >
>>>>> >   - Verified everything in old branch-2.10 was in old branch-2
>>>>> >
>>>>> >   - Delete old branch-2.10
>>>>> >
>>>>> >   - Rename branch-2 to (new) branch-2.10
>>>>> >
>>>>> >   - Set version in new branch-2.10 to 2.10.1-SNAPSHOT
>>>>> >
>>>>> >   - Renamed fix versions from 2.11.0 to 2.10.1
>>>>> >
>>>>> >   - Removed 2.11.0 as a version in HADOOP/YARN/HDFS/MAPREDUCE
>>>>> >
>>>>> >
>>>>> >
>>>>> > Jonathan Hung
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Wed, Dec 4, 2019 at 10:55 AM Jonathan Hung 
>>>>> >
>>>>> > wrote:
>>>>> >
>>>>> >
>>>>> > FYI, starting the rename process, beginning with INFRA-19521.
>>>>> >
>>>>> >
>>>>> > Jonathan Hung
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Wed, Nov 27, 2019 at 12:15 PM Konstantin Shvachko <
>>>>> >
>>>>> > shv.had...@gmail.com>
>>>>> >
>>>>> > wrote:
>>>>> >
>>>>> >
>>>>> &g

Re: [DISCUSS] Making 2.10 the last minor 2.x release

2020-04-16 Thread Jonathan Hung
Makes sense. I've cherry-picked the commits in branch-2 that were missed in
branch-2.10.

Jonathan Hung


On Wed, Apr 15, 2020 at 2:25 AM Akira Ajisaka  wrote:

> Hi folks,
>
> I am still seeing some changes are being committed to branch-2.
> I'd like to delete the source code from branch-2 to avoid mistakes.
> https://issues.apache.org/jira/browse/HADOOP-16988
>
> -Akira
>
> On Wed, Jan 1, 2020 at 2:38 AM Ayush Saxena  wrote:
>
>> Hi Jim,
>> Thanx for catching, I have configured the build to run on branch-2.10.
>>
>> -Ayush
>>
>> On Tue, 31 Dec 2019 at 22:50, Jim Brennan 
>> wrote:
>>
>>> It looks like QBT tests are still being run on branch-2 (
>>> https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86/),
>>> and they are not very helpful at this point.
>>> Can we change the QBT tests to run against branch-2.10 instead?
>>>
>>> Jim
>>>
>>> On Mon, Dec 23, 2019 at 7:44 PM Akira Ajisaka 
>>> wrote:
>>>
>>>> Thank you, Ayush.
>>>>
>>>> I understand we should keep branch-2 as is, as well as master.
>>>>
>>>> -Akira
>>>>
>>>>
>>>> On Mon, Dec 23, 2019 at 9:14 PM Ayush Saxena 
>>>> wrote:
>>>>
>>>> > Hi Akira
>>>> > Seems there was an INFRA ticket for that. INFRA-19581,
>>>> > But the INFRA people closed as wont do and yes, the branch is
>>>> protected,
>>>> > we can’t delete it directly.
>>>> >
>>>> > Ref: https://issues.apache.org/jira/browse/INFRA-19581
>>>> >
>>>> > -Ayush
>>>> >
>>>> > On 23-Dec-2019, at 5:03 PM, Akira Ajisaka 
>>>> wrote:
>>>> >
>>>> > Thank you for your work, Jonathan.
>>>> >
>>>> > I found branch-2 has been unintentionally pushed again. Would you
>>>> remove
>>>> > it?
>>>> > I think the branch should be protected if possible.
>>>> >
>>>> > -Akira
>>>> >
>>>> > On Tue, Dec 10, 2019 at 5:17 AM Jonathan Hung 
>>>> > wrote:
>>>> >
>>>> > It's done. The new commit chain is: trunk -> branch-3.2 -> branch-3.1
>>>> ->
>>>> >
>>>> > branch-2.10 -> branch-2.9 -> branch-2.8 (branch-2 no longer exists,
>>>> please
>>>> >
>>>> > don't try to commit to it)
>>>> >
>>>> >
>>>> > Completed procedure:
>>>> >
>>>> >
>>>> >   - Verified everything in old branch-2.10 was in old branch-2
>>>> >
>>>> >   - Delete old branch-2.10
>>>> >
>>>> >   - Rename branch-2 to (new) branch-2.10
>>>> >
>>>> >   - Set version in new branch-2.10 to 2.10.1-SNAPSHOT
>>>> >
>>>> >   - Renamed fix versions from 2.11.0 to 2.10.1
>>>> >
>>>> >   - Removed 2.11.0 as a version in HADOOP/YARN/HDFS/MAPREDUCE
>>>> >
>>>> >
>>>> >
>>>> > Jonathan Hung
>>>> >
>>>> >
>>>> >
>>>> > On Wed, Dec 4, 2019 at 10:55 AM Jonathan Hung 
>>>> >
>>>> > wrote:
>>>> >
>>>> >
>>>> > FYI, starting the rename process, beginning with INFRA-19521.
>>>> >
>>>> >
>>>> > Jonathan Hung
>>>> >
>>>> >
>>>> >
>>>> > On Wed, Nov 27, 2019 at 12:15 PM Konstantin Shvachko <
>>>> >
>>>> > shv.had...@gmail.com>
>>>> >
>>>> > wrote:
>>>> >
>>>> >
>>>> > Hey guys,
>>>> >
>>>> >
>>>> > I think we diverged a bit from the initial topic of this discussion,
>>>> >
>>>> > which is removing branch-2.10, and changing the version of branch-2
>>>> from
>>>> >
>>>> > 2.11.0-SNAPSHOT to 2.10.1-SNAPSHOT.
>>>> >
>>>> > Sounds like the subject line for this thread "Making 2.10 the last
>>>> minor
>>>> >
>>>> > 2.x release" confused people.
>>>> >
>>>> > It is in fact a wider matter that can be di

[jira] [Created] (YARN-10212) Create separate configuration for max global AM attempts

2020-03-27 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-10212:


 Summary: Create separate configuration for max global AM attempts
 Key: YARN-10212
 URL: https://issues.apache.org/jira/browse/YARN-10212
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung


Right now the user's default max AM attempts is set to the same value as the 
global max AM attempts:
{noformat}
int globalMaxAppAttempts = conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS); {noformat}
If we want to increase the global max AM attempts, it will also increase the 
default. We should create a separate config for the global max AM attempts to 
decouple the two.
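
A sketch of the proposed separation. The new property name below is hypothetical (the key the eventual patch introduces may differ); it falls back to the existing per-app default so behavior is unchanged for clusters that don't set it.
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

class GlobalAmMaxAttemptsSketch {
  static int globalMaxAppAttempts(Configuration conf) {
    // Existing per-app default (yarn.resourcemanager.am.max-attempts).
    int defaultAmMaxAttempts = conf.getInt(
        YarnConfiguration.RM_AM_MAX_ATTEMPTS,
        YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS);
    // Hypothetical dedicated global ceiling, decoupled from the default.
    return conf.getInt(
        "yarn.resourcemanager.am.global.max-attempts", defaultAmMaxAttempts);
  }
}
{noformat}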



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-10200) Add number of containers to RMAppManager summary

2020-03-16 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-10200:


 Summary: Add number of containers to RMAppManager summary
 Key: YARN-10200
 URL: https://issues.apache.org/jira/browse/YARN-10200
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung


We track the number of containers per app; it would be useful to persist this so 
we can track the containers processed by the RM over the long term.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-10192) CapacityScheduler stuck in loop rejecting allocation proposals

2020-03-11 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-10192:


 Summary: CapacityScheduler stuck in loop rejecting allocation 
proposals
 Key: YARN-10192
 URL: https://issues.apache.org/jira/browse/YARN-10192
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Hung


On a 2.10.0 cluster, we observed containers being scheduled very slowly. Based 
on the logs, the scheduler seems to reject a batch of allocation proposals, then 
accept a batch of reserved containers, but very few containers are actually 
getting allocated:
{noformat}
2020-03-10 06:31:48,965 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
assignedContainer queue=root usedCapacity=0.30113637 
absoluteUsedCapacity=0.30113637 used= cluster=
2020-03-10 06:31:48,965 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Failed to accept allocation proposal
2020-03-10 06:31:48,965 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
 assignedContainer application attempt=appattempt_1582403122262_15460_01 
container=null queue=misc_default clusterResource= type=OFF_SWITCH requestedPartition=cpu
2020-03-10 06:31:48,965 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
assignedContainer queue=misc usedCapacity=0.0031771248 
absoluteUsedCapacity=3.1771246E-4 used= 
cluster=
2020-03-10 06:31:48,965 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
assignedContainer queue=root usedCapacity=0.30113637 
absoluteUsedCapacity=0.30113637 used= cluster=
2020-03-10 06:31:48,965 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Failed to accept allocation proposal
2020-03-10 06:31:48,968 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
 assignedContainer application attempt=appattempt_1582403122262_15460_01 
container=null queue=misc_default clusterResource= type=OFF_SWITCH requestedPartition=cpu
2020-03-10 06:31:48,968 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
assignedContainer queue=misc usedCapacity=0.0031771248 
absoluteUsedCapacity=3.1771246E-4 used= 
cluster=
2020-03-10 06:31:48,968 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
assignedContainer queue=root usedCapacity=0.30113637 
absoluteUsedCapacity=0.30113637 used= cluster=
2020-03-10 06:31:48,968 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Failed to accept allocation proposal
2020-03-10 06:31:48,977 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
 assignedContainer application attempt=appattempt_1582403122262_15460_01 
container=null queue=misc_default clusterResource= type=OFF_SWITCH requestedPartition=cpu
2020-03-10 06:31:48,977 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
assignedContainer queue=misc usedCapacity=0.0031771248 
absoluteUsedCapacity=3.1771246E-4 used= 
cluster=
2020-03-10 06:31:48,977 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
assignedContainer queue=root usedCapacity=0.30113637 
absoluteUsedCapacity=0.30113637 used= cluster=
2020-03-10 06:31:48,977 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Failed to accept allocation proposal
2020-03-10 06:31:48,981 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
 assignedContainer application attempt=appattempt_1582403122262_15460_01 
container=null queue=misc_default clusterResource= type=OFF_SWITCH requestedPartition=cpu
2020-03-10 06:31:48,982 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
assignedContainer queue=misc usedCapacity=0.0031771248 
absoluteUsedCapacity=3.1771246E-4 used= 
cluster=
2020-03-10 06:31:48,982 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
assignedContainer queue=root usedCapacity=0.30113637 
absoluteUsedCapacity=0.30113637 used= cluster=
2020-03-10 06:31:48,982 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Failed to accept allocation proposal
2020-03-10 06:31:48,985 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
 assignedContainer application attempt=appattempt_1582403122262_15460_01 
container=null queue=misc_default clusterResource= type=OFF_SWITCH requestedPartition=cpu
2020-03-10 06:31:48,985 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
assignedContainer queue=misc usedCapacity=0.0031771248 
absoluteUsedCapacity=3.1771246E-4 used

[jira] [Created] (YARN-10134) Periodically sync backend scheduler configuration changes to capacity-scheduler.xml

2020-02-12 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-10134:


 Summary: Periodically sync backend scheduler configuration changes 
to capacity-scheduler.xml
 Key: YARN-10134
 URL: https://issues.apache.org/jira/browse/YARN-10134
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung


In case backend scheduler configuration changes are lost, it'd be good to have 
a relatively up-to-date configuration in capacity-scheduler.xml.
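
A minimal sketch of what such a periodic sync could look like. This is an assumption about the mechanism, not the actual patch: where the live Configuration comes from (in the RM it would be the backend/mutable config store) is left as a supplier, and only Configuration.writeXml is relied on as a real API.
{noformat}
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import org.apache.hadoop.conf.Configuration;

/** Illustrative sketch: periodically dump the live scheduler config to XML. */
class SchedulerConfSyncer {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void start(Supplier<Configuration> liveConf, Path target, long periodMinutes) {
    scheduler.scheduleAtFixedRate(() -> {
      try (OutputStream out = Files.newOutputStream(target)) {
        liveConf.get().writeXml(out); // serialize the current config as XML
      } catch (Exception e) {
        e.printStackTrace();          // best-effort sync; don't kill the thread
      }
    }, periodMinutes, periodMinutes, TimeUnit.MINUTES);
  }
}
{noformat}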

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-10116) Expose diagnostics in RMAppManager summary

2020-02-04 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-10116:


 Summary: Expose diagnostics in RMAppManager summary
 Key: YARN-10116
 URL: https://issues.apache.org/jira/browse/YARN-10116
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung
Assignee: Jonathan Hung


It's useful for tracking app diagnostics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-10039) Allow disabling app submission from REST endpoints

2019-12-17 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-10039:


 Summary: Allow disabling app submission from REST endpoints
 Key: YARN-10039
 URL: https://issues.apache.org/jira/browse/YARN-10039
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung


Introduce a configuration which allows disabling /apps/new-application and 
/apps POST endpoints. 
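
A sketch of the idea. The property name below is hypothetical (whatever key the patch introduces may differ): a boolean switch read from configuration, checked at the top of the two REST handlers.
{noformat}
import org.apache.hadoop.conf.Configuration;

class AppSubmissionGate {
  // Hypothetical property name; defaults to enabled to preserve behavior.
  static final String ENABLE_REST_APP_SUBMISSIONS =
      "yarn.webapp.enable-rest-app-submissions";

  private final boolean enabled;

  AppSubmissionGate(Configuration conf) {
    this.enabled = conf.getBoolean(ENABLE_REST_APP_SUBMISSIONS, true);
  }

  /** Called at the top of the /apps/new-application and /apps POST handlers. */
  void checkEnabled() {
    if (!enabled) {
      // In the RM this would map to a 403/Forbidden web response.
      throw new UnsupportedOperationException(
          "Application submission via REST is disabled on this cluster");
    }
  }
}
{noformat}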



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [DISCUSS] Making 2.10 the last minor 2.x release

2019-12-09 Thread Jonathan Hung
It's done. The new commit chain is: trunk -> branch-3.2 -> branch-3.1 ->
branch-2.10 -> branch-2.9 -> branch-2.8 (branch-2 no longer exists, please
don't try to commit to it)

Completed procedure:

   - Verified everything in old branch-2.10 was in old branch-2
   - Delete old branch-2.10
   - Rename branch-2 to (new) branch-2.10
   - Set version in new branch-2.10 to 2.10.1-SNAPSHOT
   - Renamed fix versions from 2.11.0 to 2.10.1
   - Removed 2.11.0 as a version in HADOOP/YARN/HDFS/MAPREDUCE


Jonathan Hung


On Wed, Dec 4, 2019 at 10:55 AM Jonathan Hung  wrote:

> FYI, starting the rename process, beginning with INFRA-19521.
>
> Jonathan Hung
>
>
> On Wed, Nov 27, 2019 at 12:15 PM Konstantin Shvachko 
> wrote:
>
>> Hey guys,
>>
>> I think we diverged a bit from the initial topic of this discussion,
>> which is removing branch-2.10, and changing the version of branch-2 from
>> 2.11.0-SNAPSHOT to 2.10.1-SNAPSHOT.
>> Sounds like the subject line for this thread "Making 2.10 the last minor
>> 2.x release" confused people.
>> It is in fact a wider matter that can be discussed when somebody actually
>> proposes to release 2.11, which I understand nobody does at the moment.
>>
>> So if anybody objects removing branch-2.10 please make an argument.
>> Otherwise we should go ahead and just do it next week.
>> I see people still struggling to keep branch-2 and branch-2.10 in sync.
>>
>> Thanks,
>> --Konstantin
>>
>> On Thu, Nov 21, 2019 at 3:49 PM Jonathan Hung 
>> wrote:
>>
>>> Thanks for the detailed thoughts, everyone.
>>>
>>> Eric (Badger), my understanding is the same as yours re. minor vs patch
>>> releases. As for putting features into minor/patch releases, if we keep the
>>> convention of putting new features only into minor releases, my assumption
>>> is still that it's unlikely people will want to get them into branch-2
>>> (based on the 2.10.0 release process). For the java 11 issue, we haven't
>>> even really removed support for java 7 in branch-2 (much less java 8), so I
>>> feel moving to java 11 would go along with a move to branch 3. And as you
>>> mentioned, if people really want to use java 11 on branch-2, we can always
>>> revive branch-2. But for now I think the convenience of not needing to port
>>> to both branch-2 and branch-2.10 (and below) outweighs the cost of
>>> potentially needing to revive branch-2.
>>>
>>> Jonathan Hung
>>>
>>>
>>> On Wed, Nov 20, 2019 at 10:50 AM Eric Yang  wrote:
>>>
>>>> +1 for 2.10.x as last release for 2.x version.
>>>>
>>>> Software would become more compatible when more companies stress test
>>>> the same software and making improvements in trunk.  Some may be extra
>>>> caution on moving up the version because obligation internally to keep
>>>> things running.  Company obligation should not be the driving force to
>>>> maintain Hadoop branches.  There is no proper collaboration in the
>>>> community when every name brand company maintains its own Hadoop 2.x
>>>> version.  I think it would be more healthy for the community to reduce the
>>>> branch forking and spend energy on trunk to harden the software.  This will
>>>> give more confidence to move up the version than trying to fix n
>>>> permutations breakage like Flash fixing the timeline.
>>>>
>>>> Apache license stated, there is no warranty of any kind for code
>>>> contributions.  Fewer community release process should improve software
>>>> quality when eyes are on trunk, and help steering toward the same end 
>>>> goals.
>>>>
>>>> regards,
>>>> Eric
>>>>
>>>>
>>>>
>>>> On Tue, Nov 19, 2019 at 3:03 PM Eric Badger
>>>>  wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> Is it written anywhere what the difference is between a minor release
>>>>> and a
>>>>> point/dot/maintenance (I'll use "point" from here on out) release? I
>>>>> have
>>>>> looked around and I can't find anything other than some compatibility
>>>>> documentation in 2.x that has since been removed in 3.x [1] [2]. I
>>>>> think
>>>>> this would help shape my opinion on whether or not to keep branch-2
>>>>> alive.
>>>>> My current understanding is that we can't really break compatibility i

Re: [DISCUSS] Making 2.10 the last minor 2.x release

2019-12-04 Thread Jonathan Hung
FYI, starting the rename process, beginning with INFRA-19521.

Jonathan Hung


On Wed, Nov 27, 2019 at 12:15 PM Konstantin Shvachko 
wrote:

> Hey guys,
>
> I think we diverged a bit from the initial topic of this discussion, which
> is removing branch-2.10, and changing the version of branch-2 from
> 2.11.0-SNAPSHOT to 2.10.1-SNAPSHOT.
> Sounds like the subject line for this thread "Making 2.10 the last minor
> 2.x release" confused people.
> It is in fact a wider matter that can be discussed when somebody actually
> proposes to release 2.11, which I understand nobody does at the moment.
>
> So if anybody objects removing branch-2.10 please make an argument.
> Otherwise we should go ahead and just do it next week.
> I see people still struggling to keep branch-2 and branch-2.10 in sync.
>
> Thanks,
> --Konstantin
>
> On Thu, Nov 21, 2019 at 3:49 PM Jonathan Hung 
> wrote:
>
>> Thanks for the detailed thoughts, everyone.
>>
>> Eric (Badger), my understanding is the same as yours re. minor vs patch
>> releases. As for putting features into minor/patch releases, if we keep the
>> convention of putting new features only into minor releases, my assumption
>> is still that it's unlikely people will want to get them into branch-2
>> (based on the 2.10.0 release process). For the java 11 issue, we haven't
>> even really removed support for java 7 in branch-2 (much less java 8), so I
>> feel moving to java 11 would go along with a move to branch 3. And as you
>> mentioned, if people really want to use java 11 on branch-2, we can always
>> revive branch-2. But for now I think the convenience of not needing to port
>> to both branch-2 and branch-2.10 (and below) outweighs the cost of
>> potentially needing to revive branch-2.
>>
>> Jonathan Hung
>>
>>
>> On Wed, Nov 20, 2019 at 10:50 AM Eric Yang  wrote:
>>
>>> +1 for 2.10.x as last release for 2.x version.
>>>
>>> Software would become more compatible when more companies stress test
>>> the same software and making improvements in trunk.  Some may be extra
>>> caution on moving up the version because obligation internally to keep
>>> things running.  Company obligation should not be the driving force to
>>> maintain Hadoop branches.  There is no proper collaboration in the
>>> community when every name brand company maintains its own Hadoop 2.x
>>> version.  I think it would be more healthy for the community to reduce the
>>> branch forking and spend energy on trunk to harden the software.  This will
>>> give more confidence to move up the version than trying to fix n
>>> permutations breakage like Flash fixing the timeline.
>>>
>>> Apache license stated, there is no warranty of any kind for code
>>> contributions.  Fewer community release process should improve software
>>> quality when eyes are on trunk, and help steering toward the same end goals.
>>>
>>> regards,
>>> Eric
>>>
>>>
>>>
>>> On Tue, Nov 19, 2019 at 3:03 PM Eric Badger
>>>  wrote:
>>>
>>>> Hello all,
>>>>
>>>> Is it written anywhere what the difference is between a minor release
>>>> and a
>>>> point/dot/maintenance (I'll use "point" from here on out) release? I
>>>> have
>>>> looked around and I can't find anything other than some compatibility
>>>> documentation in 2.x that has since been removed in 3.x [1] [2]. I think
>>>> this would help shape my opinion on whether or not to keep branch-2
>>>> alive.
>>>> My current understanding is that we can't really break compatibility in
>>>> either a minor or point release. But the only mention of the difference
>>>> between minor and point releases is how to deal with Stable, Evolving,
>>>> and
>>>> Unstable tags, and how to deal with changing default configuration
>>>> values.
>>>> So it seems like there really isn't a big official difference between
>>>> the
>>>> two. In my mind, the functional difference between the two is that the
>>>> minor releases may have added features and rewrites, while the point
>>>> releases only have bug fixes. This might be an incorrect understanding,
>>>> but
>>>> that's what I have gathered from watching the releases over the last few
>>>> years. Whether or not this is a correct understanding, I think that this
>>>> needs to be documented somewhere, even if it is just a convent

[jira] [Created] (YARN-10012) Guaranteed and max capacity queue metrics for custom resources

2019-12-03 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-10012:


 Summary: Guaranteed and max capacity queue metrics for custom 
resources
 Key: YARN-10012
 URL: https://issues.apache.org/jira/browse/YARN-10012
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung


YARN-9085 adds support for guaranteed/maxcapacity MB/vcores. We should add the 
same for custom resources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-9992) Max allocation per queue is zero for custom resource types on RM startup

2019-11-26 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9992:
---

 Summary: Max allocation per queue is zero for custom resource 
types on RM startup
 Key: YARN-9992
 URL: https://issues.apache.org/jira/browse/YARN-9992
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Hung


Found an issue where GPU requests against a newly booted RM cannot be scheduled. 
It throws the exception in 
SchedulerUtils#throwInvalidResourceException:
{noformat}
throw new InvalidResourceRequestException(
"Invalid resource request, requested resource type=[" + reqResourceName
+ "] < 0 or greater than maximum allowed allocation. Requested "
+ "resource=" + reqResource + ", maximum allowed allocation="
+ availableResource
+ ", please note that maximum allowed allocation is calculated "
+ "by scheduler based on maximum resource of registered "
+ "NodeManagers, which might be less than configured "
+ "maximum allocation="
+ ResourceUtils.getResourceTypesMaximumAllocation());{noformat}
Upon refreshing scheduler (e.g. via refreshQueues), GPU scheduling works again.

I think the root cause is that upon scheduler refresh, resource-types.xml is 
loaded in CapacitySchedulerConfiguration (as part of YARN-7738), so when we call 
ResourceUtils#fetchMaximumAllocationFromConfig in 
CapacitySchedulerConfiguration#getMaximumAllocationPerQueue, it is able to fetch 
the {{yarn.resource-types}} config. But resource-types.xml is not loaded into 
the conf in CapacityScheduler#initScheduler, so it doesn't find the custom 
resource when computing max allocations, and the custom resource max allocation 
is 0.
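
A minimal sketch of the kind of fix described above (not the committed patch): make sure resource-types.xml is on the scheduler configuration before per-queue max allocations are computed, mirroring what the refreshQueues path already does. The helper class name is made up for illustration.
{noformat}
import org.apache.hadoop.conf.Configuration;

class SchedulerInitSketch {
  /** Ensure custom resource definitions are visible to the scheduler conf. */
  static Configuration withResourceTypes(Configuration schedulerConf) {
    // "resource-types.xml" is the standard file name for custom resource
    // definitions; without it, custom-resource max allocation resolves to 0.
    schedulerConf.addResource("resource-types.xml");
    return schedulerConf;
  }
}
{noformat}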



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [DISCUSS] Making 2.10 the last minor 2.x release

2019-11-21 Thread Jonathan Hung
Thanks for the detailed thoughts, everyone.

Eric (Badger), my understanding is the same as yours re. minor vs patch
releases. As for putting features into minor/patch releases, if we keep the
convention of putting new features only into minor releases, my assumption
is still that it's unlikely people will want to get them into branch-2
(based on the 2.10.0 release process). For the java 11 issue, we haven't
even really removed support for java 7 in branch-2 (much less java 8), so I
feel moving to java 11 would go along with a move to branch 3. And as you
mentioned, if people really want to use java 11 on branch-2, we can always
revive branch-2. But for now I think the convenience of not needing to port
to both branch-2 and branch-2.10 (and below) outweighs the cost of
potentially needing to revive branch-2.

Jonathan Hung


On Wed, Nov 20, 2019 at 10:50 AM Eric Yang  wrote:

> +1 for 2.10.x as last release for 2.x version.
>
> Software would become more compatible when more companies stress test the
> same software and making improvements in trunk.  Some may be extra caution
> on moving up the version because obligation internally to keep things
> running.  Company obligation should not be the driving force to maintain
> Hadoop branches.  There is no proper collaboration in the community when
> every name brand company maintains its own Hadoop 2.x version.  I think it
> would be more healthy for the community to reduce the branch forking and
> spend energy on trunk to harden the software.  This will give more
> confidence to move up the version than trying to fix n permutations
> breakage like Flash fixing the timeline.
>
> Apache license stated, there is no warranty of any kind for code
> contributions.  Fewer community release process should improve software
> quality when eyes are on trunk, and help steering toward the same end goals.
>
> regards,
> Eric
>
>
>
> On Tue, Nov 19, 2019 at 3:03 PM Eric Badger
>  wrote:
>
>> Hello all,
>>
>> Is it written anywhere what the difference is between a minor release and
>> a
>> point/dot/maintenance (I'll use "point" from here on out) release? I have
>> looked around and I can't find anything other than some compatibility
>> documentation in 2.x that has since been removed in 3.x [1] [2]. I think
>> this would help shape my opinion on whether or not to keep branch-2 alive.
>> My current understanding is that we can't really break compatibility in
>> either a minor or point release. But the only mention of the difference
>> between minor and point releases is how to deal with Stable, Evolving, and
>> Unstable tags, and how to deal with changing default configuration values.
>> So it seems like there really isn't a big official difference between the
>> two. In my mind, the functional difference between the two is that the
>> minor releases may have added features and rewrites, while the point
>> releases only have bug fixes. This might be an incorrect understanding,
>> but
>> that's what I have gathered from watching the releases over the last few
>> years. Whether or not this is a correct understanding, I think that this
>> needs to be documented somewhere, even if it is just a convention.
>>
>> Given my assumed understanding of minor vs point releases, here are the
>> pros/cons that I can think of for having a branch-2. Please add on or
>> correct me for anything you feel is missing or inadequate.
>> Pros:
>> - Features/rewrites/higher-risk patches are less likely to be put into
>> 2.10.x
>> - It is less necessary to move to 3.x
>>
>> Cons:
>> - Bug fixes are less likely to be put into 2.10.x
>> - An extra branch to maintain
>>   - Committers have an extra branch (5 vs 4 total branches) to commit
>> patches to if they should go all the way back to 2.10.x
>> - It is less necessary to move to 3.x
>>
>> So on the one hand you get added stability in fewer features being
>> committed to 2.10.x, but then on the other you get fewer bug fixes being
>> committed. In a perfect world, we wouldn't have to make this tradeoff. But
>> we don't live in a perfect world and committers will make mistakes either
>> because of lack of knowledge or simply because they made a mistake. If we
>> have a branch-2, committers will forget, not know to, or choose not to
>> (for
>> whatever reason) commit valid bug fixes back all the way to branch-2.10.
>> If
>> we don't have a branch-2, committers who want their borderline risky
>> feature in the 2.x line will err on the side of putting it into
>> branch-2.10
>> instead of proposing the creation of a branch-2. Cle

Re: [DISCUSS] Making 2.10 the last minor 2.x release

2019-11-18 Thread Jonathan Hung
Thanks Eric for the comments - regarding your concerns, I feel the pros
outweigh the cons. To me, the chances of patch releases on 2.10.x are much
higher than a new 2.11 minor release. (There didn't seem to be many people
outside of our company who expressed interest in getting new features to
branch-2 prior to the 2.10.0 release.) Even now, a few weeks after 2.10.0
release, there's 29 patches that have gone into branch-2 and 9 in
branch-2.10, so it's already diverged quite a bit.

In any case, we can always reverse this decision if we really need to, by
recreating branch-2. But this proposal would reduce a lot of confusion IMO.

Jonathan Hung


On Fri, Nov 15, 2019 at 11:41 AM epa...@apache.org 
wrote:

> Thanks Jonathan for opening the discussion.
>
> I am not in favor of this proposal. 2.10 was very recently released, and
> moving to 2.10 will take some time for the community. It seems premature to
> make a decision at this point that there will never be a need for a 2.11
> release.
>
> -Eric
>
>
>  On Thursday, November 14, 2019, 8:51:59 PM CST, Jonathan Hung <
> jyhung2...@gmail.com> wrote:
>
> Hi folks,
>
> Given the release of 2.10.0, and the fact that it's intended to be a bridge
> release to Hadoop 3.x [1], I'm proposing we make 2.10.x the last minor
> release line in branch-2. Currently, the main issue is that there's many
> fixes going into branch-2 (the theoretical 2.11.0) that's not going into
> branch-2.10 (which will become 2.10.1), so the fixes in branch-2 will
> likely never see the light of day unless they are backported to
> branch-2.10.
>
> To do this, I propose we:
>
>   - Delete branch-2.10
>   - Rename branch-2 to branch-2.10
>   - Set version in the new branch-2.10 to 2.10.1-SNAPSHOT
>
> This way we get all the current branch-2 fixes into the 2.10.x release
> line. Then the commit chain will look like: trunk -> branch-3.2 ->
> branch-3.1 -> branch-2.10 -> branch-2.9 -> branch-2.8
>
> Thoughts?
>
> Jonathan Hung
>
> [1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29479.html
>


Re: [DISCUSS] Making 2.10 the last minor 2.x release

2019-11-14 Thread Jonathan Hung
Some other additional items we would need:

   - Mark all fix-versions in YARN/HDFS/MAPREDUCE/HADOOP from 2.11.0 to
   2.10.1
   - Remove 2.11.0 as a version in these projects


Jonathan Hung


On Thu, Nov 14, 2019 at 6:51 PM Jonathan Hung  wrote:

> Hi folks,
>
> Given the release of 2.10.0, and the fact that it's intended to be a
> bridge release to Hadoop 3.x [1], I'm proposing we make 2.10.x the last
> minor release line in branch-2. Currently, the main issue is that there's
> many fixes going into branch-2 (the theoretical 2.11.0) that's not going
> into branch-2.10 (which will become 2.10.1), so the fixes in branch-2 will
> likely never see the light of day unless they are backported to branch-2.10.
>
> To do this, I propose we:
>
>- Delete branch-2.10
>- Rename branch-2 to branch-2.10
>- Set version in the new branch-2.10 to 2.10.1-SNAPSHOT
>
> This way we get all the current branch-2 fixes into the 2.10.x release
> line. Then the commit chain will look like: trunk -> branch-3.2 ->
> branch-3.1 -> branch-2.10 -> branch-2.9 -> branch-2.8
>
> Thoughts?
>
> Jonathan Hung
>
> [1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29479.html
>


[DISCUSS] Making 2.10 the last minor 2.x release

2019-11-14 Thread Jonathan Hung
Hi folks,

Given the release of 2.10.0, and the fact that it's intended to be a bridge
release to Hadoop 3.x [1], I'm proposing we make 2.10.x the last minor
release line in branch-2. Currently, the main issue is that there's many
fixes going into branch-2 (the theoretical 2.11.0) that's not going into
branch-2.10 (which will become 2.10.1), so the fixes in branch-2 will
likely never see the light of day unless they are backported to branch-2.10.

To do this, I propose we:

   - Delete branch-2.10
   - Rename branch-2 to branch-2.10
   - Set version in the new branch-2.10 to 2.10.1-SNAPSHOT

This way we get all the current branch-2 fixes into the 2.10.x release
line. Then the commit chain will look like: trunk -> branch-3.2 ->
branch-3.1 -> branch-2.10 -> branch-2.9 -> branch-2.8

Thoughts?

Jonathan Hung

[1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29479.html


[jira] [Created] (YARN-9964) Queue metrics turn negative when relabeling a node with running containers to default partition

2019-11-07 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9964:
---

 Summary: Queue metrics turn negative when relabeling a node with 
running containers to default partition 
 Key: YARN-9964
 URL: https://issues.apache.org/jira/browse/YARN-9964
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Hung


YARN-6467 changed the queue metrics logic to only update certain metrics for the 
default partition. But if an app runs a container on a labeled node, the node is 
then moved to the default partition, and the container is then released, the 
container's resources never increment the queue's allocated resources but do 
decrement them, so the metrics turn negative.
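
The asymmetry, reduced to a standalone sketch for illustration (not the actual QueueMetrics code): the allocation happens while the node is in a labeled partition, so it is never counted, while the release happens after the node is relabeled to the default partition, so it is subtracted.
{noformat}
/** Illustrative only: why the counter can go below zero. */
class PartitionMetricsSketch {
  static final String DEFAULT_PARTITION = "";
  long allocatedMB = 0;

  void onAllocate(String partitionAtAllocation, long mb) {
    if (DEFAULT_PARTITION.equals(partitionAtAllocation)) {
      allocatedMB += mb;   // skipped for labeled partitions
    }
  }

  void onRelease(String partitionAtRelease, long mb) {
    if (DEFAULT_PARTITION.equals(partitionAtRelease)) {
      allocatedMB -= mb;   // goes negative if the allocation was never counted
    }
  }
}
{noformat}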



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-9954) Configurable max application tags and max tag length

2019-11-05 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9954:
---

 Summary: Configurable max application tags and max tag length
 Key: YARN-9954
 URL: https://issues.apache.org/jira/browse/YARN-9954
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung


Currently the max number of tags and the max tag length are hardcoded; they 
should be configurable:
{noformat}
@Evolving
public static final int APPLICATION_MAX_TAGS = 10;

@Evolving
public static final int APPLICATION_MAX_TAG_LENGTH = 100; {noformat}
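
A sketch of the proposed change. The property names below are hypothetical (the patch may pick different keys); the current hardcoded values become the defaults so existing behavior is preserved.
{noformat}
import org.apache.hadoop.conf.Configuration;

class ApplicationTagLimits {
  static int maxTags(Configuration conf) {
    // Hypothetical key; 10 matches the current APPLICATION_MAX_TAGS constant.
    return conf.getInt("yarn.resourcemanager.application.max-tags", 10);
  }

  static int maxTagLength(Configuration conf) {
    // Hypothetical key; 100 matches APPLICATION_MAX_TAG_LENGTH.
    return conf.getInt("yarn.resourcemanager.application.max-tag.length", 100);
  }
}
{noformat}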



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[ANNOUNCE] Apache Hadoop 2.10.0 release

2019-10-31 Thread Jonathan Hung
Hi all,

I am happy to announce that the Apache Hadoop 2.10.0 has been released.

Apache Hadoop 2.10.0 is the first release in the Apache Hadoop 2.10 line.
The release details, including links to downloads, list of major features,
release notes, and changelog, are on the 2.10.0 announcement page [1]. You
can also download the release from the Downloads page [2].

- Major features: https://hadoop.apache.org/docs/r2.10.0/index.html
- Release notes:
http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/release/2.10.0/RELEASENOTES.2.10.0.html
- Changelog:
http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/release/2.10.0/CHANGES.2.10.0.html

Thanks!

[1] https://hadoop.apache.org/release/2.10.0.html
[2] https://hadoop.apache.org/releases.html

Jonathan


[jira] [Created] (YARN-9945) Fix javadoc in FederationProxyProviderUtil in branch-2

2019-10-31 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9945:
---

 Summary: Fix javadoc in FederationProxyProviderUtil in branch-2
 Key: YARN-9945
 URL: https://issues.apache.org/jira/browse/YARN-9945
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Hung
Assignee: Jonathan Hung


{noformat}
[ERROR] 
/home/jhung/hadoop-mp/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/failover/FederationProxyProviderUtil.java:83:
 error: reference not found
[ERROR] * @param configuration Configuration to generate {@link ClientRMProxy} 
{noformat}
This import was removed in branch-2 but it's referenced in this file's javadocs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.10.0 (RC1)

2019-10-29 Thread Jonathan Hung
+1 from me too. The vote passed, so I'll continue with the rest of the
release.

Thanks everyone!

Jonathan Hung


On Tue, Oct 29, 2019 at 1:40 PM Giovanni Matteo Fumarola <
giovanni.fumar...@gmail.com> wrote:

> +1 (non-binding).
>
> - Built from source on Ubuntu with OpenJDK 11.0.3
> - Verified signatures
> - Verified documentation
> - Setup up a single node cluster and ran basic yarn commands
> - Ran UTs for Yarn Router, Yarn Common, Yarn API, YARN NM and YARN RM.
>
> Thanks for putting this together, Jonathan.
>
> On Tue, Oct 29, 2019 at 8:47 AM Dinesh Chitlangia
>  wrote:
>
>> +1 (non-binding)
>>
>> - Verified signatures
>> - Verified documentation
>> - Built from sources on CentOS 7
>> - Tested with basic hdfs commands on a single node setup.
>>
>> Thank for organizing the release, Jonathan.
>>
>> -Dinesh
>>
>>
>>
>> On Tue, Oct 29, 2019 at 9:45 AM epa...@apache.org 
>> wrote:
>>
>> > Compatibility testing has gone well for me.
>> >
>> >  - In a 4-node cluster, I ran YARN rolling upgrade tests between 2.8.5
>> and
>> > 2.10.0
>> > - In a 4-node cluster, I ran YARN rolling upgrade tests between 2.10.0
>> and
>> > trunk
>> > - With one 4-node cluster running 2.10.0 and one 4-node cluster running
>> > trunk, I ran a word count job in each cluster whose inputs and outputs
>> were
>> > from and to the opposite cluster.
>> > - I verified that HDFS replication works as expected in a trunk cluster
>> > that has one 2.10.0 datanode.
>> >
>> >  Thanks,
>> > -Eric
>> >
>> >
>> > > On Tuesday, October 22, 2019, 4:55:29 PM CDT, Jonathan Hung <
>> > jyhung2...@gmail.com> wrote:
>> > > Hi folks,
>> > >
>> > >This is the second release candidate for the first release of Apache
>> > Hadoop
>> > >2.10 line. It contains 362 fixes/improvements since 2.9 [1]. It
>> includes
>> > >features such as:
>> > >
>> > > - User-defined resource types
>> > > - Native GPU support as a schedulable resource type
>> > > - Consistent reads from standby node
>> > > - Namenode port based selective encryption
>> > > - Improvements related to rolling upgrade support from 2.x to 3.x
>> > > - Cost based fair call queue
>> > >
>> > > The RC1 artifacts are at:
>> > http://home.apache.org/~jhung/hadoop-2.10.0-RC1/
>> > >
>> > > RC tag is release-2.10.0-RC1.
>> > >
>> > > The maven artifacts are hosted here:
>> > >
>> https://repository.apache.org/content/repositories/orgapachehadoop-1243/
>> > >
>> > > My public key is available here:
>> > > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>> > >
>> > > The vote will run for 5 weekdays, until Tuesday, October 29 at 3:00 pm
>> > PDT.
>> > >
>> > > Thanks,
>> > > Jonathan Hung
>> >
>> >
>> > -
>> > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>> > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>> >
>> >
>>
>


Re: [VOTE] Release Apache Hadoop 2.10.0 (RC0)

2019-10-28 Thread Jonathan Hung
Thanks Eric! I sent out an RC1 earlier last week, not sure if you saw that.
The only diff between RC1 and RC0 is HDFS-14667. If RC1 looks good to you
then it'd be great to get your testing results on that thread.

Jonathan Hung


On Mon, Oct 28, 2019 at 1:06 PM epa...@apache.org  wrote:

> Compatibility testing has gone well for me.
>
> - In a 4-node cluster, I ran YARN rolling upgrade tests between 2.8.5 and
> 2.10.0
> - In a 4-node cluster, I ran YARN rolling upgrade tests between 2.10.0 and
> trunk
> - With one 4-node cluster running 2.10.0 and one 4-node cluster running
> trunk, I ran a word count job in each cluster whose inputs and outputs were
> from and to the opposite cluster.
> - I verified that HDFS replication works as expected in a trunk cluster
> that has one 2.10.0 datanode.
>
> Thanks,
> -Eric
>
> On Tuesday, October 22, 2019, 8:39:38 PM CDT, Jonathan Hung <
> jyhung2...@gmail.com> wrote:
>
>
>
>
>
> Hi Eric, we've run some basic HDFS commands with a 3.2.1 namenode and
> 2.10.0 clients and datanodes. Everything worked as expected.
>
> Jonathan Hung
>
>
> On Tue, Oct 22, 2019 at 3:04 PM Eric Badger 
> wrote:
>
> > Hi Jonathan,
> >
> > Thanks for putting this RC together. You stated that there are
> > improvements related to rolling upgrades from 2.x to 3.x and I know I
> have
> > seen multiple JIRAs getting committed to that effect. Could you describe
> > any tests that you have done to verify rolling upgrade compatibility
> > for 3.x servers talking to 2.x clients and vice versa?
> >
> > Thanks,
> >
> > Eric
> >
> > On Tue, Oct 22, 2019 at 1:49 PM Jonathan Hung 
> > wrote:
> >
> >> Thanks Konstantin and Zhankun. Unfortunately a feature slipped our radar
> >> (HDFS-14667). Since this is the first of a minor release, we would like
> to
> >> get it into 2.10.0.
> >>
> >> HDFS-14667 has been committed to branch-2.10.0, I will be rolling an RC1
> >> shortly.
> >>
> >> Jonathan Hung
> >>
> >>
> >> On Tue, Oct 22, 2019 at 1:39 AM Zhankun Tang  wrote:
> >>
> >> > Thanks for the effort, Jonathan!
> >> >
> >> > +1 (non-binding) on RC0.
> >> >  - Set up a single node cluster with the binary tarball
> >> >  - Run Spark Pi and pySpark job
> >> >
> >> > BR,
> >> > Zhankun
> >> >
> >> > On Tue, 22 Oct 2019 at 14:31, Konstantin Shvachko <
> shv.had...@gmail.com
> >> >
> >> > wrote:
> >> >
> >> >> +1 on RC0.
> >> >> - Verified signatures
> >> >> - Built from sources
> >> >> - Ran unit tests for new features
> >> >> - Checked artifacts on Nexus, made sure the sources are present.
> >> >>
> >> >> Thanks
> >> >> --Konstantin
> >> >>
> >> >>
> >> >> On Wed, Oct 16, 2019 at 6:01 PM Jonathan Hung 
> >> >> wrote:
> >> >>
> >> >> > Hi folks,
> >> >> >
> >> >> > This is the first release candidate for the first release of Apache
> >> >> Hadoop
> >> >> > 2.10 line. It contains 361 fixes/improvements since 2.9 [1]. It
> >> includes
> >> >> > features such as:
> >> >> >
> >> >> > - User-defined resource types
> >> >> > - Native GPU support as a schedulable resource type
> >> >> > - Consistent reads from standby node
> >> >> > - Namenode port based selective encryption
> >> >> > - Improvements related to rolling upgrade support from 2.x to 3.x
> >> >> >
> >> >> > The RC0 artifacts are at:
> >> >> http://home.apache.org/~jhung/hadoop-2.10.0-RC0/
> >> >> >
> >> >> > RC tag is release-2.10.0-RC0.
> >> >> >
> >> >> > The maven artifacts are hosted here:
> >> >> >
> >> >>
> >>
> https://repository.apache.org/content/repositories/orgapachehadoop-1241/
> >> >> >
> >> >> > My public key is available here:
> >> >> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >> >> >
> >> >> > The vote will run for 5 weekdays, until Wednesday, October 23 at
> >> 6:00 pm
> >> >> > PDT.
> >> >> >
> >> >> > Thanks,
> >> >> > Jonathan Hung
> >> >> >
> >> >> > [1]
> >> >> >
> >> >> >
> >> >>
> >>
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0)
> >> >> >
> >> >>
> >> >
> >>
> >
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


Re: [VOTE] Release Apache Hadoop 2.10.0 (RC0)

2019-10-26 Thread Jonathan Hung
Hi Eric, I took a quick look, are you using
mapreduce.application.framework.path to run your MR jobs? If not, this
seems like expected behavior if AM and tasks get launched on different NMs
with different locally installed hadoop versions?

Jonathan Hung


On Sat, Oct 26, 2019 at 8:55 AM epa...@apache.org  wrote:

> I ran a few compatibility tests between 2.10.0 and 3.3.0 (trunk)
>
> Unfortunately, I ran into the following problem:
>
> Running with 2.10 RM and 3.3.0 (trunk) NM fails attempts with the
> following error:
>
> 2019-10-26 15:44:06,885 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child :
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RPC$VersionMismatch):
> Protocol org.apache.hadoop.mapred.TaskUmbilicalProtocol version mismatch.
> (client = 19, server = 21)
>
> The AM happened to launch on the 3.3.0 node.
>
> Is this a protobuf issue? I thought we addressed that?
>
> -Eric Payne
>
>
>
> On Tuesday, October 22, 2019, 8:39:38 PM CDT, Jonathan Hung <
> jyhung2...@gmail.com> wrote:
>
>
>
>
>
> Hi Eric, we've run some basic HDFS commands with a 3.2.1 namenode and
> 2.10.0 clients and datanodes. Everything worked as expected.
>
> Jonathan Hung
>
>
> On Tue, Oct 22, 2019 at 3:04 PM Eric Badger 
> wrote:
>
> > Hi Jonathan,
> >
> > Thanks for putting this RC together. You stated that there are
> > improvements related to rolling upgrades from 2.x to 3.x and I know I
> have
> > seen multiple JIRAs getting committed to that effect. Could you describe
> > any tests that you have done to verify rolling upgrade compatibility
> > for 3.x servers talking to 2.x clients and vice versa?
> >
> > Thanks,
> >
> > Eric
> >
> > On Tue, Oct 22, 2019 at 1:49 PM Jonathan Hung 
> > wrote:
> >
> >> Thanks Konstantin and Zhankun. Unfortunately a feature slipped past our
> >> radar (HDFS-14667). Since this is the first release of the 2.10 minor line,
> >> we would like to get it into 2.10.0.
> >>
> >> HDFS-14667 has been committed to branch-2.10.0, I will be rolling an RC1
> >> shortly.
> >>
> >> Jonathan Hung
> >>
> >>
> >> On Tue, Oct 22, 2019 at 1:39 AM Zhankun Tang  wrote:
> >>
> >> > Thanks for the effort, Jonathan!
> >> >
> >> > +1 (non-binding) on RC0.
> >> >  - Set up a single node cluster with the binary tarball
> >> >  - Run Spark Pi and pySpark job
> >> >
> >> > BR,
> >> > Zhankun
> >> >
> >> > On Tue, 22 Oct 2019 at 14:31, Konstantin Shvachko <
> shv.had...@gmail.com
> >> >
> >> > wrote:
> >> >
> >> >> +1 on RC0.
> >> >> - Verified signatures
> >> >> - Built from sources
> >> >> - Ran unit tests for new features
> >> >> - Checked artifacts on Nexus, made sure the sources are present.
> >> >>
> >> >> Thanks
> >> >> --Konstantin
> >> >>
> >> >>
> >> >> On Wed, Oct 16, 2019 at 6:01 PM Jonathan Hung 
> >> >> wrote:
> >> >>
> >> >> > Hi folks,
> >> >> >
> >> >> > This is the first release candidate for the first release of Apache
> >> >> Hadoop
> >> >> > 2.10 line. It contains 361 fixes/improvements since 2.9 [1]. It
> >> includes
> >> >> > features such as:
> >> >> >
> >> >> > - User-defined resource types
> >> >> > - Native GPU support as a schedulable resource type
> >> >> > - Consistent reads from standby node
> >> >> > - Namenode port based selective encryption
> >> >> > - Improvements related to rolling upgrade support from 2.x to 3.x
> >> >> >
> >> >> > The RC0 artifacts are at:
> >> >> http://home.apache.org/~jhung/hadoop-2.10.0-RC0/
> >> >> >
> >> >> > RC tag is release-2.10.0-RC0.
> >> >> >
> >> >> > The maven artifacts are hosted here:
> >> >> >
> >> >>
> >>
> https://repository.apache.org/content/repositories/orgapachehadoop-1241/
> >> >> >
> >> >> > My public key is available here:
> >> >> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >> >> >
> >> >> > The vote will run for 5 weekdays, until Wednesday, October 23 at
> >> 6:00 pm
> >> >> > PDT.
> >> >> >
> >> >> > Thanks,
> >> >> > Jonathan Hung
> >> >> >
> >> >> > [1]
> >> >> >
> >> >> >
> >> >>
> >>
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0)
> >> >> >
> >> >>
> >> >
> >>
> >
>


Re: [VOTE] Release Apache Hadoop 2.10.0 (RC1)

2019-10-25 Thread Jonathan Hung
Some more thoughts: for the javadoc issue, I think we can just support
building on java 7.

For the release notes issue, I can work with the authors of the major
features to come up with release notes and update them before pushing them to
the site. The release notes in the published artifacts won't be up to date, but
I think that's fine.

I'll go ahead with this plan if there are no objections.

Jonathan Hung


On Fri, Oct 25, 2019 at 12:19 PM Jonathan Hung  wrote:

> Thanks for looking Erik.
>
> For the release notes, yeah I think it's because there's no release notes
> for the corresponding JIRAs. I've added details for these features to the
> index.md.vm file which should show up on the homepage for 2.10.0 (e.g.
> https://hadoop.apache.org/docs/r2.9.0/index.html). We could add release
> notes for these JIRAs, but that would require recreating the tar.gzs since
> the release notes are bundled in there.
>
> For the javadoc issue, I was able to repro it; it seems it's because the
> org.apache.hadoop.yarn.client.ClientRMProxy import was removed from
> FederationProxyProviderUtil in YARN-7900 in branch-2 (but not in other
> branches). But it's still referenced in the javadocs in this file, so javadoc
> throws this error. Re-adding this import and building with Java 8 allows it to
> succeed.
>
> I checked javadoc html for FederationProxyProviderUtil in the produced
> artifacts and it appears to be correct.
>
> I think we could easily overwrite the current RC1 artifacts with ones
> containing proper release notes. Not sure what to do about the javadoc
> issue though, that would require overwriting the release-2.10.0-RC1 tag
> which I don't want to do. What do others think?
>
> Jonathan Hung
>
>
> On Fri, Oct 25, 2019 at 9:21 AM Erik Krogen  wrote:
>
>> Thanks for putting this together, Jonathan!
>>
>> I noticed that the RELEASENOTES.md makes no mention of any of the major
>> features you mentioned in your email about the RC. Is this expected? I
>> guess it is caused by the lack of a release note on the JIRAs for those
>> features.
>>
>> I also noticed that building a distribution package (mvn -DskipTests
>> package -Pdist) fails on Java 8 (1.8.0_172) with a bunch of Javadoc errors.
>> It works fine on Java 7. Is this expected?
>>
>> Other verifications I performed:
>>
>>- Verified all signatures in RC1
>>- Verified all checksums in RC1
>>- Visually inspected contents of src tarball
>>- Built from source on Mac OSX 10.14.6 and RHEL7 (Java 8)
>>    - mvn -DskipTests package
>>- Visually inspected contents of binary tarball
>>
>> Thanks,
>> Erik
>>
>> --
>> *From:* Konstantin Shvachko 
>> *Sent:* Wednesday, October 23, 2019 6:10 PM
>> *To:* Jonathan Hung 
>> *Cc:* Hdfs-dev ; mapreduce-dev <
>> mapreduce-...@hadoop.apache.org>; yarn-dev ;
>> Hadoop Common 
>> *Subject:* Re: [VOTE] Release Apache Hadoop 2.10.0 (RC1)
>>
>> +1 on RC1
>>
>> - Verified signatures
>> - Verified maven artifacts on Nexus for sources
>> - Checked rat reports
>> - Checked documentation
>> - Checked packaging contents
>> - Built from sources on RHEL 7 box
>> - Ran unit tests for new HDFS features with Java 8
>>
>> Thanks,
>> --Konstantin
>>
>> On Tue, Oct 22, 2019 at 2:55 PM Jonathan Hung 
>> wrote:
>>
>> > Hi folks,
>> >
>> > This is the second release candidate for the first release of Apache
>> Hadoop
>> > 2.10 line. It contains 362 fixes/improvements since 2.9 [1]. It includes
>> > features such as:
>> >
>> > - User-defined resource types
>> > - Native GPU support as a schedulable resource type
>> > - Consistent reads from standby node
>> > - Namenode port based selective encryption
>> > - Improvements related to rolling upgrade support from 2.x to 3.x
>> > - Cost based fair call queue
>> >
>> > The RC1 artifacts are at:
>> http://home.apache.org/~jhung/hadoop-2.10.0-RC1/
>> >
>> > RC tag is release-2.10.0-RC1.
>> >
>> > The maven artifacts are hosted here:
>> >
>> https://repository.apache.org/content/repositories/orgapachehadoop-1243/

Re: [VOTE] Release Apache Hadoop 2.10.0 (RC1)

2019-10-25 Thread Jonathan Hung
Thanks for looking Erik.

For the release notes, yeah I think it's because there's no release notes
for the corresponding JIRAs. I've added details for these features to the
index.md.vm file which should show up on the homepage for 2.10.0 (e.g.
https://hadoop.apache.org/docs/r2.9.0/index.html). We could add release
notes for these JIRAs, but that would require recreating the tar.gzs since
the release notes are bundled in there.

For the javadoc issue, I was able to repro it; it seems it's because the
org.apache.hadoop.yarn.client.ClientRMProxy import was removed from
FederationProxyProviderUtil in YARN-7900 in branch-2 (but not in other
branches). But it's still referenced in the javadocs in this file, so javadoc
throws this error. Re-adding this import and building with Java 8 allows it to
succeed.

I checked javadoc html for FederationProxyProviderUtil in the produced
artifacts and it appears to be correct.
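
As a simplified illustration of that failure mode (a sketch, not the real
FederationProxyProviderUtil source): Java 8's javadoc runs doclint by default,
so an {@link} reference to a class whose import has been removed becomes a hard
error, whereas Java 7's javadoc lets it pass.

// Illustrative sketch only -- not the actual Hadoop source.
import org.apache.hadoop.yarn.client.ClientRMProxy; // removing this breaks the {@link} below

/**
 * Helper for creating federation-aware proxies (sketch).
 *
 * <p>For the non-federated case see {@link ClientRMProxy}. With the import
 * above removed, the Java 8 javadoc tool reports "error: reference not found"
 * here and the -Pdist build fails; Java 7's javadoc only warns.
 */
public final class FederationProxyProviderUtilSketch {
  private FederationProxyProviderUtilSketch() {
  }
}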

I think we could easily overwrite the current RC1 artifacts with ones
containing proper release notes. Not sure what to do about the javadoc
issue though, that would require overwriting the release-2.10.0-RC1 tag
which I don't want to do. What do others think?

Jonathan Hung


On Fri, Oct 25, 2019 at 9:21 AM Erik Krogen  wrote:

> Thanks for putting this together, Jonathan!
>
> I noticed that the RELEASENOTES.md makes no mention of any of the major
> features you mentioned in your email about the RC. Is this expected? I
> guess it is caused by the lack of a release note on the JIRAs for those
> features.
>
> I also noticed that building a distribution package (mvn -DskipTests
> package -Pdist) fails on Java 8 (1.8.0_172) with a bunch of Javadoc errors.
> It works fine on Java 7. Is this expected?
>
> Other verifications I performed:
>
>- Verified all signatures in RC1
>- Verified all checksums in RC1
>- Visually inspected contents of src tarball
>- Built from source on Mac OSX 10.14.6 and RHEL7 (Java 8)
>- mvn -DskipTests package
>- Visually inspected contents of binary tarball
>
> Thanks,
> Erik
>
> ------
> *From:* Konstantin Shvachko 
> *Sent:* Wednesday, October 23, 2019 6:10 PM
> *To:* Jonathan Hung 
> *Cc:* Hdfs-dev ; mapreduce-dev <
> mapreduce-...@hadoop.apache.org>; yarn-dev ;
> Hadoop Common 
> *Subject:* Re: [VOTE] Release Apache Hadoop 2.10.0 (RC1)
>
> +1 on RC1
>
> - Verified signatures
> - Verified maven artifacts on Nexus for sources
> - Checked rat reports
> - Checked documentation
> - Checked packaging contents
> - Built from sources on RHEL 7 box
> - Ran unit tests for new HDFS features with Java 8
>
> Thanks,
> --Konstantin
>
> On Tue, Oct 22, 2019 at 2:55 PM Jonathan Hung 
> wrote:
>
> > Hi folks,
> >
> > This is the second release candidate for the first release of Apache
> Hadoop
> > 2.10 line. It contains 362 fixes/improvements since 2.9 [1]. It includes
> > features such as:
> >
> > - User-defined resource types
> > - Native GPU support as a schedulable resource type
> > - Consistent reads from standby node
> > - Namenode port based selective encryption
> > - Improvements related to rolling upgrade support from 2.x to 3.x
> > - Cost based fair call queue
> >
> > The RC1 artifacts are at:
> http://home.apache.org/~jhung/hadoop-2.10.0-RC1/
> >
> > RC tag is release-2.10.0-RC1.
> >
> > The maven artifacts are hosted here:
> >
> https://repository.apache.org/content/repositories/orgapachehadoop-1243/
> >
> > My public key is available here:
> >
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >
> > The vote will run for 5 weekdays, until Tuesday, October 29 at 3:00 pm
> PDT.
> >
> > Thanks,
> > Jonathan Hung
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0)

Re: [VOTE] Release Apache Hadoop 2.10.0 (RC1)

2019-10-23 Thread Jonathan Hung
Hi Eric, thanks for trying it out. We talked about this in today's YARN
community sync up, summarizing here for everyone else:

I don't think it's worth delaying the 2.10.0 release further; we can
address this in a subsequent 2.10.x release. Wangda mentioned it might be
related to changes in the dominant resource calculator, but the root cause
remains to be determined.

Jonathan Hung


On Wed, Oct 23, 2019 at 9:02 AM epa...@apache.org  wrote:

> Hi Jonathan,
>
> Thanks very much for all of your work on this release.
>
> I have a concern about cross-queue (inter-queue) preemption in 2.10.
>
> In 2.8, on a 6-node pseudo-cluster, preempting from one queue to meet the
> needs of another queue seems to work as expected. However, in 2.10 on the same
> pseudo-cluster (with the same config properties), only one container was
> preempted for the AM and then nothing else happened.
>
> I don't know how the community feels about holding up the 2.10.0 release
> for this issue, but we need to get to the bottom of this before we can go
> to 2.10.x. I am still investigating.
>
> Thanks,
> -Eric
>
>
>
>
>  On Tuesday, October 22, 2019, 4:55:29 PM CDT, Jonathan Hung <
> jyhung2...@gmail.com> wrote:
> > Hi folks,
> >
> > This is the second release candidate for the first release of Apache
> Hadoop
> > 2.10 line. It contains 362 fixes/improvements since 2.9 [1]. It includes
> > features such as:
> >
> > - User-defined resource types
> > - Native GPU support as a schedulable resource type
> > - Consistent reads from standby node
> > - Namenode port based selective encryption
> > - Improvements related to rolling upgrade support from 2.x to 3.x
> > - Cost based fair call queue
> >
> > The RC1 artifacts are at:
> http://home.apache.org/~jhung/hadoop-2.10.0-RC1/
> >
> > RC tag is release-2.10.0-RC1.
> >
> > The maven artifacts are hosted here:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1243/
> >
> > My public key is available here:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >
> > The vote will run for 5 weekdays, until Tuesday, October 29 at 3:00 pm
> PDT.
> >
> > Thanks,
> > Jonathan Hung
>


Re: [VOTE] Release Apache Hadoop 2.10.0 (RC0)

2019-10-22 Thread Jonathan Hung
Hi Eric, we've run some basic HDFS commands with a 3.2.1 namenode and
2.10.0 clients and datanodes. Everything worked as expected.

Jonathan Hung


On Tue, Oct 22, 2019 at 3:04 PM Eric Badger 
wrote:

> Hi Jonathan,
>
> Thanks for putting this RC together. You stated that there are
> improvements related to rolling upgrades from 2.x to 3.x and I know I have
> seen multiple JIRAs getting committed to that effect. Could you describe
> any tests that you have done to verify rolling upgrade compatibility
> for 3.x servers talking to 2.x clients and vice versa?
>
> Thanks,
>
> Eric
>
> On Tue, Oct 22, 2019 at 1:49 PM Jonathan Hung 
> wrote:
>
>> Thanks Konstantin and Zhankun. Unfortunately a feature slipped past our
>> radar (HDFS-14667). Since this is the first release of the 2.10 minor line,
>> we would like to get it into 2.10.0.
>>
>> HDFS-14667 has been committed to branch-2.10.0, I will be rolling an RC1
>> shortly.
>>
>> Jonathan Hung
>>
>>
>> On Tue, Oct 22, 2019 at 1:39 AM Zhankun Tang  wrote:
>>
>> > Thanks for the effort, Jonathan!
>> >
>> > +1 (non-binding) on RC0.
>> >  - Set up a single node cluster with the binary tarball
>> >  - Run Spark Pi and pySpark job
>> >
>> > BR,
>> > Zhankun
>> >
>> > On Tue, 22 Oct 2019 at 14:31, Konstantin Shvachko > >
>> > wrote:
>> >
>> >> +1 on RC0.
>> >> - Verified signatures
>> >> - Built from sources
>> >> - Ran unit tests for new features
>> >> - Checked artifacts on Nexus, made sure the sources are present.
>> >>
>> >> Thanks
>> >> --Konstantin
>> >>
>> >>
>> >> On Wed, Oct 16, 2019 at 6:01 PM Jonathan Hung 
>> >> wrote:
>> >>
>> >> > Hi folks,
>> >> >
>> >> > This is the first release candidate for the first release of Apache
>> >> Hadoop
>> >> > 2.10 line. It contains 361 fixes/improvements since 2.9 [1]. It
>> includes
>> >> > features such as:
>> >> >
>> >> > - User-defined resource types
>> >> > - Native GPU support as a schedulable resource type
>> >> > - Consistent reads from standby node
>> >> > - Namenode port based selective encryption
>> >> > - Improvements related to rolling upgrade support from 2.x to 3.x
>> >> >
>> >> > The RC0 artifacts are at:
>> >> http://home.apache.org/~jhung/hadoop-2.10.0-RC0/
>> >> >
>> >> > RC tag is release-2.10.0-RC0.
>> >> >
>> >> > The maven artifacts are hosted here:
>> >> >
>> >>
>> https://repository.apache.org/content/repositories/orgapachehadoop-1241/
>> >> >
>> >> > My public key is available here:
>> >> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>> >> >
>> >> > The vote will run for 5 weekdays, until Wednesday, October 23 at
>> 6:00 pm
>> >> > PDT.
>> >> >
>> >> > Thanks,
>> >> > Jonathan Hung
>> >> >
>> >> > [1]
>> >> >
>> >> >
>> >>
>> https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0)
>> >> >
>> >>
>> >
>>
>


[VOTE] Release Apache Hadoop 2.10.0 (RC1)

2019-10-22 Thread Jonathan Hung
Hi folks,

This is the second release candidate for the first release of Apache Hadoop
2.10 line. It contains 362 fixes/improvements since 2.9 [1]. It includes
features such as:

- User-defined resource types
- Native GPU support as a schedulable resource type
- Consistent reads from standby node
- Namenode port based selective encryption
- Improvements related to rolling upgrade support from 2.x to 3.x
- Cost based fair call queue

The RC1 artifacts are at: http://home.apache.org/~jhung/hadoop-2.10.0-RC1/

RC tag is release-2.10.0-RC1.

The maven artifacts are hosted here:
https://repository.apache.org/content/repositories/orgapachehadoop-1243/

My public key is available here:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

The vote will run for 5 weekdays, until Tuesday, October 29 at 3:00 pm PDT.

Thanks,
Jonathan Hung

[1]
https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0)


Re: [VOTE] Release Apache Hadoop 2.10.0 (RC0)

2019-10-22 Thread Jonathan Hung
Thanks Konstantin and Zhankun. Unfortunately a feature slipped past our radar
(HDFS-14667). Since this is the first release of the 2.10 minor line, we would
like to get it into 2.10.0.

HDFS-14667 has been committed to branch-2.10.0, I will be rolling an RC1
shortly.

Jonathan Hung


On Tue, Oct 22, 2019 at 1:39 AM Zhankun Tang  wrote:

> Thanks for the effort, Jonathan!
>
> +1 (non-binding) on RC0.
>  - Set up a single node cluster with the binary tarball
>  - Run Spark Pi and pySpark job
>
> BR,
> Zhankun
>
> On Tue, 22 Oct 2019 at 14:31, Konstantin Shvachko 
> wrote:
>
>> +1 on RC0.
>> - Verified signatures
>> - Built from sources
>> - Ran unit tests for new features
>> - Checked artifacts on Nexus, made sure the sources are present.
>>
>> Thanks
>> --Konstantin
>>
>>
>> On Wed, Oct 16, 2019 at 6:01 PM Jonathan Hung 
>> wrote:
>>
>> > Hi folks,
>> >
>> > This is the first release candidate for the first release of Apache
>> Hadoop
>> > 2.10 line. It contains 361 fixes/improvements since 2.9 [1]. It includes
>> > features such as:
>> >
>> > - User-defined resource types
>> > - Native GPU support as a schedulable resource type
>> > - Consistent reads from standby node
>> > - Namenode port based selective encryption
>> > - Improvements related to rolling upgrade support from 2.x to 3.x
>> >
>> > The RC0 artifacts are at:
>> http://home.apache.org/~jhung/hadoop-2.10.0-RC0/
>> >
>> > RC tag is release-2.10.0-RC0.
>> >
>> > The maven artifacts are hosted here:
>> >
>> https://repository.apache.org/content/repositories/orgapachehadoop-1241/
>> >
>> > My public key is available here:
>> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>> >
>> > The vote will run for 5 weekdays, until Wednesday, October 23 at 6:00 pm
>> > PDT.
>> >
>> > Thanks,
>> > Jonathan Hung
>> >
>> > [1]
>> >
>> >
>> https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0)
>> >
>>
>


[VOTE] Release Apache Hadoop 2.10.0 (RC0)

2019-10-16 Thread Jonathan Hung
Hi folks,

This is the first release candidate for the first release of Apache Hadoop
2.10 line. It contains 361 fixes/improvements since 2.9 [1]. It includes
features such as:

- User-defined resource types
- Native GPU support as a schedulable resource type
- Consistent reads from standby node
- Namenode port based selective encryption
- Improvements related to rolling upgrade support from 2.x to 3.x

The RC0 artifacts are at: http://home.apache.org/~jhung/hadoop-2.10.0-RC0/

RC tag is release-2.10.0-RC0.

The maven artifacts are hosted here:
https://repository.apache.org/content/repositories/orgapachehadoop-1241/

My public key is available here:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

The vote will run for 5 weekdays, until Wednesday, October 23 at 6:00 pm
PDT.

Thanks,
Jonathan Hung

[1]
https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0)


Re: [DISCUSS] Hadoop 2.10.0 release plan

2019-10-16 Thread Jonathan Hung
I've moved all jiras with target version 2.10.0 to 2.10.1. Also I've
created branch-2.10 and branch-2.10.0, please commit any 2.10.x bug fixes
to branch-2.10.

I'll send out a vote thread for 2.10.0-RC0 shortly.

Jonathan Hung


On Fri, Oct 11, 2019 at 10:32 AM Jonathan Hung  wrote:

> Edit: seems a 2.10.0 blocker was reopened (HDFS-14305). I'll continue
> watching this jira and start the release once this is resolved.
>
> Jonathan Hung
>
>
> On Thu, Oct 10, 2019 at 5:13 PM Jonathan Hung 
> wrote:
>
>> Hi folks, as of now all 2.10.0 blockers have been resolved [1]. So I'll
>> start the release process soon (cutting branches, updating target versions,
>> etc).
>>
>> [1] https://issues.apache.org/jira/issues/?filter=12346975
>>
>> Jonathan Hung
>>
>>
>> On Mon, Aug 26, 2019 at 10:19 AM Jonathan Hung 
>> wrote:
>>
>>> Hi folks,
>>>
>>> As discussed previously (e.g. [1], [2]) we'd like to do a 2.10.0 release
>>> soon. Some features/big-items we're targeting for this release:
>>>
>>>- YARN resource types/GPU support (YARN-8200
>>><https://issues.apache.org/jira/browse/YARN-8200>)
>>>- Selective wire encryption (HDFS-13541
>>><https://issues.apache.org/jira/browse/HDFS-13541>)
>>>- Rolling upgrade support from 2.x to 3.x (e.g. HDFS-14509
>>><https://issues.apache.org/jira/browse/HDFS-14509>)
>>>
>>> Per [3] sounds like there's concern around upgrading dependencies as
>>> well.
>>>
>>> We created a public jira filter here (
>>> https://issues.apache.org/jira/issues/?filter=12346975) marking all
>>> blockers for 2.10.0 release. If you have other jiras that should be 2.10.0
>>> blockers, please mark "Target Version/s" as "2.10.0" and add label
>>> "release-blocker" so we can track it through this filter.
>>>
>>> We're targeting a release at end of September.
>>>
>>> Please share any thoughts you have about this. Thanks!
>>>
>>> [1]
>>> https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29461.html
>>> [2]
>>> https://www.mail-archive.com/mapreduce-dev@hadoop.apache.org/msg21293.html
>>> [3]
>>> https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg33440.html
>>>
>>>
>>> Jonathan Hung
>>>
>>


Re: [DISCUSS] Hadoop 2.10.0 release plan

2019-10-11 Thread Jonathan Hung
Edit: seems a 2.10.0 blocker was reopened (HDFS-14305). I'll continue
watching this jira and start the release once this is resolved.

Jonathan Hung


On Thu, Oct 10, 2019 at 5:13 PM Jonathan Hung  wrote:

> Hi folks, as of now all 2.10.0 blockers have been resolved [1]. So I'll
> start the release process soon (cutting branches, updating target versions,
> etc).
>
> [1] https://issues.apache.org/jira/issues/?filter=12346975
>
> Jonathan Hung
>
>
> On Mon, Aug 26, 2019 at 10:19 AM Jonathan Hung 
> wrote:
>
>> Hi folks,
>>
>> As discussed previously (e.g. [1], [2]) we'd like to do a 2.10.0 release
>> soon. Some features/big-items we're targeting for this release:
>>
>>- YARN resource types/GPU support (YARN-8200
>><https://issues.apache.org/jira/browse/YARN-8200>)
>>- Selective wire encryption (HDFS-13541
>><https://issues.apache.org/jira/browse/HDFS-13541>)
>>- Rolling upgrade support from 2.x to 3.x (e.g. HDFS-14509
>><https://issues.apache.org/jira/browse/HDFS-14509>)
>>
>> Per [3] sounds like there's concern around upgrading dependencies as well.
>>
>> We created a public jira filter here (
>> https://issues.apache.org/jira/issues/?filter=12346975) marking all
>> blockers for 2.10.0 release. If you have other jiras that should be 2.10.0
>> blockers, please mark "Target Version/s" as "2.10.0" and add label
>> "release-blocker" so we can track it through this filter.
>>
>> We're targeting a release at end of September.
>>
>> Please share any thoughts you have about this. Thanks!
>>
>> [1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29461.html
>> [2]
>> https://www.mail-archive.com/mapreduce-dev@hadoop.apache.org/msg21293.html
>> [3] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg33440.html
>>
>>
>> Jonathan Hung
>>
>


Re: [DISCUSS] Hadoop 2.10.0 release plan

2019-10-10 Thread Jonathan Hung
Hi folks, as of now all 2.10.0 blockers have been resolved [1]. So I'll
start the release process soon (cutting branches, updating target versions,
etc).

[1] https://issues.apache.org/jira/issues/?filter=12346975

Jonathan Hung


On Mon, Aug 26, 2019 at 10:19 AM Jonathan Hung  wrote:

> Hi folks,
>
> As discussed previously (e.g. [1], [2]) we'd like to do a 2.10.0 release
> soon. Some features/big-items we're targeting for this release:
>
>- YARN resource types/GPU support (YARN-8200
><https://issues.apache.org/jira/browse/YARN-8200>)
>- Selective wire encryption (HDFS-13541
><https://issues.apache.org/jira/browse/HDFS-13541>)
>- Rolling upgrade support from 2.x to 3.x (e.g. HDFS-14509
><https://issues.apache.org/jira/browse/HDFS-14509>)
>
> Per [3] sounds like there's concern around upgrading dependencies as well.
>
> We created a public jira filter here (
> https://issues.apache.org/jira/issues/?filter=12346975) marking all
> blockers for 2.10.0 release. If you have other jiras that should be 2.10.0
> blockers, please mark "Target Version/s" as "2.10.0" and add label
> "release-blocker" so we can track it through this filter.
>
> We're targeting a release at end of September.
>
> Please share any thoughts you have about this. Thanks!
>
> [1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29461.html
> [2]
> https://www.mail-archive.com/mapreduce-dev@hadoop.apache.org/msg21293.html
> [3] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg33440.html
>
>
> Jonathan Hung
>


[jira] [Created] (YARN-9869) Create scheduling policy to auto-adjust queue elasticity based on cluster demand

2019-09-30 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9869:
---

 Summary: Create scheduling policy to auto-adjust queue elasticity 
based on cluster demand
 Key: YARN-9869
 URL: https://issues.apache.org/jira/browse/YARN-9869
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Jonathan Hung


Currently LinkedIn has a policy to auto-adjust queue elasticity based on 
real-time queue demand. We've been running this policy in production for a long 
time and it has helped improve overall cluster utilization.
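
As a loose illustration of the kind of heuristic involved (everything below is
hypothetical -- the class, its inputs, and the proportional rule are
placeholders, not the policy actually run in production):

{code}
// Hypothetical sketch: map a queue's share of cluster-wide pending demand to
// a new max-capacity percentage, clamped between a floor and a ceiling.
public class DemandBasedElasticityPolicy {
  private final float floorMaxCapacity;
  private final float ceilingMaxCapacity;

  public DemandBasedElasticityPolicy(float floorMaxCapacity, float ceilingMaxCapacity) {
    this.floorMaxCapacity = floorMaxCapacity;
    this.ceilingMaxCapacity = ceilingMaxCapacity;
  }

  public float computeMaxCapacity(long queuePendingMB, long clusterPendingMB) {
    if (clusterPendingMB <= 0) {
      return floorMaxCapacity;
    }
    float demandShare = (float) queuePendingMB / clusterPendingMB;
    float proposed = demandShare * 100f; // percent of cluster resources
    return Math.max(floorMaxCapacity, Math.min(ceilingMaxCapacity, proposed));
  }
}
{code}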



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions

2019-09-25 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9858:
---

 Summary: Optimize RMContext getExclusiveEnforcedPartitions 
 Key: YARN-9858
 URL: https://issues.apache.org/jira/browse/YARN-9858
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Hung


Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a hot
code path and needs to be optimized.
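
One plausible direction (a sketch only, assuming the method currently re-reads
and splits a configuration value on every call; the config key name and the
conf field below are placeholders, not necessarily the eventual patch) is to
parse the configured partitions once and cache the immutable result:

{code}
// Sketch: assumes a Configuration field named conf on the enclosing class and
// the usual java.util imports; the config key below is a placeholder name.
private volatile Set<String> exclusiveEnforcedPartitions;

public Set<String> getExclusiveEnforcedPartitions() {
  Set<String> partitions = exclusiveEnforcedPartitions;
  if (partitions == null) {
    synchronized (this) {
      partitions = exclusiveEnforcedPartitions;
      if (partitions == null) {
        String[] configured = conf.getStrings(
            "yarn.scheduler.capacity.exclusive-enforced-partitions");
        Set<String> parsed = new HashSet<String>();
        if (configured != null) {
          parsed.addAll(Arrays.asList(configured));
        }
        partitions = Collections.unmodifiableSet(parsed);
        exclusiveEnforcedPartitions = partitions;
      }
    }
  }
  return partitions;
}
{code}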



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: Incompatible changes between branch-2.8 and branch-2.9

2019-09-24 Thread Jonathan Hung
- I've created YARN-9855 and uploaded patches to fix YARN-6616 in
branch-2.8 and branch-2.7.
- For YARN-6050, I'm not sure either. Robert/Wangda, can you comment on
YARN-6050 compatibility?
- For YARN-7813, I'm not sure why moving from 2.8.4/5 -> 2.8.6 would be
incompatible with this strategy. It should be OK to remove/add optional
fields (removing the field with id 12, and adding the field with id 13).
The difficulties I see here are that we would have to leave id 12 blank in
2.8.6 (so we cannot have YARN-6164 in branch-2.8), and that users on 2.8.4/5
would have to move to 2.8.6 before moving to 2.9+. But rolling upgrade
would still work IIUC.

Jonathan Hung


On Tue, Sep 24, 2019 at 2:52 PM Eric Badger 
wrote:

> *   For YARN-6616, for branch-2.8 and below, it was only committed to
> 2.7.8/2.8.6 which have not been released (as I understand). Perhaps we can
> revert YARN-6616 from branch-2.7 and branch-2.8.
>   - This seems reasonable. Since we haven't released anything, it should
> be no issue to change the 2.7/2.8 protobuf field to have the same value as
> 2.9+
>
> *   For YARN-6050, there's a bit here:
> https://developers.google.com/protocol-buffers/docs/proto that says
> "optional is compatible with repeated", so I think we should be OK there.
>   - Optional is compatible with repeated over the wire such that
> protobuf won't blow up, but does that actually mean that it's compatible in
> this case? If it's expecting an optional and gets a repeated, it's going to
> drop everything except for the last value. I don't know enough about
> YARN-6050 to say if this will be ok or not.
>
> *   For YARN-7813, it's in 2.8.4 so it seems upgrading from 2.8.4 or 2.8.5
> to a 2.9+ version will be an issue. One option could be to move the
> intraQueuePreemptionDisabled field from id 12 to id 13 in branch-2.8, then
> users would upgrade from 2.8.4/2.8.5 to 2.8.6 (someone would have to
> release this), then upgrade from 2.8.6 to 2.9+.
>   - I'm ok with this, but it should be noted that the upgrade from
> 2.8.4/2.8.5 to 2.8.6 (or 2.9+) would not be compatible for a rolling
> upgrade. So this would cause some pain to anybody with clusters on those
> versions.
>
> Eric
>
> On Tue, Sep 24, 2019 at 2:42 PM Jonathan Hung 
> wrote:
>
>> Sorry, let me edit my first point. We can just create addendums for
>> YARN-6616 in branch-2.7 and branch-2.8 to edit the submitTime field to the
>> correct id 28. We don’t need to revert YARN-6616 from these branches
>> completely.
>>
>> Jonathan
>>
>> 
>> From: Jonathan Hung 
>> Sent: Tuesday, September 24, 2019 11:38 AM
>> To: Eric Badger
>> Cc: Hadoop Common; yarn-dev; mapreduce-dev; Hdfs-dev
>> Subject: Re: Incompatible changes between branch-2.8 and branch-2.9
>>
>> Hi Eric, thanks for the investigation.
>>
>>   *   For YARN-6616, for branch-2.8 and below, it was only committed to
>> 2.7.8/2.8.6 which have not been released (as I understand). Perhaps we can
>> revert YARN-6616 from branch-2.7 and branch-2.8.
>>   *   For YARN-6050, there's a bit here:
>> https://developers.google.com/protocol-buffers/docs/proto that says
>> "optional is compatible with repeated", so I think we should be OK there.
>>   *   For YARN-7813, it's in 2.8.4 so it seems upgrading from 2.8.4 or
>> 2.8.5 to a 2.9+ version will be an issue. One option could be to move the
>> intraQueuePreemptionDisabled field from id 12 to id 13 in branch-2.8, then
>> users would upgrade from 2.8.4/2.8.5 to 2.8.6 (someone would have to
>> release this), then upgrade from 2.8.6 to 2.9+.
>>
>> Jonathan Hung
>>
>>
>> On Tue, Sep 24, 2019 at 9:23 AM Eric Badger 
>> 
>> wrote:
>> We (Verizon Media) are currently moving towards upgrading our clusters
>> from
>> our internal fork of branch-2.8 to an internal fork of branch-2. During
>> this process, we have found multiple incompatible changes in protobufs
>> between branch-2.8 and branch-2. These incompatibilities were all
>> introduced between branch-2.8 and branch-2.9. I did a git diff over all
>> .proto files across the branch-2.8 and branch-2.9 and found 3 instances of
>> incompatibilities from 3 separate commits. All of the incompatibilities
>> are
>> in yarn_protos.proto
>>
>>
>> I would like to discuss how to fix these incompatible changes. Otherwise,
>> rolling upgrades will not be supported between branch-2.8 (or below) and
>> branch-2.9 (or beyond). We could revert the incompatible changes, but then
>> the new releases would be incompatible with the releases that have these
>> the new releases would be incompatible with the releases that have these
>> incompatible changes. If we do nothing, then rolling upgrades won't work
>> between 2.8- and 2.9+.

[jira] [Created] (YARN-9855) Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7

2019-09-24 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9855:
---

 Summary: Fix ApplicationReportProto submitTime id in 
branch-2.8/branch-2.7
 Key: YARN-9855
 URL: https://issues.apache.org/jira/browse/YARN-9855
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Hung
Assignee: Jonathan Hung


As per 
[http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201909.mbox/%3cCAAaVJWUKTBXEYV_-yWs2PT8aqhjQXq=garav+yzjxq0nx36...@mail.gmail.com%3e].
 Update this field to use the same id as in branch-2.9 and above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: Incompatible changes between branch-2.8 and branch-2.9

2019-09-24 Thread Jonathan Hung
Sorry, let me edit my first point. We can just create addendums for YARN-6616 
in branch-2.7 and branch-2.8 to edit the submitTime field to the correct id 28. 
We don’t need to revert YARN-6616 from these branches completely.

Jonathan


From: Jonathan Hung 
Sent: Tuesday, September 24, 2019 11:38 AM
To: Eric Badger
Cc: Hadoop Common; yarn-dev; mapreduce-dev; Hdfs-dev
Subject: Re: Incompatible changes between branch-2.8 and branch-2.9

Hi Eric, thanks for the investigation.

  *   For YARN-6616, for branch-2.8 and below, it was only committed to 
2.7.8/2.8.6 which have not been released (as I understand). Perhaps we can 
revert YARN-6616 from branch-2.7 and branch-2.8.
  *   For YARN-6050, there's a bit here: 
https://developers.google.com/protocol-buffers/docs/proto that says "optional 
is compatible with repeated", so I think we should be OK there.
  *   For YARN-7813, it's in 2.8.4 so it seems upgrading from 2.8.4 or 2.8.5 to 
a 2.9+ version will be an issue. One option could be to move the 
intraQueuePreemptionDisabled field from id 12 to id 13 in branch-2.8, then 
users would upgrade from 2.8.4/2.8.5 to 2.8.6 (someone would have to release 
this), then upgrade from 2.8.6 to 2.9+.

Jonathan Hung


On Tue, Sep 24, 2019 at 9:23 AM Eric Badger  
wrote:
We (Verizon Media) are currently moving towards upgrading our clusters from
our internal fork of branch-2.8 to an internal fork of branch-2. During
this process, we have found multiple incompatible changes in protobufs
between branch-2.8 and branch-2. These incompatibilities were all
introduced between branch-2.8 and branch-2.9. I did a git diff over all
.proto files across the branch-2.8 and branch-2.9 and found 3 instances of
incompatibilities from 3 separate commits. All of the incompatibilities are
in yarn_protos.proto


I would like to discuss how to fix these incompatible changes. Otherwise,
rolling upgrades will not be supported between branch-2.8 (or below) and
branch-2.9 (or beyond). We could revert the incompatible changes, but then
the new releases would be incompatible with the releases that have these
incompatible changes. If we do nothing, then rolling upgrades won't work
between 2.8- and 2.9+.


Thanks,


Eric


---


git diff branch-2.8..branch-2.9 $(find . -name '*\.proto')


https://issues.apache.org/jira/browse/YARN-6616

   - Trunk patch (applied through branch-2.9) differs from branch-2.8 patch

@@ -211,7 +245,20 @@ message ApplicationReportProto {

   optional PriorityProto priority = 23;

   optional string appNodeLabelExpression = 24;

   optional string amNodeLabelExpression = 25;

-  optional int64 submitTime = 26;

+  repeated AppTimeoutsMapProto appTimeouts = 26;

+  optional int64 launchTime = 27;

+  optional int64 submitTime = 28;


https://issues.apache.org/jira/browse/YARN-6050

   - Trunk and branch-2 patches both change the protobuf type in the same
   way.

@@ -356,7 +416,22 @@ message ApplicationSubmissionContextProto {

   optional LogAggregationContextProto log_aggregation_context = 14;

   optional ReservationIdProto reservation_id = 15;

   optional string node_label_expression = 16;

-  optional ResourceRequestProto am_container_resource_request = 17;

+  repeated ResourceRequestProto am_container_resource_request = 17;

+  repeated ApplicationTimeoutMapProto application_timeouts = 18;


https://issues.apache.org/jira/browse/YARN-7813

   - Trunk (applied through branch-3.1) and branch-3.0 (applied through
   branch-2.9) patches differ from branch-2.8 patch

@@ -425,7 +501,21 @@ message QueueInfoProto {

   optional string defaultNodeLabelExpression = 9;

   optional QueueStatisticsProto queueStatistics = 10;

   optional bool preemptionDisabled = 11;

-  optional bool intraQueuePreemptionDisabled = 12;

+  repeated QueueConfigurationsMapProto queueConfigurationsMap = 12;

+  optional bool intraQueuePreemptionDisabled = 13;


Re: Incompatible changes between branch-2.8 and branch-2.9

2019-09-24 Thread Jonathan Hung
Hi Eric, thanks for the investigation.

   - For YARN-6616, for branch-2.8 and below, it was only committed to
   2.7.8/2.8.6 which have not been released (as I understand). Perhaps we can
   revert YARN-6616 from branch-2.7 and branch-2.8.
   - For YARN-6050, there's a bit here:
   https://developers.google.com/protocol-buffers/docs/proto that says
   "optional is compatible with repeated", so I think we should be OK there.
   - For YARN-7813, it's in 2.8.4 so it seems upgrading from 2.8.4 or 2.8.5
   to a 2.9+ version will be an issue. One option could be to move the
   intraQueuePreemptionDisabled field from id 12 to id 13 in branch-2.8, then
   users would upgrade from 2.8.4/2.8.5 to 2.8.6 (someone would have to
   release this), then upgrade from 2.8.6 to 2.9+.


Jonathan Hung


On Tue, Sep 24, 2019 at 9:23 AM Eric Badger
 wrote:

> We (Verizon Media) are currently moving towards upgrading our clusters from
> our internal fork of branch-2.8 to an internal fork of branch-2. During
> this process, we have found multiple incompatible changes in protobufs
> between branch-2.8 and branch-2. These incompatibilities were all
> introduced between branch-2.8 and branch-2.9. I did a git diff over all
> .proto files across the branch-2.8 and branch-2.9 and found 3 instances of
> incompatibilities from 3 separate commits. All of the incompatibilities are
> in yarn_protos.proto
>
>
> I would like to discuss how to fix these incompatible changes. Otherwise,
> rolling upgrades will not be supported between branch-2.8 (or below) and
> branch-2.9 (or beyond). We could revert the incompatible changes, but then
> the new releases would be incompatible with the releases that have these
> incompatible changes. If we do nothing, then rolling upgrades won't work
> between 2.8- and 2.9+.
>
>
> Thanks,
>
>
> Eric
>
>
> ---
>
>
> git diff branch-2.8..branch-2.9 $(find . -name '*\.proto')
>
>
> https://issues.apache.org/jira/browse/YARN-6616
>
>- Trunk patch (applied through branch-2.9) differs from branch-2.8 patch
>
> @@ -211,7 +245,20 @@ message ApplicationReportProto {
>
>optional PriorityProto priority = 23;
>
>optional string appNodeLabelExpression = 24;
>
>optional string amNodeLabelExpression = 25;
>
> -  optional int64 submitTime = 26;
>
> +  repeated AppTimeoutsMapProto appTimeouts = 26;
>
> +  optional int64 launchTime = 27;
>
> +  optional int64 submitTime = 28;
>
>
> https://issues.apache.org/jira/browse/YARN-6050
>
>- Trunk and branch-2 patches both change the protobuf type in the same
>way.
>
> @@ -356,7 +416,22 @@ message ApplicationSubmissionContextProto {
>
>optional LogAggregationContextProto log_aggregation_context = 14;
>
>optional ReservationIdProto reservation_id = 15;
>
>optional string node_label_expression = 16;
>
> -  optional ResourceRequestProto am_container_resource_request = 17;
>
> +  repeated ResourceRequestProto am_container_resource_request = 17;
>
> +  repeated ApplicationTimeoutMapProto application_timeouts = 18;
>
>
> https://issues.apache.org/jira/browse/YARN-7813
>
>- Trunk (applied through branch-3.1) and branch-3.0 (applied through
>branch-2.9) patches differ from branch-2.8 patch
>
> @@ -425,7 +501,21 @@ message QueueInfoProto {
>
>optional string defaultNodeLabelExpression = 9;
>
>optional QueueStatisticsProto queueStatistics = 10;
>
>optional bool preemptionDisabled = 11;
>
> -  optional bool intraQueuePreemptionDisabled = 12;
>
> +  repeated QueueConfigurationsMapProto queueConfigurationsMap = 12;
>
> +  optional bool intraQueuePreemptionDisabled = 13;
>


[jira] [Resolved] (YARN-6684) TestAMRMClient tests fail on branch-2.7

2019-09-19 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung resolved YARN-6684.
-
Resolution: Won't Fix

branch-2.7 EOL, closing as won't fix

> TestAMRMClient tests fail on branch-2.7
> ---
>
> Key: YARN-6684
> URL: https://issues.apache.org/jira/browse/YARN-6684
> Project: Hadoop YARN
>  Issue Type: Bug
>    Reporter: Jonathan Hung
>Priority: Major
>
> {noformat}2017-06-01 19:10:44,362 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:addNode(1335)) - Added node 
> jhung-ld2.linkedin.biz:58205 clusterResource: 
> 2017-06-01 19:10:44,370 INFO  server.MiniYARNCluster 
> (MiniYARNCluster.java:waitForNodeManagersToConnect(657)) - All Node Managers 
> connected in MiniYARNCluster
> 2017-06-01 19:10:44,376 INFO  client.RMProxy (RMProxy.java:createRMProxy(98)) 
> - Connecting to ResourceManager at jhung-ld2.linkedin.biz/ipaddr:36167
> 2017-06-01 19:10:45,501 INFO  ipc.Client 
> (Client.java:handleConnectionFailure(872)) - Retrying connect to server: 
> jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 0 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2017-06-01 19:10:46,502 INFO  ipc.Client 
> (Client.java:handleConnectionFailure(872)) - Retrying connect to server: 
> jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 1 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2017-06-01 19:10:47,503 INFO  ipc.Client 
> (Client.java:handleConnectionFailure(872)) - Retrying connect to server: 
> jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 2 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2017-06-01 19:10:48,504 INFO  ipc.Client 
> (Client.java:handleConnectionFailure(872)) - Retrying connect to server: 
> jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 3 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS){noformat}
> After some investigation, seems it is the same issue as described here: 
> HDFS-11893



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8825) Print application tags in ApplicationSummary

2019-09-19 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung resolved YARN-8825.
-
Resolution: Duplicate

> Print application tags in ApplicationSummary
> 
>
> Key: YARN-8825
> URL: https://issues.apache.org/jira/browse/YARN-8825
> Project: Hadoop YARN
>  Issue Type: Improvement
>    Reporter: Jonathan Hung
>    Assignee: Jonathan Hung
>Priority: Major
>
> Useful for tracking application tag metadata.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-9844) TestCapacitySchedulerPerf test errors in branch-2

2019-09-19 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung resolved YARN-9844.
-
Resolution: Fixed

> TestCapacitySchedulerPerf test errors in branch-2
> -
>
> Key: YARN-9844
> URL: https://issues.apache.org/jira/browse/YARN-9844
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test, yarn
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>    Assignee: Jonathan Hung
>Priority: Major
>
> These TestCapacitySchedulerPerf throughput tests are failing in branch-2:
> {{[ERROR]   
> TestCapacitySchedulerPerf.testUserLimitThroughputForFiveResources:263->testUserLimitThroughputWithNumberOfResourceTypes:114
>  » ArrayIndexOutOfBounds}}
> {{[ERROR]   
> TestCapacitySchedulerPerf.testUserLimitThroughputForFourResources:258->testUserLimitThroughputWithNumberOfResourceTypes:114
>  » ArrayIndexOutOfBounds}}
> {{[ERROR]   
> TestCapacitySchedulerPerf.testUserLimitThroughputForThreeResources:253->testUserLimitThroughputWithNumberOfResourceTypes:114
>  » ArrayIndexOutOfBounds}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-9825) Changes for initializing placement rules with ResourceScheduler in branch-2

2019-09-09 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9825:
---

 Summary: Changes for initializing placement rules with 
ResourceScheduler in branch-2
 Key: YARN-9825
 URL: https://issues.apache.org/jira/browse/YARN-9825
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung
Assignee: Jonathan Hung


YARN-8016 and YARN-8948 add functionality to initialize placement rules with 
ResourceScheduler. We need this in branch-2.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-9824) Fall back to configured queue ordering policy class name

2019-09-09 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9824:
---

 Summary: Fall back to configured queue ordering policy class name
 Key: YARN-9824
 URL: https://issues.apache.org/jira/browse/YARN-9824
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung


Currently this is how configured queue ordering policy is determined:
{noformat}
if (policyType.trim().equals(QUEUE_UTILIZATION_ORDERING_POLICY)) {
  // Doesn't respect priority
  qop = new PriorityUtilizationQueueOrderingPolicy(false);
} else if (policyType.trim().equals(
    QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY)) {
  qop = new PriorityUtilizationQueueOrderingPolicy(true);
} else {
  String message =
      "Unable to construct queue ordering policy=" + policyType + " queue="
          + queue;
  throw new YarnRuntimeException(message);
}
{noformat}
If we want to enable a policy which is not QUEUE_UTILIZATION_ORDERING_POLICY or
QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY, it requires a code change here to
add a keyword for that policy.

It'd be easier if the admin could configure a class name here instead of 
requiring a keyword.
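
For illustration, a rough sketch of that fallback, reusing the names from the
snippet above and assuming the configured class implements QueueOrderingPolicy
and has a no-arg constructor (not a committed patch):

{noformat}
String trimmed = policyType.trim();
if (trimmed.equals(QUEUE_UTILIZATION_ORDERING_POLICY)) {
  // Doesn't respect priority
  qop = new PriorityUtilizationQueueOrderingPolicy(false);
} else if (trimmed.equals(QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY)) {
  qop = new PriorityUtilizationQueueOrderingPolicy(true);
} else {
  // Fall back to treating the configured value as a fully-qualified class name.
  try {
    qop = (QueueOrderingPolicy) Class.forName(trimmed).newInstance();
  } catch (Exception e) {
    throw new YarnRuntimeException("Unable to construct queue ordering policy="
        + trimmed + " queue=" + queue, e);
  }
}
{noformat}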



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-9810) Add queue capacity/maxcapacity percentage metrics

2019-09-03 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9810:
---

 Summary: Add queue capacity/maxcapacity percentage metrics
 Key: YARN-9810
 URL: https://issues.apache.org/jira/browse/YARN-9810
 Project: Hadoop YARN
  Issue Type: Improvement
 Environment: Similar to YARN-9085, it'd be good to have queue 
(absolute) capacity / (absolute) max capacity metrics in CSQueueMetrics.
Reporter: Jonathan Hung






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-9806) TestNMSimulator#testNMSimulator fails in branch-2

2019-08-30 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9806:
---

 Summary: TestNMSimulator#testNMSimulator fails in branch-2
 Key: YARN-9806
 URL: https://issues.apache.org/jira/browse/YARN-9806
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Hung


{noformat}java.lang.AssertionError: expected:<10240> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.sls.nodemanager.TestNMSimulator.testNMSimulator(TestNMSimulator.java:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runners.Suite.runChild(Suite.java:127)
at org.junit.runners.Suite.runChild(Suite.java:26)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413){noformat}

This appears to be fixed by YARN-7929. We only need the TestNMSimulator portion
though. This jira is to track getting that portion into branch-2.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7585) NodeManager should go unhealthy when state store throws DBException

2019-08-30 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung resolved YARN-7585.
-
Fix Version/s: 2.10.0
   Resolution: Fixed

Committed to branch-2.

> NodeManager should go unhealthy when state store throws DBException 
> 
>
> Key: YARN-7585
> URL: https://issues.apache.org/jira/browse/YARN-7585
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.1.0
>
> Attachments: YARN-7585.001.patch, YARN-7585.002.patch, 
> YARN-7585.003.patch
>
>
> If work-preserving recovery is enabled, the NM will not start up if the state
> store does not initialise. However, if the state store becomes unavailable
> after that for any reason, the NM will not go unhealthy.
> Since the state store is not available, new containers cannot be started any
> more and the NM should become unhealthy:
> {code}
> AMLauncher: Error launching appattempt_1508806289867_268617_01. Got 
> exception: org.apache.hadoop.yarn.exceptions.YarnException: 
> java.io.IOException: org.iq80.leveldb.DBException: IO error: 
> /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: 
> Read-only file system
> at o.a.h.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
> at 
> o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:721)
> ...
> Caused by: java.io.IOException: org.iq80.leveldb.DBException: IO error: 
> /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: 
> Read-only file system
> at 
> o.a.h.y.s.n.r.NMLeveldbStateStoreService.storeApplication(NMLeveldbStateStoreService.java:374)
> at 
> o.a.h.y.s.n.cm.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:848)
> at 
> o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:712)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Merge YARN-8200 to branch-2 and branch-3.0

2019-08-29 Thread Jonathan Hung
Thanks all, +1 from me too.

There are three binding +1s, two non-binding +1s, and no -1s, so I'll merge
YARN-8200 to branch-2 shortly. I'll skip branch-3.0 since it's EOL, as
others have mentioned.

Jonathan Hung


On Tue, Aug 27, 2019 at 11:49 AM Konstantin Shvachko 
wrote:

> +1 for the merge.
>
> We probably should not bother with branch-3.0 merge since it's been voted
> EOL.
>
> Thanks,
> --Konstantin
>
> On Thu, Aug 22, 2019 at 4:43 PM Jonathan Hung 
> wrote:
>
>> Hi folks,
>>
>> As per [1], starting a vote to merge YARN-8200 (and YARN-8200.branch3)
>> feature branch to branch-2 (and branch-3.0).
>>
>> Vote runs for 7 days, to Thursday, Aug 29 5:00PM PDT.
>>
>> Thanks.
>>
>> [1]
>>
>> http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201908.mbox/%3cCAHzWLgcX7f5Tr3q=csrqgysvpdf7mh-iu17femgx89dhr+1...@mail.gmail.com%3e
>>
>> Jonathan Hung
>>
>


[DISCUSS] Hadoop 2.10.0 release plan

2019-08-26 Thread Jonathan Hung
Hi folks,

As discussed previously (e.g. [1], [2]) we'd like to do a 2.10.0 release
soon. Some features/big-items we're targeting for this release:

   - YARN resource types/GPU support (YARN-8200
   <https://issues.apache.org/jira/browse/YARN-8200>)
   - Selective wire encryption (HDFS-13541
   <https://issues.apache.org/jira/browse/HDFS-13541>)
   - Rolling upgrade support from 2.x to 3.x (e.g. HDFS-14509
   <https://issues.apache.org/jira/browse/HDFS-14509>)

Per [3] sounds like there's concern around upgrading dependencies as well.

We created a public jira filter here (
https://issues.apache.org/jira/issues/?filter=12346975) marking all
blockers for 2.10.0 release. If you have other jiras that should be 2.10.0
blockers, please mark "Target Version/s" as "2.10.0" and add label
"release-blocker" so we can track it through this filter.

We're targeting a release at end of September.

Please share any thoughts you have about this. Thanks!

[1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29461.html
[2]
https://www.mail-archive.com/mapreduce-dev@hadoop.apache.org/msg21293.html
[3] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg33440.html


Jonathan Hung


[VOTE] Merge YARN-8200 to branch-2 and branch-3.0

2019-08-22 Thread Jonathan Hung
Hi folks,

As per [1], starting a vote to merge YARN-8200 (and YARN-8200.branch3)
feature branch to branch-2 (and branch-3.0).

Vote runs for 7 days, to Thursday, Aug 29 5:00PM PDT.

Thanks.

[1]
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201908.mbox/%3cCAHzWLgcX7f5Tr3q=csrqgysvpdf7mh-iu17femgx89dhr+1...@mail.gmail.com%3e

Jonathan Hung


[jira] [Created] (YARN-9770) Create a queue ordering policy which picks child queues with equal probability

2019-08-21 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9770:
---

 Summary: Create a queue ordering policy which picks child queues 
with equal probability
 Key: YARN-9770
 URL: https://issues.apache.org/jira/browse/YARN-9770
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung
Assignee: Jonathan Hung


Ran some simulations with the default queue_utilization_ordering_policy:

An underutilized queue which receives an application with many (thousands of) 
resource requests will hog scheduler allocations for a long time (on the order 
of a minute). In the meantime, apps are being submitted to all other queues, 
which increases activeUsers in those queues and drops their user limit to 
small values if minimum-user-limit-percent is configured to a small value 
(e.g. 10%).

To avoid this, we propose assigning to child queues with equal probability, so 
that no queue goes without allocations for a long time.
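For illustration, a minimal sketch of the equal-probability idea. It is not the actual CapacityScheduler OrderingPolicy interface; it only shows candidate child queues being shuffled before an assignment pass instead of sorted by utilization:

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch, not the CapacityScheduler API: offer the node to
 * child queues in a random order so each queue is picked first with equal
 * probability on every heartbeat.
 */
public class EqualProbabilityQueueOrdering {

  private final Random random = new Random();

  public List<String> orderForAssignment(List<String> childQueues) {
    List<String> shuffled = new ArrayList<>(childQueues);
    Collections.shuffle(shuffled, random); // equal probability for each queue
    return shuffled;
  }

  public static void main(String[] args) {
    List<String> queues = Arrays.asList("analytics", "adhoc", "default");
    System.out.println(new EqualProbabilityQueueOrdering().orderForAssignment(queues));
  }
}
{code}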






Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2

2019-08-21 Thread Jonathan Hung
Reviving this thread: we tested YARN RU starting from a cluster running
2.7.4 to running branch-2 + YARN-8200. We ran some simple MR/Spark jobs
concurrently with the RM/NM upgrades and did not see any issues.

If there are no other concerns I'll continue with a vote.

Jonathan Hung


On Thu, Apr 18, 2019 at 5:12 PM Jonathan Hung  wrote:

> Sorry for the delay, had to deprioritize this. Hoping to get to this next
> week.
>
> Jonathan
>
> --
> *From:* Jim Brennan 
> *Sent:* Thursday, April 18, 2019 7:28 AM
> *To:* Jonathan Hung
> *Cc:* yarn-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
> *Subject:* Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2
>
> Hi Jonathan,
>
> Hi Jim, we have not tested rolling upgrade. I don’t foresee this being an
>> issue, but we’ll try it out and report back.
>
>
> Any update on this?
> Jim
>
>
> On Wed, Apr 3, 2019 at 2:16 AM Jonathan Hung  wrote:
>
>> Hi Jim, we have not tested rolling upgrade. I don’t foresee this being an
>> issue, but we’ll try it out and report back.
>>
>> Jonathan
>>
>> ------
>> *From:* Jim Brennan 
>> *Sent:* Tuesday, April 2, 2019 9:17 AM
>> *To:* Jonathan Hung
>> *Cc:* yarn-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
>> *Subject:* Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2
>>
>> Thanks for working on this!
>> One concern for us is support for a rolling upgrade.  If we are running a
>> cluster based on branch-2.8, will we be able to do a rolling upgrade (no
>> cluster down-time) to a branch containing these changes?  Have you tested
>> rolling upgrades?
>>
>> Thanks.
>> Jim
>>
>> On Fri, Mar 29, 2019 at 2:14 PM Jonathan Hung 
>> wrote:
>>
>>> Hello devs,
>>>
>>> Starting a discuss thread to merge resource types/native GPU scheduling
>>> support to branch-3.0 and branch-2. The resource types work was done in
>>> trunk~branch-3.0 and GPU support done in trunk~branch-3.1, so the
>>> proposal
>>> is to merge GPU support into branch-3.0 and both resource types/GPU
>>> support
>>> to branch-2.
>>>
>>> Internally we've been running resource types/GPU support off a fork of
>>> branch-2.9.0 in a > 300 node GPU cluster for a few months which has
>>> worked
>>> well. Also for completeness we verified that everything going into
>>> branch-2
>>> also exists in branch-3.0.
>>>
>>> The specific list of patches to merge is in feature branch
>>> YARN-8200.branch3 (for branch-3.0) and feature branch YARN-8200 (for
>>> branch-2). Full patches containing the YARN-8200.branch3 -> branch-3.0
>>> diff
>>> and YARN-8200 -> branch-2 diff have been posted to YARN-8200 jira.
>>>
>>> If there's no issues from the community I'll start a merge vote next
>>> week.
>>> Thanks.
>>>
>>> Jonathan Hung
>>>
>>


Re: [VOTE] Mark 2.6, 2.7, 3.0 release lines EOL

2019-08-20 Thread Jonathan Hung
+1. Thanks!

Jonathan Hung


On Tue, Aug 20, 2019 at 8:03 PM Wangda Tan  wrote:

> Hi all,
>
> This is a vote thread to mark any versions smaller than 2.7 (inclusive),
> and 3.0 EOL. This is based on discussions of [1]
>
> This discussion runs for 7 days and will conclude on Aug 28 Wed.
>
> Please feel free to share your thoughts.
>
> Thanks,
> Wangda
>
> [1]
>
> http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201908.mbox/%3cCAD++eC=ou-tit1faob-dbecqe6ht7ede7t1dyra2p1yinpe...@mail.gmail.com%3e
> ,
>


[jira] [Created] (YARN-9764) Print application submission context label in application summary

2019-08-19 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9764:
---

 Summary: Print application submission context label in application 
summary
 Key: YARN-9764
 URL: https://issues.apache.org/jira/browse/YARN-9764
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung









[jira] [Created] (YARN-9763) Print application tags in application summary

2019-08-19 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9763:
---

 Summary: Print application tags in application summary
 Key: YARN-9763
 URL: https://issues.apache.org/jira/browse/YARN-9763
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung









[jira] [Created] (YARN-9762) Add submission context label to audit logs

2019-08-19 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9762:
---

 Summary: Add submission context label to audit logs
 Key: YARN-9762
 URL: https://issues.apache.org/jira/browse/YARN-9762
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung


Currently we log NODELABEL in container allocation/release audit logs; we 
should also log the NODELABEL of the application submission context on app 
submission.






[jira] [Created] (YARN-9761) Allow overriding application submissions based on server side configs

2019-08-19 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9761:
---

 Summary: Allow overriding application submissions based on server 
side configs
 Key: YARN-9761
 URL: https://issues.apache.org/jira/browse/YARN-9761
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Jonathan Hung


Create a preprocessor/interceptor which takes each app submitted to the RM and 
overrides the submission context based on server-side configs.
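A minimal sketch of what such an interceptor could look like. The names and the user-to-queue mapping are illustrative assumptions, not the YARN API; a real version would operate on the actual ApplicationSubmissionContext:

{code}
import java.util.Collections;
import java.util.Map;

/**
 * Hypothetical sketch of a server-side submission interceptor: rewrite
 * fields of an incoming submission from static, server-side config before
 * the RM accepts it.
 */
public class SubmissionInterceptor {

  /** Tiny stand-in for the real ApplicationSubmissionContext. */
  static class Submission {
    String user;
    String queue;
    String nodeLabel;
  }

  private final Map<String, String> userToQueue;

  public SubmissionInterceptor(Map<String, String> userToQueue) {
    this.userToQueue = userToQueue;
  }

  /** Override the requested queue if server-side config maps this user. */
  public Submission intercept(Submission s) {
    String forcedQueue = userToQueue.get(s.user);
    if (forcedQueue != null) {
      s.queue = forcedQueue;
    }
    return s;
  }

  public static void main(String[] args) {
    Submission s = new Submission();
    s.user = "alice";
    s.queue = "default";
    Submission out = new SubmissionInterceptor(
        Collections.singletonMap("alice", "analytics")).intercept(s);
    System.out.println(out.queue); // analytics
  }
}
{code}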






[jira] [Created] (YARN-9760) Support configuring application priorities on a workflow level

2019-08-19 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9760:
---

 Summary: Support configuring application priorities on a workflow 
level
 Key: YARN-9760
 URL: https://issues.apache.org/jira/browse/YARN-9760
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Jonathan Hung


Currently priorities are submitted on an application level, but for end users 
it's common to submit workloads to YARN at a workflow level. This jira proposes 
a feature to store workflow id + priority mappings on the RM (similar to queue 
mappings). If an app is submitted with a certain workflow id (as set in the 
application submission context), the RM will override the app's priority with 
the one defined in the mapping.
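A minimal sketch of the lookup, assuming a hypothetical mapping string such as "etl-hourly:8,backfill:2" (this jira does not define the config key or format, so both are illustrative):

{code}
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: resolve an app's priority from its workflow id,
 * falling back to the priority it was submitted with.
 */
public class WorkflowPriorityMapping {

  private final Map<String, Integer> workflowToPriority = new HashMap<>();

  public WorkflowPriorityMapping(String mappingSpec) {
    for (String entry : mappingSpec.split(",")) {
      String[] kv = entry.split(":");
      workflowToPriority.put(kv[0].trim(), Integer.parseInt(kv[1].trim()));
    }
  }

  public int resolvePriority(String workflowId, int submittedPriority) {
    return workflowToPriority.getOrDefault(workflowId, submittedPriority);
  }

  public static void main(String[] args) {
    WorkflowPriorityMapping m = new WorkflowPriorityMapping("etl-hourly:8,backfill:2");
    System.out.println(m.resolvePriority("etl-hourly", 0)); // 8, overridden
    System.out.println(m.resolvePriority("adhoc", 4));      // 4, unchanged
  }
}
{code}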






[jira] [Created] (YARN-9751) Separate queue and app ordering policy capacity scheduler configs

2019-08-15 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9751:
---

 Summary: Separate queue and app ordering policy capacity scheduler 
configs
 Key: YARN-9751
 URL: https://issues.apache.org/jira/browse/YARN-9751
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Jonathan Hung


Right now it's not possible to specify distinct app and queue ordering policies 
since they share the same {{ordering-policy}} suffix.

There's already a TODO in CapacitySchedulerConfiguration for this. This Jira 
intends to fix it.
{noformat}
// TODO (wangda): We need to better distinguish app ordering policy and queue
// ordering policy's classname / configuration options, etc. And dedup code
// if possible.{noformat}
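To make the collision concrete, a small sketch. The key prefix follows the usual capacity-scheduler style, but the {{queue-ordering-policy}} suffix shown as the fix is only a hypothetical disambiguation, not a committed config name:

{code}
/**
 * Hypothetical sketch: with one ".ordering-policy" suffix, the app ordering
 * policy and the queue ordering policy of the same queue path collide on the
 * same config key; a distinct suffix keeps them apart.
 */
public class OrderingPolicyKeys {
  public static void main(String[] args) {
    String prefix = "yarn.scheduler.capacity.root.engineering";

    String appOrderingKey   = prefix + ".ordering-policy";       // e.g. app policy
    String queueOrderingKey = prefix + ".ordering-policy";       // e.g. queue policy
    System.out.println(appOrderingKey.equals(queueOrderingKey)); // true -> ambiguous

    // Sketch of a fix: give queue ordering its own suffix.
    String distinctQueueOrderingKey = prefix + ".queue-ordering-policy";
    System.out.println(appOrderingKey.equals(distinctQueueOrderingKey)); // false
  }
}
{code}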






Re: [DISCUSS] Hadoop 2019 Release Planning

2019-08-12 Thread Jonathan Hung
Hi Wangda, Thanks for starting the discussion. We would also like to
release 2.10.0 which was discussed previously
<https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29479.html> and
at various contributor meetups. I'm interested in being release manager for
that.

Thanks,

Jonathan Hung


On Fri, Aug 9, 2019 at 7:59 PM Wangda Tan  wrote:

> Hi all,
>
> Hope this email finds you well
>
> I want to hear your thoughts about what should be the release plan for
> 2019.
>
> In 2018, we released:
> - 1 maintenance release of 2.6
> - 3 maintenance releases of 2.7
> - 3 maintenance releases of 2.8
> - 3 releases of 2.9
> - 4 releases of 3.0
> - 2 releases of 3.1
>
> Total 16 releases in 2018.
>
> In 2019, by far we only have two releases:
> - 1 maintenance release of 3.1
> - 1 minor release of 3.2.
>
> However, the community put a lot of efforts to stabilize features of
> various release branches.
> There're:
> - 217 fixed patches in 3.1.3 [1]
> - 388 fixed patches in 3.2.1 [2]
> - 1172 fixed patches in 3.3.0 [3] (OMG!)
>
> I think it is time to do maintenance releases of 3.1/3.2 and do a minor
> release for 3.3.0.
>
> In addition, I saw community discussion to do a 2.8.6 release for security
> fixes.
>
> Any other releases? I think there're release plans for Ozone as well. And
> please add your thoughts.
>
> Volunteers welcome! If you are interested in running a release as Release
> Manager (or co-Release Manager), please respond to this email thread so we
> can coordinate.
>
> Thanks,
> Wangda Tan
>
> [1] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution = Fixed AND
> fixVersion = 3.1.3
> [2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution = Fixed AND
> fixVersion = 3.2.1
> [3] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution = Fixed AND
> fixVersion = 3.3.0
>


[jira] [Created] (YARN-9736) Recursively configure app ordering policies

2019-08-09 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9736:
---

 Summary: Recursively configure app ordering policies
 Key: YARN-9736
 URL: https://issues.apache.org/jira/browse/YARN-9736
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Jonathan Hung


Currently the app ordering policy will find confs with prefix 
{{.ordering-policy}}. For queues with the same ordering policy 
configurations, it's easier to have a queue inherit confs from its parent.
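A minimal sketch of the inheritance lookup, assuming plain string config keys of the usual {{yarn.scheduler.capacity.<queue-path>.ordering-policy}} shape (the real handling in CapacitySchedulerConfiguration differs):

{code}
import java.util.Collections;
import java.util.Map;

/**
 * Hypothetical sketch: if a queue has no ordering-policy conf of its own,
 * walk up the queue path and use the nearest ancestor's value.
 */
public class InheritedOrderingPolicy {

  /** Resolve the ordering policy for a queue path like "root.eng.batch". */
  public static String resolve(Map<String, String> conf, String queuePath, String dflt) {
    String path = queuePath;
    while (!path.isEmpty()) {
      String value = conf.get("yarn.scheduler.capacity." + path + ".ordering-policy");
      if (value != null) {
        return value;
      }
      int dot = path.lastIndexOf('.');
      path = dot < 0 ? "" : path.substring(0, dot); // move to the parent queue
    }
    return dflt;
  }

  public static void main(String[] args) {
    Map<String, String> conf = Collections.singletonMap(
        "yarn.scheduler.capacity.root.eng.ordering-policy", "fair");
    // root.eng.batch has no conf of its own, so it inherits "fair" from root.eng.
    System.out.println(resolve(conf, "root.eng.batch", "fifo"));
  }
}
{code}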






[jira] [Created] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-08-08 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9730:
---

 Summary: Support forcing configured partitions to be exclusive 
based on app node label
 Key: YARN-9730
 URL: https://issues.apache.org/jira/browse/YARN-9730
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Jonathan Hung
Assignee: Jonathan Hung


Use case: queue X has all of its workload in a non-default (exclusive) partition 
P (by setting the app submission context's node label to P). A node in partition 
Q != P heartbeats to the RM. The capacity scheduler loops through every 
application in X, and every scheduler key in each application, and fails to 
allocate each time since the app's requested label and the node's label don't 
match. This causes huge performance degradation when the number of apps in X is 
large.

To fix the issue, allow the RM to configure partitions as "forced-exclusive". If 
partition P is "forced-exclusive", then:
 * If an app sets its submission context's node label to P, all its resource 
requests will be overridden to P
 * If an app sets its submission context's node label to Q, any of its resource 
requests whose labels are P will be overridden to Q
 * In the scheduler, we add apps with node label expression P to a separate 
data structure. When a node in partition P heartbeats to the scheduler, we only 
try to schedule apps in this data structure. When a node in partition Q 
heartbeats to the scheduler, we schedule the rest of the apps as normal (see 
the sketch below).
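A minimal sketch of the separate data structure from the last bullet. The class and method names are illustrative only, not scheduler internals:

{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: apps targeting a forced-exclusive partition are
 * indexed separately, so a node heartbeat only considers apps whose
 * requested label matches the node's partition.
 */
public class ForcedExclusiveAppIndex {

  private final Set<String> forcedExclusivePartitions;
  // partition label -> app ids that requested that label
  private final Map<String, Set<String>> appsByPartition = new HashMap<>();
  private final Set<String> otherApps = new HashSet<>();

  public ForcedExclusiveAppIndex(Set<String> forcedExclusivePartitions) {
    this.forcedExclusivePartitions = forcedExclusivePartitions;
  }

  public void addApp(String appId, String requestedLabel) {
    if (forcedExclusivePartitions.contains(requestedLabel)) {
      appsByPartition.computeIfAbsent(requestedLabel, k -> new HashSet<>()).add(appId);
    } else {
      otherApps.add(appId);
    }
  }

  /** Apps worth considering when a node in the given partition heartbeats. */
  public Set<String> candidates(String nodePartition) {
    if (forcedExclusivePartitions.contains(nodePartition)) {
      return appsByPartition.getOrDefault(nodePartition, Collections.<String>emptySet());
    }
    return otherApps;
  }
}
{code}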






[jira] [Created] (YARN-9668) UGI conf doesn't read user overridden configurations on RM and NM startup

2019-07-08 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9668:
---

 Summary: UGI conf doesn't read user overridden configurations on 
RM and NM startup
 Key: YARN-9668
 URL: https://issues.apache.org/jira/browse/YARN-9668
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Hung
Assignee: Jonathan Hung


Similar to HADOOP-15150. Configs overridden through e.g. -D or -conf are not 
passed to the UGI conf on RM or NM startup.
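A minimal sketch of the fix idea, assuming the service conf already contains the -D/-conf overrides; the surrounding RM/NM service wiring is omitted:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * Sketch only: after command-line overrides (-D / -conf) are folded into the
 * service Configuration, push that same Configuration into UGI so security
 * settings also see the overrides.
 */
public class UgiConfExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // GenericOptionsParser applies -D key=value and -conf <file> overrides.
    conf = new GenericOptionsParser(conf, args).getConfiguration();

    // Without this call, UGI keeps its own default Configuration that never
    // saw the overrides -- the behavior described in this issue.
    UserGroupInformation.setConfiguration(conf);

    System.out.println("auth: "
        + UserGroupInformation.getLoginUser().getAuthenticationMethod());
  }
}
{code}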






[jira] [Created] (YARN-9615) Add dispatcher metrics to RM

2019-06-10 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9615:
---

 Summary: Add dispatcher metrics to RM
 Key: YARN-9615
 URL: https://issues.apache.org/jira/browse/YARN-9615
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Jonathan Hung
Assignee: Jonathan Hung


It'd be good to have counts/processing times for each event type in the RM async 
dispatcher and the scheduler async dispatcher.
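A minimal sketch of the bookkeeping, not the actual AsyncDispatcher or metrics2 integration:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: wrap event handling so each event type accumulates
 * a count and total processing time that could be published as metrics.
 */
public class DispatcherMetricsSketch {

  static final class Stat {
    final AtomicLong count = new AtomicLong();
    final AtomicLong totalNanos = new AtomicLong();
  }

  private final Map<String, Stat> statsByEventType = new ConcurrentHashMap<>();

  public <T> void handle(String eventType, T event, Consumer<T> handler) {
    long start = System.nanoTime();
    try {
      handler.accept(event);
    } finally {
      Stat s = statsByEventType.computeIfAbsent(eventType, k -> new Stat());
      s.count.incrementAndGet();
      s.totalNanos.addAndGet(System.nanoTime() - start);
    }
  }

  public void dump() {
    statsByEventType.forEach((type, s) -> System.out.printf(
        "%s: count=%d avgMs=%.3f%n", type, s.count.get(),
        s.totalNanos.get() / 1e6 / Math.max(1, s.count.get())));
  }

  public static void main(String[] args) {
    DispatcherMetricsSketch metrics = new DispatcherMetricsSketch();
    metrics.handle("NODE_UPDATE", "node-1", e -> { /* handle event */ });
    metrics.handle("APP_ADDED", "app-42", e -> { /* handle event */ });
    metrics.dump();
  }
}
{code}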






[jira] [Created] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic

2019-05-15 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9559:
---

 Summary: Create AbstractContainersLauncher for pluggable 
ContainersLauncher logic
 Key: YARN-9559
 URL: https://issues.apache.org/jira/browse/YARN-9559
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Jonathan Hung
Assignee: Jonathan Hung









[jira] [Created] (YARN-9529) Log correct cpu controller path on error

2019-05-03 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9529:
---

 Summary: Log correct cpu controller path on error
 Key: YARN-9529
 URL: https://issues.apache.org/jira/browse/YARN-9529
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung
Assignee: Jonathan Hung
 Attachments: YARN-9529.001.patch

The base CPU controller path is logged instead of the Hadoop cgroup path.






Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2

2019-04-18 Thread Jonathan Hung
Sorry for the delay, had to deprioritize this. Hoping to get to this next week.

Jonathan


From: Jim Brennan 
Sent: Thursday, April 18, 2019 7:28 AM
To: Jonathan Hung
Cc: yarn-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2

Hi Jonathan,

Hi Jim, we have not tested rolling upgrade. I don’t foresee this being an 
issue, but we’ll try it out and report back.

Any update on this?
Jim


On Wed, Apr 3, 2019 at 2:16 AM Jonathan Hung 
mailto:jyhung2...@gmail.com>> wrote:
Hi Jim, we have not tested rolling upgrade. I don’t foresee this being an 
issue, but we’ll try it out and report back.

Jonathan


From: Jim Brennan 
mailto:james.bren...@verizonmedia.com>>
Sent: Tuesday, April 2, 2019 9:17 AM
To: Jonathan Hung
Cc: yarn-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2

Thanks for working on this!
One concern for us is support for a rolling upgrade.  If we are running a 
cluster based on branch-2.8, will we be able to do a rolling upgrade (no 
cluster down-time) to a branch containing these changes?  Have you tested 
rolling upgrades?

Thanks.
Jim

On Fri, Mar 29, 2019 at 2:14 PM Jonathan Hung 
mailto:jyhung2...@gmail.com>> wrote:
Hello devs,

Starting a discuss thread to merge resource types/native GPU scheduling
support to branch-3.0 and branch-2. The resource types work was done in
trunk~branch-3.0 and GPU support done in trunk~branch-3.1, so the proposal
is to merge GPU support into branch-3.0 and both resource types/GPU support
to branch-2.

Internally we've been running resource types/GPU support off a fork of
branch-2.9.0 in a > 300 node GPU cluster for a few months which has worked
well. Also for completeness we verified that everything going into branch-2
also exists in branch-3.0.

The specific list of patches to merge is in feature branch
YARN-8200.branch3 (for branch-3.0) and feature branch YARN-8200 (for
branch-2). Full patches containing the YARN-8200.branch3 -> branch-3.0 diff
and YARN-8200 -> branch-2 diff have been posted to YARN-8200 jira.

If there's no issues from the community I'll start a merge vote next week.
Thanks.

Jonathan Hung


[jira] [Created] (YARN-9438) launchTime not written to state store for running applications

2019-04-03 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9438:
---

 Summary: launchTime not written to state store for running 
applications
 Key: YARN-9438
 URL: https://issues.apache.org/jira/browse/YARN-9438
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Hung
Assignee: Jonathan Hung


launchTime is only saved to the state store after the application finishes, so 
if a restart happens, any running applications will have launchTime set to -1 
(since this is the default timestamp of the recovery event).
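A minimal sketch of the intended behavior; the store here is a plain map, not the real RMStateStore API:

{code}
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: persist launchTime as soon as the app launches, so a
 * recovered running app does not fall back to the recovery default of -1.
 */
public class LaunchTimeRecovery {

  static class AppState {
    long submitTime;
    long launchTime = -1; // default seen after recovery if never persisted
  }

  private final Map<String, AppState> store = new HashMap<>();

  public void recordSubmit(String appId, long now) {
    AppState state = new AppState();
    state.submitTime = now;
    store.put(appId, state);
  }

  /** The fix: write launchTime at launch, not only when the app finishes. */
  public void recordLaunch(String appId, long now) {
    store.get(appId).launchTime = now;
  }

  public long recoveredLaunchTime(String appId) {
    return store.get(appId).launchTime;
  }

  public static void main(String[] args) {
    LaunchTimeRecovery recovery = new LaunchTimeRecovery();
    recovery.recordSubmit("app_1", 1000L);
    System.out.println(recovery.recoveredLaunchTime("app_1")); // -1 without the fix
    recovery.recordLaunch("app_1", 2000L);
    System.out.println(recovery.recoveredLaunchTime("app_1")); // 2000 with the fix
  }
}
{code}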






Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2

2019-04-03 Thread Jonathan Hung
Hi Jim, we have not tested rolling upgrade. I don’t foresee this being an 
issue, but we’ll try it out and report back.

Jonathan


From: Jim Brennan 
Sent: Tuesday, April 2, 2019 9:17 AM
To: Jonathan Hung
Cc: yarn-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2

Thanks for working on this!
One concern for us is support for a rolling upgrade.  If we are running a 
cluster based on branch-2.8, will we be able to do a rolling upgrade (no 
cluster down-time) to a branch containing these changes?  Have you tested 
rolling upgrades?

Thanks.
Jim

On Fri, Mar 29, 2019 at 2:14 PM Jonathan Hung 
mailto:jyhung2...@gmail.com>> wrote:
Hello devs,

Starting a discuss thread to merge resource types/native GPU scheduling
support to branch-3.0 and branch-2. The resource types work was done in
trunk~branch-3.0 and GPU support done in trunk~branch-3.1, so the proposal
is to merge GPU support into branch-3.0 and both resource types/GPU support
to branch-2.

Internally we've been running resource types/GPU support off a fork of
branch-2.9.0 in a > 300 node GPU cluster for a few months which has worked
well. Also for completeness we verified that everything going into branch-2
also exists in branch-3.0.

The specific list of patches to merge is in feature branch
YARN-8200.branch3 (for branch-3.0) and feature branch YARN-8200 (for
branch-2). Full patches containing the YARN-8200.branch3 -> branch-3.0 diff
and YARN-8200 -> branch-2 diff have been posted to YARN-8200 jira.

If there's no issues from the community I'll start a merge vote next week.
Thanks.

Jonathan Hung


[DISCUSS] Merging YARN-8200 to branch-3.0 and branch-2

2019-03-29 Thread Jonathan Hung
Hello devs,

Starting a discuss thread to merge resource types/native GPU scheduling
support to branch-3.0 and branch-2. The resource types work was done in
trunk~branch-3.0 and GPU support done in trunk~branch-3.1, so the proposal
is to merge GPU support into branch-3.0 and both resource types/GPU support
to branch-2.

Internally we've been running resource types/GPU support off a fork of
branch-2.9.0 in a > 300 node GPU cluster for a few months which has worked
well. Also for completeness we verified that everything going into branch-2
also exists in branch-3.0.

The specific list of patches to merge is in feature branch
YARN-8200.branch3 (for branch-3.0) and feature branch YARN-8200 (for
branch-2). Full patches containing the YARN-8200.branch3 -> branch-3.0 diff
and YARN-8200 -> branch-2 diff have been posted to YARN-8200 jira.

If there's no issues from the community I'll start a merge vote next week.
Thanks.

Jonathan Hung


[jira] [Resolved] (YARN-9412) Backport YARN-6909 to branch-2

2019-03-27 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung resolved YARN-9412.
-
Resolution: Fixed

This ended up being a clean port. Closing.

> Backport YARN-6909 to branch-2
> --
>
> Key: YARN-9412
> URL: https://issues.apache.org/jira/browse/YARN-9412
> Project: Hadoop YARN
>  Issue Type: Sub-task
>    Reporter: Jonathan Hung
>    Assignee: Jonathan Hung
>Priority: Major
>







[jira] [Created] (YARN-9412) Backport YARN-6909 to branch-2

2019-03-26 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9412:
---

 Summary: Backport YARN-6909 to branch-2
 Key: YARN-9412
 URL: https://issues.apache.org/jira/browse/YARN-9412
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung
Assignee: Jonathan Hung









[jira] [Created] (YARN-9409) Port resource type changes from YARN-7237 to branch-3.0/branch-2

2019-03-25 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9409:
---

 Summary: Port resource type changes from YARN-7237 to 
branch-3.0/branch-2
 Key: YARN-9409
 URL: https://issues.apache.org/jira/browse/YARN-9409
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung
Assignee: Jonathan Hung









[jira] [Created] (YARN-9397) Fix empty NMResourceInfo object test failures in branch-2

2019-03-18 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9397:
---

 Summary: Fix empty NMResourceInfo object test failures in branch-2
 Key: YARN-9397
 URL: https://issues.apache.org/jira/browse/YARN-9397
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung
Assignee: Jonathan Hung
 Attachments: YARN-9397-YARN-8200.001.patch

It appears the empty-object handling behavior changed between Jersey versions 
(branch-2 is on Jersey 1.9, branch-3 on 1.19).






[jira] [Created] (YARN-9291) Backport YARN-7637 to branch-2

2019-02-08 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9291:
---

 Summary: Backport YARN-7637 to branch-2
 Key: YARN-9291
 URL: https://issues.apache.org/jira/browse/YARN-9291
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Jonathan Hung
Assignee: Jonathan Hung









Re: [VOTE] Moving branch-2 precommit/nightly test builds to java 8

2019-02-07 Thread Jonathan Hung
My non-binding +1 to finish. This vote passes with 6 binding +1s, 3
non-binding +1s, and no vetoes. We will make the changes as part
of HADOOP-15711; please follow along there.

Thanks all!

Jonathan Hung


On Tue, Feb 5, 2019 at 11:38 PM Akira Ajisaka  wrote:

> +1
>
> -Akira
>
> On Wed, Feb 6, 2019 at 9:13 AM Wangda Tan  wrote:
> >
> > +1, make sense to me.
> >
> > On Tue, Feb 5, 2019 at 3:29 PM Konstantin Shvachko  >
> > wrote:
> >
> > > +1 Makes sense to me.
> > >
> > > Thanks,
> > > --Konst
> > >
> > > On Mon, Feb 4, 2019 at 6:14 PM Jonathan Hung 
> wrote:
> > >
> > > > Hello,
> > > >
> > > > Starting a vote based on the discuss thread [1] for moving branch-2
> > > > precommit/nightly test builds to openjdk8. After this change, the
> test
> > > > phase for precommit builds [2] and branch-2 nightly build [3] will
> run on
> > > > openjdk8. To maintain source compatibility, these builds will still
> run
> > > > their compile phase for branch-2 on openjdk7 as they do now (in
> addition
> > > to
> > > > compiling on openjdk8).
> > > >
> > > > Vote will run for three business days until Thursday Feb 7 6:00PM
> PDT.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> https://lists.apache.org/thread.html/7e6fb28fc67560f83a2eb62752df35a8d58d86b2a3df4cacb5d738ca@%3Ccommon-dev.hadoop.apache.org%3E
> > > >
> > > > [2]
> > > >
> > >
> https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-HADOOP-Build/
> > > >
> https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-HDFS-Build/
> > > >
> https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/
> > > >
> > > >
> > >
> https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-MAPREDUCE-Build/
> > > >
> > > > [3]
> > > >
> > > >
> > >
> https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86/
> > > >
> > > > Jonathan Hung
> > > >
> > >
>


[jira] [Created] (YARN-9289) Backport YARN-7330 for GPU in UI to branch-2

2019-02-07 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9289:
---

 Summary: Backport YARN-7330 for GPU in UI to branch-2
 Key: YARN-9289
 URL: https://issues.apache.org/jira/browse/YARN-9289
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung
Assignee: Jonathan Hung









[VOTE] Moving branch-2 precommit/nightly test builds to java 8

2019-02-04 Thread Jonathan Hung
Hello,

Starting a vote based on the discuss thread [1] for moving branch-2
precommit/nightly test builds to openjdk8. After this change, the test
phase for precommit builds [2] and branch-2 nightly build [3] will run on
openjdk8. To maintain source compatibility, these builds will still run
their compile phase for branch-2 on openjdk7 as they do now (in addition to
compiling on openjdk8).

Vote will run for three business days until Thursday Feb 7 6:00PM PDT.

[1]
https://lists.apache.org/thread.html/7e6fb28fc67560f83a2eb62752df35a8d58d86b2a3df4cacb5d738ca@%3Ccommon-dev.hadoop.apache.org%3E

[2]
https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-HADOOP-Build/
https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-HDFS-Build/
https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/
https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-MAPREDUCE-Build/

[3]
https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86/

Jonathan Hung


[jira] [Created] (YARN-9280) Backport YARN-6620 to YARN-8200/branch-2

2019-02-04 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9280:
---

 Summary: Backport YARN-6620 to YARN-8200/branch-2
 Key: YARN-9280
 URL: https://issues.apache.org/jira/browse/YARN-9280
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung









Re: [DISCUSS] Moving branch-2 to java 8

2019-02-04 Thread Jonathan Hung
Hi Anu, we will configure precommit jobs to continue compiling on openjdk7.
If there are incompatible source changes then the precommit job will catch
them. The change proposed here is only for the *test* phase of branch-2
precommit executions (and the branch-2 nightly job) to run on openjdk8 only.

Jonathan Hung


On Mon, Feb 4, 2019 at 10:45 AM Anu Engineer 
wrote:

> Konstantin,
>
> Just a nitpicky thought, if we move this branch to Java-8 on Jenkins, but
> still hope to release code that can run on Java 7, how will we detect
> Java 8 only changes? I am asking because till now whenever I checked in
> Java 8 features in branch-2 Jenkins would catch that issue.
>
> With this approach, we might not find out the issues till the release
> time when the release manager decides to compile with Java 7.
> It might be more pragmatic to say that your Java 7 mileage may vary once
> this goes in, since we will have no visibility to Java 7 compatibility
> until it is too late.
>
> Another approach could be that we create a read-only 2.x branch, then we
> know that code will work with Java 7 since the last snapshot was known to
> work with Java 7.
>
>
> Thanks
> Anu
>
>
>
> On 2/1/19, 5:04 PM, "Konstantin Shvachko"  wrote:
>
> Just to make sure we are on the same page, as the subject of this
> thread is
> too generic and confusing.
> *The proposal is to move branch-2 Jenkins builds such as precommit to
> run
> tests on openJDK-8.*
> We do not want to break Java 7 source compatibility. The sources and
> releases will still depend on Java 7.
> We don't see test failures discussed in HADOOP-15711 when we run them
> locally with Oracle Java 7.
>
> Thanks,
> --Konst
>
> On Fri, Feb 1, 2019 at 12:44 PM Jonathan Hung 
> wrote:
>
> > Thanks Vinod and Steve, agreed about java7 compile compatibility. At
> least
> > for now, we should be able to maintain java7 source compatibility
> and run
> > tests on java8. There's a test run here:
> >
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86-jhung/46/
> > which calls a java8 specific API, installs both openjdk7/openjdk8 in
> the
> > dockerfile, compiles on both versions, and tests on just java8 (via
> >
> >
> --multijdkdirs=/usr/lib/jvm/java-7-openjdk-amd64,/usr/lib/jvm/java-8-openjdk-amd64
> > and --multijdktests=compile). If we eventually decide it's too much
> of a
> > pain to maintain java7 source compatibility we can do that at a later
> > point.
> >
> > Also based on discussion with others in the community at the
> contributors
> > meetup this past Wednesday, seems we are generally in favor of
> testing
> > against java8. I'll start a vote soon.
> >
> > Jonathan Hung
> >
> >
> > On Tue, Jan 29, 2019 at 4:11 AM Steve Loughran <
> ste...@hortonworks.com>
> > wrote:
> >
> > > branch-2 is the JDK 7 branch, but for a long time I (and presumably
> > > others) have relied on jenkins to keep us honest by doing that
> build and
> > > test
> > >
> > > right now, we can't do that any more, due to jdk7 bugs which will
> never
> > be
> > > fixed by oracle, or at least, not in a public release.
> > >
> > > If we can still do the compile in java 7 language and link to java
> 7 JDK,
> > > then that bit of the release is good -then java 8 can be used for
> that
> > test
> > >
> > > Ultimately, we're going to be forced onto java 8 just because all
> our
> > > dependencies have moved onto it, and some CVE will force us to
> move.
> > >
> > > At which point, I think its time to declare branch-2 dead. It's
> had a
> > > great life, but trying to keep java 7 support alive isn't
> sustainable.
> > Not
> > > just in this testing, but
> > > cherrypicking patches back gets more and more difficult -branch-3
> has
> > > moved on in both use of java 8 language, and in the codebase in
> general.
> > >
> > > > On 28 Jan 2019, at 20:18, Vinod Kumar Vavilapalli <
> vino...@apache.org>
> > > wrote:
> > > >
> > > > The community made a decision long time ago that we'd like to
> keep the
> > > compatibility & so tie branch-2 to Java 7, but do Java 8+ only
> work on
> > 3.x.
> > > >
>  

Re: [DISCUSS] Moving branch-2 to java 8

2019-02-04 Thread Jonathan Hung
Yeah, it's possible with yetus; there's one example here
<https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86-jhung/60/console>
which runs compilation on openjdk7 (and openjdk8), and runs tests on openjdk8
only.

Jonathan Hung


On Mon, Feb 4, 2019 at 10:11 AM Steve Loughran 
wrote:

>
>
> On 2 Feb 2019, at 00:57, Konstantin Shvachko  wrote:
>
> Just to make sure we are on the same page, as the subject of this thread
> is too generic and confusing.
> *The proposal is to move branch-2 Jenkins builds such as precommit to run
> tests on openJDK-8.*
> We do not want to break Java 7 source compatibility. The sources and
> releases will still depend on Java 7.
> We don't see test failures discussed in HADOOP-15711 when we run them
> locally with Oracle Java 7.
>
> Thanks,
> --Konst
>
>
> Given the tests aren't working today, the risk that an openjdk 8 test run
> hides a problem which would show up on openjdk 7 has to consider that at
> least openjdk8 will run the tests.
>
> One thing I would like to be confident is that at least the compile phase
> of all the source (including generated source) is on jdk7, and its only the
> test run which switches JVM. Can we do that?
>


Re: [VOTE] Propose to start new Hadoop sub project "submarine"

2019-02-01 Thread Jonathan Hung
+1. Thanks Wangda.

Jonathan Hung


On Fri, Feb 1, 2019 at 2:25 PM Dinesh Chitlangia <
dchitlan...@hortonworks.com> wrote:

> +1 (non binding), thanks Wangda for organizing this.
>
> Regards,
> Dinesh
>
>
>
> On 2/1/19, 5:24 PM, "Wangda Tan"  wrote:
>
> Hi all,
>
> According to positive feedbacks from the thread [1]
>
> This is vote thread to start a new subproject named "hadoop-submarine"
> which follows the release process already established for ozone.
>
> The vote runs for usual 7 days, which ends at Feb 8th 5 PM PDT.
>
> Thanks,
> Wangda Tan
>
> [1]
>
> https://lists.apache.org/thread.html/f864461eb188bd12859d51b0098ec38942c4429aae7e4d001a633d96@%3Cyarn-dev.hadoop.apache.org%3E
>
>
>


[jira] [Created] (YARN-9272) Backport YARN-7738 for refreshing max allocation for multiple resource types

2019-02-01 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9272:
---

 Summary: Backport YARN-7738 for refreshing max allocation for 
multiple resource types
 Key: YARN-9272
 URL: https://issues.apache.org/jira/browse/YARN-9272
 Project: Hadoop YARN
  Issue Type: Sub-task
 Environment: Backport to YARN-8200 feature branch (for branch-2).
Reporter: Jonathan Hung
Assignee: Jonathan Hung









[jira] [Created] (YARN-9271) Backport YARN-6927 for resource type support in MapReduce

2019-02-01 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9271:
---

 Summary: Backport YARN-6927 for resource type support in MapReduce
 Key: YARN-9271
 URL: https://issues.apache.org/jira/browse/YARN-9271
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung









Re: [DISCUSS] Moving branch-2 to java 8

2019-02-01 Thread Jonathan Hung
Thanks Vinod and Steve, agreed about java7 compile compatibility. At least
for now, we should be able to maintain java7 source compatibility and run
tests on java8. There's a test run here:
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86-jhung/46/
which calls a java8 specific API, installs both openjdk7/openjdk8 in the
dockerfile, compiles on both versions, and tests on just java8 (via
--multijdkdirs=/usr/lib/jvm/java-7-openjdk-amd64,/usr/lib/jvm/java-8-openjdk-amd64
and --multijdktests=compile). If we eventually decide it's too much of a
pain to maintain java7 source compatibility we can do that at a later point.

Also based on discussion with others in the community at the contributors
meetup this past Wednesday, seems we are generally in favor of testing
against java8. I'll start a vote soon.

Jonathan Hung


On Tue, Jan 29, 2019 at 4:11 AM Steve Loughran 
wrote:

> branch-2 is the JDK 7 branch, but for a long time I (and presumably
> others) have relied on jenkins to keep us honest by doing that build and
> test
>
> right now, we can't do that any more, due to jdk7 bugs which will never be
> fixed by oracle, or at least, not in a public release.
>
> If we can still do the compile in java 7 language and link to java 7 JDK,
> then that bit of the release is good -then java 8 can be used for that test
>
> Ultimately, we're going to be forced onto java 8 just because all our
> dependencies have moved onto it, and some CVE will force us to move.
>
> At which point, I think its time to declare branch-2 dead. It's had a
> great life, but trying to keep java 7 support alive isn't sustainable. Not
> just in this testing, but
> cherrypicking patches back gets more and more difficult -branch-3 has
> moved on in both use of java 8 language, and in the codebase in general.
>
> > On 28 Jan 2019, at 20:18, Vinod Kumar Vavilapalli 
> wrote:
> >
> > The community made a decision long time ago that we'd like to keep the
> compatibility & so tie branch-2 to Java 7, but do Java 8+ only work on 3.x.
> >
> > I always assumed that most (all?) downstream users build branch-2 on JDK
> 7 only, can anyone confirm? If so, there may be an easier way to address
> these test issues.
> >
> > +Vinod
> >
> >> On Jan 28, 2019, at 11:24 AM, Jonathan Hung 
> wrote:
> >>
> >> Hi folks,
> >>
> >> Forking a discussion based on HADOOP-15711. To summarize, there are
> issues
> >> with branch-2 tests running on java 7 (openjdk) which don't exist on
> java
> >> 8. From our testing, the build can pass with openjdk 8.
> >>
> >> For branch-3, the work to move the build to use java 8 was done in
> >> HADOOP-14816 as part of the Dockerfile OS version change. HADOOP-16053
> was
> >> filed to backport this OS version change to branch-2 (but without the
> java
> >> 7 -> java 8 change). So my proposal is to also make the java 7 -> java 8
> >> version change in branch-2.
> >>
> >> As mentioned in HADOOP-15711, the main issue is around source and binary
> >> compatibility. I don't currently have a great answer, but one initial
> >> thought is to build source/binary against java 7 to ensure compatibility
> >> and run the rest of the build as java 8.
> >>
> >> Thoughts?
> >>
> >> Jonathan Hung
> >
> >
> > -
> > To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> >
>
>


[jira] [Resolved] (YARN-9261) Backport YARN-7270 addendum to YARN-8200

2019-01-31 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung resolved YARN-9261.
-
Resolution: Fixed

Clean backport. Pushed to YARN-8200

> Backport YARN-7270 addendum to YARN-8200
> 
>
> Key: YARN-9261
> URL: https://issues.apache.org/jira/browse/YARN-9261
> Project: Hadoop YARN
>  Issue Type: Sub-task
>    Reporter: Jonathan Hung
>    Assignee: Jonathan Hung
>Priority: Major
>
> There was an addendum to YARN-7270 added to branch-3.0 for changes after 
> resource-type feature was added. We need it in YARN-8200 feature branch too.






[jira] [Created] (YARN-9261) Backport YARN-7270 addendum to YARN-8200

2019-01-31 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9261:
---

 Summary: Backport YARN-7270 addendum to YARN-8200
 Key: YARN-9261
 URL: https://issues.apache.org/jira/browse/YARN-9261
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung
Assignee: Jonathan Hung


There was an addendum to YARN-7270 added to branch-3.0 for changes after 
resource-type feature was added. We need it in YARN-8200 feature branch too.






Re: [DISCUSS] Making submarine to different release model like Ozone

2019-01-31 Thread Jonathan Hung
+1. This is important for improving the deep learning on hadoop story.
There's recently a lot of momentum for this, and decoupling
submarine/hadoop will help it continue.

Jonathan Hung


On Thu, Jan 31, 2019 at 11:04 AM Wangda Tan  wrote:

> Hi devs,
>
> Since we started the submarine-related effort last year, we received a lot of
> feedback; several companies (such as Netease, China Mobile, etc.) are
> trying to deploy Submarine to their Hadoop clusters along with big data
> workloads. Linkedin is also very interested in contributing a Submarine TonY (
> https://github.com/linkedin/TonY) runtime to allow users to use the same
> interface.
>
> From what I can see, there are several issues with putting Submarine under the
> yarn-applications directory and having the same release cycle as Hadoop:
>
> 1) We started the 3.2.0 release in Sep 2018, but the release was only done in
> Jan 2019. Because of unpredictable blockers and security issues, it got
> delayed a lot. We need to iterate on Submarine fast at this point.
>
> 2) We also see a lot of requirements to use Submarine on older Hadoop
> releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> short time, but the requirement to run deep learning is urgent to them. We
> should decouple Submarine from Hadoop version.
>
> And why do we want to keep it within Hadoop? First, Submarine includes some
> innovative parts, such as enhancements to the user experience for YARN
> services/containerization support, which we can add back to Hadoop later
> to address common requirements. In addition to that, we have a big overlap
> in the community developing and using it.
>
> There are several proposals we went through during the Ozone merge-to-trunk
> discussion:
>
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3ccahfhakh6_m3yldf5a2kq8+w-5fbvx5ahfgs-x1vajw8gmnz...@mail.gmail.com%3E
>
> I propose to adopt Ozone model: which is the same master branch, different
> release cycle, and different release branch. It is a great example to show
> agile release we can do (2 Ozone releases after Oct 2018) with less
> overhead to setup CI, projects, etc.
>
> *Links:*
> - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> - Design doc
> <
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> >
> - User doc
> <
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> >
> (3.2.0
> release)
> - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> <
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> >,
> (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> - Talks: Strata Data Conf NY
> <
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> >
>
> Thoughts?
>
> Thanks,
> Wangda Tan
>


[DISCUSS] Moving branch-2 to java 8

2019-01-28 Thread Jonathan Hung
Hi folks,

Forking a discussion based on HADOOP-15711. To summarize, there are issues
with branch-2 tests running on java 7 (openjdk) which don't exist on java
8. From our testing, the build can pass with openjdk 8.

For branch-3, the work to move the build to use java 8 was done in
HADOOP-14816 as part of the Dockerfile OS version change. HADOOP-16053 was
filed to backport this OS version change to branch-2 (but without the java
7 -> java 8 change). So my proposal is to also make the java 7 -> java 8
version change in branch-2.

As mentioned in HADOOP-15711, the main issue is around source and binary
compatibility. I don't currently have a great answer, but one initial
thought is to build source/binary against java 7 to ensure compatibility
and run the rest of the build as java 8.

Thoughts?

Jonathan Hung


[jira] [Resolved] (YARN-9222) Change startTime semantics for RMApp

2019-01-23 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung resolved YARN-9222.
-
Resolution: Fixed

darn it, seems this is a dupe of YARN-7088

> Change startTime semantics for RMApp
> 
>
> Key: YARN-9222
> URL: https://issues.apache.org/jira/browse/YARN-9222
> Project: Hadoop YARN
>  Issue Type: Bug
>    Reporter: Jonathan Hung
>Priority: Major
>
> Currently submitTime for rmApp is based on when app is submitted to 
> RMAppManager:
> {noformat}
> rmAppManager.submitApplication(submissionContext,
> System.currentTimeMillis(), user);{noformat}
> Then RMAppManager#createAndPopulateNewRMApp does some validation (queue 
> routing, app priority, etc), then the RMAppImpl object is created, at which 
> point the startTime is populated:
> {noformat}
> if (startTime <= 0) {
>   this.startTime = this.systemClock.getTime();
> } else {
>   this.startTime = startTime;
> }{noformat}
> In general it seems there shouldn't be much difference between submitTime and 
> startTime. It makes more sense to change startTime to when the app actually 
> started. One possible solution is to (re)populate startTime when application 
> master registers with RM.
> One issue may be compatibility, especially if there are large scheduling 
> delays.






[jira] [Created] (YARN-9222) Change startTime semantics for RMApp

2019-01-22 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9222:
---

 Summary: Change startTime semantics for RMApp
 Key: YARN-9222
 URL: https://issues.apache.org/jira/browse/YARN-9222
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Hung


Currently submitTime for rmApp is based on when the app is submitted to 
RMAppManager:
{noformat}
rmAppManager.submitApplication(submissionContext,
System.currentTimeMillis(), user);{noformat}
Then RMAppManager#createAndPopulateNewRMApp does some validation (queue 
routing, app priority, etc), then the RMAppImpl object is created, at which 
point the startTime is populated:
{noformat}
if (startTime <= 0) {
  this.startTime = this.systemClock.getTime();
} else {
  this.startTime = startTime;
}{noformat}
In general it seems there shouldn't be much difference between submitTime and 
startTime. It makes more sense to change startTime to when the app actually 
started. One possible solution is to (re)populate startTime when the application 
master registers with the RM.

One issue may be compatibility, especially if there are large scheduling delays.
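A minimal sketch of the proposed semantics; the field handling is simplified and this is not RMAppImpl itself:

{code}
/**
 * Hypothetical sketch: keep submitTime at submission and set startTime when
 * the AM actually registers, instead of when the RMApp object is created.
 */
public class AppTimes {

  private final long submitTime;
  private volatile long startTime = -1;

  public AppTimes(long submitTime) {
    this.submitTime = submitTime;
  }

  /** Called when the application master registers with the RM. */
  public void onAmRegistered(long now) {
    if (startTime <= 0) {
      startTime = now; // the app has really started doing work
    }
  }

  public long getSubmitTime() { return submitTime; }
  public long getStartTime()  { return startTime; }

  public static void main(String[] args) {
    AppTimes times = new AppTimes(System.currentTimeMillis());
    System.out.println(times.getStartTime()); // -1 until the AM registers
    times.onAmRegistered(System.currentTimeMillis());
    System.out.println(times.getStartTime()); // actual start
  }
}
{code}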






[jira] [Created] (YARN-9188) Port YARN-7136 to branch-2

2019-01-09 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9188:
---

 Summary: Port YARN-7136 to branch-2
 Key: YARN-9188
 URL: https://issues.apache.org/jira/browse/YARN-9188
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung
Assignee: Jonathan Hung









[jira] [Created] (YARN-9187) Backport YARN-6852 for GPU-specific native changes to branch-2

2019-01-08 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9187:
---

 Summary: Backport YARN-6852 for GPU-specific native changes to 
branch-2
 Key: YARN-9187
 URL: https://issues.apache.org/jira/browse/YARN-9187
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung
Assignee: Jonathan Hung
 Attachments: YARN-9187-YARN-8200.001.patch

YARN-6852 adds native GPU support, including
 # general native code refactoring
 # GPU-specific native code

Item 1 is handled by YARN-7321 in branch-2. This ticket handles item 2 in 
branch-2.






[jira] [Created] (YARN-9182) Backport YARN-6445 resource profile performance improvements to branch-2

2019-01-07 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9182:
---

 Summary: Backport YARN-6445 resource profile performance 
improvements to branch-2
 Key: YARN-9182
 URL: https://issues.apache.org/jira/browse/YARN-9182
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung
 Attachments: YARN-9182-YARN-8200.001.patch








[jira] [Created] (YARN-9181) Backport YARN-6232 for generic resource type usage to branch-2

2019-01-07 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9181:
---

 Summary: Backport YARN-6232 for generic resource type usage to 
branch-2
 Key: YARN-9181
 URL: https://issues.apache.org/jira/browse/YARN-9181
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung









[jira] [Created] (YARN-9180) Port YARN-7033 NM recovery of assigned resources to branch-3.0/branch-2

2019-01-07 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-9180:
---

 Summary: Port YARN-7033 NM recovery of assigned resources to 
branch-3.0/branch-2
 Key: YARN-9180
 URL: https://issues.apache.org/jira/browse/YARN-9180
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung
Assignee: Jonathan Hung








