
Jonathan Hung edited comment on YARN-8200 at 7/9/18 10:55 PM:

Perf unit test:

2 resources:
20000 14652.015
40000 20876.826
60000 29455.08
80000 45045.047
100000 37735.848
120000 40816.33
140000 46403.71
160000 47169.812
180000 49261.082
200000 48543.688
220000 49140.05
240000 47393.363
260000 48899.754
280000 48899.754
300000 49751.242
320000 50125.312
340000 46296.297
360000 48780.49
380000 47961.63
400000 47732.695
420000 47732.695
440000 48076.92
460000 49019.61
480000 46728.973
500000 42643.92
520000 46296.297
540000 48426.15
560000 49504.95
580000 47846.89
600000 48543.688
620000 47393.363
640000 48899.754
660000 48661.8
680000 49140.05
700000 49019.61
720000 48780.49
740000 48899.754
760000 49382.715
780000 47393.363
800000 48076.92
820000 48192.77
840000 47732.695
860000 50125.312
880000 48899.754
900000 49019.61
920000 48076.92
940000 48192.77
960000 48076.92
980000 42553.19
1000000 47846.89
1020000 47846.89
1040000 48780.49
1060000 47961.63
1080000 49140.05
1100000 47169.812
1120000 47846.89
1140000 47619.047
1160000 47619.047
1180000 49875.312
1200000 47619.047
1220000 47393.363
1240000 47505.938
1260000 48899.754
1280000 48780.49
1300000 46189.375
1320000 47505.938
1340000 45871.56
1360000 47619.047
1380000 48543.688
1400000 47619.047
1420000 48076.92
1440000 48076.92
1460000 47732.695
1480000 47281.324
1500000 48543.688
1520000 48661.8
1540000 47393.363
1560000 48543.688
1580000 47961.63
1600000 46296.297
1620000 47846.89
1640000 47846.89
1660000 48543.688
1680000 47505.938
1700000 47281.324
1720000 48309.18
1740000 48309.18
1760000 50000.0
1780000 47505.938
1800000 48192.77
1820000 48192.77
1840000 48309.18
1860000 48543.688
1880000 48661.8
1900000 48192.77
1920000 47846.89
1940000 42105.26
1960000 48899.754
1980000 47961.63
#ResourceTypes = 2. Avg of fastest 20: 49382.715
2018-06-26 17:12:59,756 ERROR [Thread[Thread-11,5,main]] 
(AbstractDelegationTokenSecretManager.java:run(696)) - ExpiredTokenRemover 
received java.lang.InterruptedException: sleep interrupted
3 resources:
20000 10964.912
40000 15760.441
60000 26990.553
80000 24752.475
100000 32733.225
120000 30487.805
140000 27397.26
160000 35778.176
180000 33112.582
200000 34843.207
220000 31347.963
240000 37383.176
260000 34482.758
280000 39062.5
300000 38095.24
320000 35842.293
340000 32154.342
360000 39447.73
380000 37878.79
400000 38240.918
420000 36101.082
440000 38167.938
460000 38834.953
480000 38022.812
500000 38610.04
520000 37105.75
540000 38610.04
560000 39215.688
580000 38022.812
600000 39215.688
620000 37950.664
640000 39138.94
660000 37735.848
680000 38684.72
700000 38986.355
720000 37735.848
740000 37243.95
760000 38535.645
780000 37807.184
800000 38314.176
820000 36900.367
840000 38610.04
860000 39370.08
880000 38314.176
900000 39525.69
920000 38461.54
940000 39761.43
960000 39370.08
980000 38910.504
1000000 38022.812
1020000 39138.94
1040000 38314.176
1060000 39292.73
1080000 39292.73
1100000 39370.08
1120000 39292.73
1140000 38314.176
1160000 39840.637
1180000 39062.5
1200000 39370.08
1220000 37950.664
1240000 39062.5
1260000 37664.785
1280000 38684.72
1300000 38986.355
1320000 39525.69
1340000 40322.582
1360000 39292.73
1380000 37664.785
1400000 39525.69
1420000 39138.94
1440000 39370.08
1460000 39840.637
1480000 37037.035
1500000 38387.715
1520000 39525.69
1540000 37523.453
1560000 39603.96
1580000 36764.707
1600000 32362.459
1620000 29542.098
1640000 31250.0
1660000 29112.082
1680000 32000.0
1700000 27662.518
1720000 27100.271
1740000 26845.637
1760000 33388.98
1780000 35714.285
1800000 31152.648
1820000 36832.414
1840000 35650.625
1860000 38461.54
1880000 34662.047
1900000 31104.2
1920000 32573.29
1940000 36900.367
1960000 26702.27
1980000 30211.48
#ResourceTypes = 3. Avg of fastest 20: 39525.69
2018-06-26 17:16:14,530 ERROR [Thread[Thread-11,5,main]] 
(AbstractDelegationTokenSecretManager.java:run(696)) - ExpiredTokenRemover 
received java.lang.InterruptedException: sleep interrupted
4 resources:
20000 13166.557
40000 21299.254
60000 33222.59
80000 37174.723
100000 32786.887
120000 33955.855
140000 38095.24
160000 37243.95
180000 38167.938
200000 37807.184
220000 36900.367
240000 39370.08
260000 36563.07
280000 38240.918
300000 38759.69
320000 39370.08
340000 35523.98
360000 39370.08
380000 38610.04
400000 38759.69
420000 39603.96
440000 37878.79
460000 38910.504
480000 38684.72
500000 39682.54
520000 38461.54
540000 38535.645
560000 37105.75
580000 38910.504
600000 38095.24
620000 38684.72
640000 38910.504
660000 39138.94
680000 39292.73
700000 38095.24
720000 39215.688
740000 39447.73
760000 39447.73
780000 40000.0
800000 38759.69
820000 38910.504
840000 39603.96
860000 38834.953
880000 38610.04
900000 38167.938
920000 39138.94
940000 38684.72
960000 39447.73
980000 38240.918
1000000 40000.0
1020000 38314.176
1040000 38834.953
1060000 38022.812
1080000 36697.246
1100000 36968.58
1120000 38834.953
1140000 37383.176
1160000 36231.883
1180000 35778.176
1200000 36900.367
1220000 40000.0
1240000 40080.16
1260000 39215.688
1280000 38910.504
1300000 39682.54
1320000 38387.715
1340000 38387.715
1360000 38834.953
1380000 39447.73
1400000 39215.688
1420000 38910.504
1440000 39138.94
1460000 39761.43
1480000 38314.176
1500000 36297.64
1520000 38986.355
1540000 38314.176
1560000 39138.94
1580000 38834.953
1600000 38167.938
1620000 39603.96
1640000 39062.5
1660000 39761.43
1680000 38759.69
1700000 38910.504
1720000 40000.0
1740000 38759.69
1760000 38759.69
1780000 39138.94
1800000 37950.664
1820000 39447.73
1840000 38759.69
1860000 38986.355
1880000 40567.953
1900000 39682.54
1920000 38610.04
1940000 39215.688
1960000 39447.73
1980000 38759.69
#ResourceTypes = 4. Avg of fastest 20: 39761.43
2018-06-26 17:19:41,570 ERROR [Thread[Thread-11,5,main]] 
(AbstractDelegationTokenSecretManager.java:run(696)) - ExpiredTokenRemover 
received java.lang.InterruptedException: sleep interrupted
5 resources:
20000 10531.858
40000 24390.244
60000 28409.092
80000 36166.367
100000 32414.91
120000 26809.652
140000 37453.184
160000 36697.246
180000 37664.785
200000 32206.12
220000 37878.79
240000 37037.035
260000 37664.785
280000 37243.95
300000 37807.184
320000 35273.367
340000 28011.205
360000 38095.24
380000 37313.434
400000 37807.184
420000 37593.984
440000 37383.176
460000 37664.785
480000 37453.184
500000 37313.434
520000 37735.848
540000 37735.848
560000 37037.035
580000 36900.367
600000 37105.75
620000 34843.207
640000 36297.64
660000 37950.664
680000 37664.785
700000 37453.184
720000 37453.184
740000 35714.285
760000 36563.07
780000 35906.645
800000 36429.87
820000 36036.035
840000 36166.367
860000 37243.95
880000 35971.223
900000 35906.645
920000 36036.035
940000 35587.188
960000 37664.785
980000 37664.785
1000000 37593.984
1020000 37243.95
1040000 38387.715
1060000 37174.723
1080000 38240.918
1100000 37105.75
1120000 36101.082
1140000 36832.414
1160000 36231.883
1180000 37105.75
1200000 38022.812
1220000 36166.367
1240000 36697.246
1260000 37383.176
1280000 36832.414
1300000 38387.715
1320000 37313.434
1340000 37243.95
1360000 37950.664
1380000 36036.035
1400000 38095.24
1420000 37383.176
1440000 37664.785
1460000 37383.176
1480000 37105.75
1500000 37664.785
1520000 37313.434
1540000 37313.434
1560000 37807.184
1580000 36563.07
1600000 36764.707
1620000 38095.24
1640000 37593.984
1660000 36968.58
1680000 36363.637
1700000 34602.074
1720000 34782.61
1740000 33898.305
1760000 35149.387
1780000 34904.016
1800000 36697.246
1820000 36630.035
1840000 38461.54
1860000 37037.035
1880000 37593.984
1900000 38022.812
1920000 37453.184
1940000 37383.176
1960000 37950.664
1980000 36832.414
#ResourceTypes = 5. Avg of fastest 20: 38022.812
2018-06-26 17:22:05,855 ERROR [Thread[Thread-11,5,main]] 
(AbstractDelegationTokenSecretManager.java:run(696)) - ExpiredTokenRemover 
received java.lang.InterruptedException: sleep interrupted

From 2 to 3 resources, throughput drops from 49.4k to 39.5k (20% drop, similar to the one 
Will work on the SLS tests.
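
For reference, the percentage quoted above can be re-derived from the four "#ResourceTypes = N. Avg of fastest 20" summary lines. This is a minimal Python sketch, not part of the perf test itself: the helper names (parse_summaries, relative_drop) are made up for illustration, and the summary figures are assumed to be the throughput values the comment refers to.

```python
import re

# Summary lines copied verbatim from the perf output above.
LOG = """\
#ResourceTypes = 2. Avg of fastest 20: 49382.715
#ResourceTypes = 3. Avg of fastest 20: 39525.69
#ResourceTypes = 4. Avg of fastest 20: 39761.43
#ResourceTypes = 5. Avg of fastest 20: 38022.812
"""

# Matches one summary line: resource-type count, then the reported average.
SUMMARY_RE = re.compile(r"#ResourceTypes = (\d+)\. Avg of fastest 20: ([\d.]+)")


def parse_summaries(text):
    """Return {number of resource types: reported average} (hypothetical helper)."""
    return {int(n): float(v) for n, v in SUMMARY_RE.findall(text)}


def relative_drop(baseline, value):
    """Percentage by which `value` falls below `baseline` (hypothetical helper)."""
    return (baseline - value) / baseline * 100.0


summaries = parse_summaries(LOG)
baseline = summaries[2]  # the 2-resource run is used as the reference point
for n in sorted(summaries):
    if n == 2:
        continue
    drop = relative_drop(baseline, summaries[n])
    print(f"2 -> {n} resources: {drop:.1f}% drop")
```

Comparing every run against the 2-resource baseline, rather than against the previous run, keeps the numbers directly comparable with the 2-to-3 figure given in the comment.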

> Backport resource types/GPU features to branch-2
> ------------------------------------------------
>                 Key: YARN-8200
>                 URL: https://issues.apache.org/jira/browse/YARN-8200
>             Project: Hadoop YARN
>          Issue Type: Task
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>            Priority: Major
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To avoid supporting too many 
> very different Hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU-specific support.
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans Docker support).
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.

