[jira] [Created] (YARN-6361) FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues

2017-03-16 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6361:


 Summary: FSLeafQueue.fetchAppsWithDemand CPU usage is high with 
big queues
 Key: YARN-6361
 URL: https://issues.apache.org/jira/browse/YARN-6361
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor


FSLeafQueue.fetchAppsWithDemand sorts the applications using the current 
policy's comparator. Most of the time is spent in FairShareComparator.compare. 
We could improve this by doing the per-application calculations once outside 
the sort (O(n)) and sorting on the precomputed values inside the O(n*log(n)) 
comparison loop instead.
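
A minimal sketch of the idea, with hypothetical names (the real code would live 
in FSLeafQueue/FairSharePolicy and operate on FSAppAttempt): compute the 
expensive per-app quantity once, then sort on the cached value instead of 
recomputing it in every compare() call.
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not the actual FSLeafQueue code: compute the expensive
// fair-share quantity once per application (O(n)), then sort on the cached
// key so the O(n*log(n)) comparison loop only compares plain doubles.
public class PrecomputedSortSketch {

  static final class KeyedApp {
    final String appId;     // stands in for FSAppAttempt
    final double sortKey;   // precomputed once
    KeyedApp(String appId, double sortKey) {
      this.appId = appId;
      this.sortKey = sortKey;
    }
  }

  // Placeholder for the expensive per-app calculation currently done inside
  // FairShareComparator.compare().
  static double computeFairShareKey(String appId) {
    return appId.hashCode();  // illustrative only
  }

  public static List<KeyedApp> sortApps(List<String> apps) {
    List<KeyedApp> keyed = new ArrayList<>(apps.size());
    for (String app : apps) {
      keyed.add(new KeyedApp(app, computeFairShareKey(app)));
    }
    keyed.sort(Comparator.comparingDouble(k -> k.sortKey));
    return keyed;
  }
}
{code}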



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6360) Prevent FS state dump logger from inheriting parents' appenders.

2017-03-16 Thread Yufei Gu (JIRA)
Yufei Gu created YARN-6360:
--

 Summary: Prevent FS state dump logger from inheriting parents' 
appenders.
 Key: YARN-6360
 URL: https://issues.apache.org/jira/browse/YARN-6360
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.0.0-alpha2, 2.9.0
Reporter: Yufei Gu
Assignee: Yufei Gu


FS could dump states to multiple files if its logger inherits its parents' 
appenders. We should prevent that, since otherwise the state dump logger may 
clutter other log files.
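
A minimal sketch of one way to do this using the log4j 1.x API; the logger and 
file names below are illustrative placeholders, not necessarily what 
FairScheduler will use. The key call is setAdditivity(false).
{code}
import java.io.IOException;

import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

// Hypothetical logger/file names; setAdditivity(false) stops state-dump events
// from also flowing to the parent (e.g. root) appenders.
public class StateDumpLoggerSketch {
  public static Logger createStateDumpLogger() throws IOException {
    Logger stateDumpLog = Logger.getLogger("FairSchedulerStateDump");
    stateDumpLog.setAdditivity(false);
    stateDumpLog.addAppender(new FileAppender(
        new PatternLayout("%d{ISO8601} %m%n"),
        "fair-scheduler-statedump.log", true));
    return stateDumpLog;
  }
}
{code}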



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6359) TestRM#testApplicationKillAtAcceptedState fails rarely due to race condition

2017-03-16 Thread Robert Kanter (JIRA)
Robert Kanter created YARN-6359:
---

 Summary: TestRM#testApplicationKillAtAcceptedState fails rarely 
due to race condition
 Key: YARN-6359
 URL: https://issues.apache.org/jira/browse/YARN-6359
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.9.0, 3.0.0-alpha3
Reporter: Robert Kanter
Assignee: Robert Kanter


We've seen (very rarely) a test failure in 
{{TestRM#testApplicationKillAtAcceptedState}}

{noformat}
java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRM.testApplicationKillAtAcceptedState(TestRM.java:645)
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6358) Cache the resolved hosts prevent calls to InetAddress.getByName and normalizeHost

2017-03-16 Thread Jose Miguel Arreola (JIRA)
Jose Miguel Arreola created YARN-6358:
-

 Summary: Cache the resolved hosts prevent calls to 
InetAddress.getByName and normalizeHost
 Key: YARN-6358
 URL: https://issues.apache.org/jira/browse/YARN-6358
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, security
Reporter: Jose Miguel Arreola


When running performance tests, we noticed that a lot of time is taken in 
resolving host addresses.
In our specific scenario, we saw the function 
org.apache.hadoop.security.SecurityUtil.getInetAddressByName taking a lot of 
time to resolve hosts, and the same function is called many times.
I saw that org.apache.hadoop.yarn.server.resourcemanager.NodesListManager 
already has a cached resolver for the same reason.
So the proposal is to make this cache generic: use it to save time in the 
functions we already know about, and make it available so the cache can be 
used anywhere else.
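
A minimal sketch of the kind of generic cache being proposed, assuming a simple 
wrapper around InetAddress.getByName; the class and method names are 
hypothetical, and a real version would also need entry expiry/invalidation so 
DNS changes are eventually picked up.
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical generic resolver cache.
public class CachedHostResolver {
  private final ConcurrentMap<String, InetAddress> cache = new ConcurrentHashMap<>();

  public InetAddress resolve(String host) throws UnknownHostException {
    InetAddress cached = cache.get(host);
    if (cached != null) {
      return cached;
    }
    InetAddress resolved = InetAddress.getByName(host);  // expensive call, done once per host
    cache.putIfAbsent(host, resolved);
    return resolved;
  }

  public void invalidate(String host) {
    cache.remove(host);
  }
}
{code}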



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6357) Implement TimelineCollector#putEntitiesAsync

2017-03-16 Thread Joep Rottinghuis (JIRA)
Joep Rottinghuis created YARN-6357:
--

 Summary: Implement TimelineCollector#putEntitiesAsync
 Key: YARN-6357
 URL: https://issues.apache.org/jira/browse/YARN-6357
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: ATSv2, timelineserver
Affects Versions: YARN-2928
Reporter: Joep Rottinghuis
Assignee: Haibo Chen


As discovered and discussed in YARN-5269, the TimelineCollector#putEntitiesAsync 
method is currently not implemented and TimelineCollector#putEntities is 
effectively asynchronous.

TimelineV2ClientImpl#putEntities and TimelineV2ClientImpl#putEntitiesAsync 
correctly call TimelineEntityDispatcher#dispatchEntities(boolean sync, ...) with 
the correct argument. The argument does seem to make it into the params, and on 
the server side TimelineCollectorWebService#putEntities correctly pulls the 
async parameter from the REST call. See line 156:
{code}
boolean isAsync = async != null && async.trim().equalsIgnoreCase("true");
{code}
However, this is where the problem starts: the web service simply calls 
TimelineCollector#putEntities and ignores the value of isAsync. It should 
instead have called TimelineCollector#putEntitiesAsync, which is currently not 
implemented.

putEntities should call putEntitiesAsync and then, after that, call 
writer.flush().

The fact that we flush on close and flush periodically is more a matter of 
avoiding data loss: the flush on close covers the case where sync is never 
called, and the periodic flush guards against data from slow writers sitting in 
buffers for a long time, which would expose us to loss if the collector crashes 
with data in its buffers. Size-based flushing is a different concern, aimed at 
limiting the memory footprint. The spooling behavior is also somewhat separate.

We have two separate methods in our API, putEntities and putEntitiesAsync, and 
they should differ in behavior beyond just waiting for the request to be sent. 
I can file a separate bug from this one, dealing with exception handling, to 
tackle the sync vs. async nature. During the meeting today I was thinking about 
the HBase writer, which has a flush that definitely blocks until data is 
flushed to HBase (ignoring the spooling for the moment).
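
A minimal sketch of the proposed split, with simplified, hypothetical 
signatures (the real TimelineCollector methods take more parameters): async 
only hands the entities to the writer, while sync reuses the async path and 
then blocks on a flush.
{code}
// Hypothetical, simplified sketch of the proposed behavior, not the actual
// TimelineCollector signatures.
public abstract class CollectorSketch {

  protected abstract void writeEntities(Object entities) throws Exception;

  protected abstract void flushWriter() throws Exception;

  public void putEntitiesAsync(Object entities) throws Exception {
    writeEntities(entities);       // buffered; returns without waiting for a flush
  }

  public void putEntities(Object entities) throws Exception {
    putEntitiesAsync(entities);    // reuse the async path...
    flushWriter();                 // ...then block until the writer has flushed
  }
}
{code}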



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: Two AMs in one YARN container?

2017-03-16 Thread Arun Suresh
Hey Sergiy,

I think a similar approach is used, IIUC, where an AM for an app running on one
cluster acts as an unmanaged AM on another cluster. I believe they use a
separate UGI for each sub-cluster and wrap it in a doAs before the actual
allocate call.

Subru might be able to give more details.

Cheers
-Arun

On Thu, Mar 16, 2017 at 2:34 PM, Jason Lowe 
wrote:

> The doAs method in UserGroupInformation is what you want when dealing with
> multiple UGIs.  It determines what UGI instance the code within the doAs
> scope gets when that code tries to lookup the current user.
> Each AM is designed to run in a separate JVM, so each has some main()-like
> entry point that does everything to setup the AM.  Theoretically all you
> need to do is create two, separate UGIs then use each instance to perform a
> doAs wrapping the invocation of the corresponding AM's entry point.  After
> that, everything that AM does will get the UGI of the doAs invocation as
> the current user.  Since the AMs are running in separate doAs instances
> they will get separate UGIs for the current user and thus separate
> credentials.
> Jason
>
>
> On Thursday, March 16, 2017 4:03 PM, Sergiy Matusevych <
> sergiy.matusev...@gmail.com> wrote:
>
>
>  Hi Jason,
>
> Thanks a lot for your help again! Having two separate UserGroupInformation
> instances is exactly what I had in mind. What I do not understand, though,
> is how to make sure that our second call to .registerApplicationMaster()
> will pick the right UserGroupInformation object. I would love to find a way
> that does not involve any changes to the YARN client, but if we have to
> patch it, of course, I agree that we need to have a generic yet minimally
> invasive solution.
> Thank you!
> Sergiy.
>
>
> On Thu, Mar 16, 2017 at 8:03 AM, Jason Lowe  wrote:
> >
> > I believe a cleaner way to solve this problem is to create two,
> _separate_ UserGroupInformation objects and wrap each AM instance in a UGI
> doAs so they aren't trying to share the same credentials.  This is one
> example of a token bleeding over and causing problems. I suspect trying to
> fix these one-by-one as they pop up is going to be frustrating compared to
> just ensuring the credentials remain separate as if they really were
> running in separate JVMs.
> >
> > Adding Daryn who knows a lot more about the UGI stuff so he can correct
> any misunderstandings on my part.
> >
> > Jason
> >
> >
> > On Wednesday, March 15, 2017 1:11 AM, Sergiy Matusevych <
> sergiy.matusev...@gmail.com> wrote:
> >
> >
> > Hi YARN developers,
> >
> > I have an interesting problem that I think is related to YARN Java
> client.
> > I am trying to launch *two* application masters in one container. To be
> > more specific, I am starting a Spark job on YARN, and launch an Apache
> REEF
> > Unmanaged AM from the Spark Driver.
> >
> > Technically, YARN Resource Manager should not care which process each AM
> > runs in. However, there is a problem with the YARN Java client
> > implementation: there is a global UserGroupInformation object that holds
> > the user credentials of the current RM session. This data structure is
> > shared by all AMs, and when REEF application tries to register the second
> > (unmanaged) AM, the client library presents to YARN RM all credentials,
> > including the security token of the first (managed) AM. YARN rejects such
> > registration request, throwing InvalidApplicationMasterRequestException
> > "Application Master is already registered".
> >
> > I feel like this issue can be resolved by a relatively small update to
> the
> > YARN Java client - e.g. by introducing a new variant of the
> > AMRMClientAsync.registerApplicationMaster() that would take the required
> > security token (instead of getting it implicitly from the
> > UserGroupInformation.getCurrentUser().getCredentials() etc.), or having
> > some sort of RM session class that would wrap all data that is currently
> > global. I need to think about the elegant API for it.
> >
> > What do you guys think? I would love to work on this problem and send
> you a
> > pull request for the upcoming 2.9 release.
> >
> > Cheers,
> > Sergiy.
> >
> >
>
>
>
>


[jira] [Created] (YARN-6356) Allow different values of yarn.log-aggregation.retain-seconds for succeeded and failed jobs

2017-03-16 Thread Robert Kanter (JIRA)
Robert Kanter created YARN-6356:
---

 Summary: Allow different values of 
yarn.log-aggregation.retain-seconds for succeeded and failed jobs
 Key: YARN-6356
 URL: https://issues.apache.org/jira/browse/YARN-6356
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: log-aggregation
Reporter: Robert Kanter


It would be useful to have a value of {{yarn.log-aggregation.retain-seconds}} 
for succeeded jobs and a different value for failed/killed jobs.  For jobs that 
succeeded, you typically don't care about the logs, so a shorter retention time 
is fine (and saves space/blocks in HDFS).  For jobs that failed or were killed, 
the logs are much more important, and you'll likely want to keep them around 
longer so you have time to look at them.

For instance, you could set it to keep logs for succeeded jobs for 1 day and 
logs for failed/killed jobs for 1 week.
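
A minimal sketch of how the split could be consumed, assuming hypothetical 
configuration keys - the two new property names below do not exist today and 
only illustrate the proposal; the fallback key is the existing setting.
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch: pick the retention period based on the job outcome.
public class RetentionPolicySketch {
  // Hypothetical keys, shown only to illustrate the proposed split.
  static final String RETAIN_SUCCEEDED =
      "yarn.log-aggregation.retain-seconds.succeeded";
  static final String RETAIN_FAILED =
      "yarn.log-aggregation.retain-seconds.failed";

  public static long retentionSeconds(Configuration conf, boolean jobSucceeded) {
    // Fall back to the existing single knob when the new keys are not set.
    long defaultRetention = conf.getLong("yarn.log-aggregation.retain-seconds", -1);
    return jobSucceeded
        ? conf.getLong(RETAIN_SUCCEEDED, defaultRetention)   // e.g. 1 day
        : conf.getLong(RETAIN_FAILED, defaultRetention);     // e.g. 1 week
  }
}
{code}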



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: Two AMs in one YARN container?

2017-03-16 Thread Jason Lowe
The doAs method in UserGroupInformation is what you want when dealing with 
multiple UGIs.  It determines what UGI instance the code within the doAs scope 
gets when that code tries to lookup the current user.
Each AM is designed to run in a separate JVM, so each has some main()-like 
entry point that does everything to setup the AM.  Theoretically all you need 
to do is create two, separate UGIs then use each instance to perform a doAs 
wrapping the invocation of the corresponding AM's entry point.  After that, 
everything that AM does will get the UGI of the doAs invocation as the current 
user.  Since the AMs are running in separate doAs instances they will get 
separate UGIs for the current user and thus separate credentials.
Jason
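
A minimal sketch of the approach described above. The user names and AM entry 
points are hypothetical placeholders; in practice the two AMs would run 
concurrently (e.g. on separate threads) and each UGI would carry its own 
tokens, but the shape of the doAs wrapping is the same.
{code}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical sketch: give each AM its own UGI and run its bootstrap code
// inside a doAs, so each AM sees its own credentials as the "current user".
public class TwoAmSketch {
  public static void main(String[] args) throws Exception {
    UserGroupInformation sparkUgi = UserGroupInformation.createRemoteUser("spark-am");
    UserGroupInformation reefUgi = UserGroupInformation.createRemoteUser("reef-am");

    sparkUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
      runSparkApplicationMaster();   // hypothetical entry point for AM #1
      return null;
    });

    reefUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
      runReefUnmanagedAm();          // hypothetical entry point for AM #2
      return null;
    });
  }

  private static void runSparkApplicationMaster() { /* placeholder */ }
  private static void runReefUnmanagedAm() { /* placeholder */ }
}
{code}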
 

On Thursday, March 16, 2017 4:03 PM, Sergiy Matusevych 
 wrote:
 

 Hi Jason,

Thanks a lot for your help again! Having two separate UserGroupInformation 
instances is exactly what I had in mind. What I do not understand, though, is 
how to make sure that our second call to .registerApplicationMaster() will pick 
the right UserGroupInformation object. I would love to find a way that does not 
involve any changes to the YARN client, but if we have to patch it, of course, 
I agree that we need to have a generic yet minimally invasive solution.
Thank you!
Sergiy.


On Thu, Mar 16, 2017 at 8:03 AM, Jason Lowe  wrote:
>
> I believe a cleaner way to solve this problem is to create two, _separate_ 
> UserGroupInformation objects and wrap each AM instance in a UGI doAs so they 
> aren't trying to share the same credentials.  This is one example of a token 
> bleeding over and causing problems. I suspect trying to fix these one-by-one 
> as they pop up is going to be frustrating compared to just ensuring the 
> credentials remain separate as if they really were running in separate JVMs.
>
> Adding Daryn who knows a lot more about the UGI stuff so he can correct any 
> misunderstandings on my part.
>
> Jason
>
>
> On Wednesday, March 15, 2017 1:11 AM, Sergiy Matusevych 
>  wrote:
>
>
> Hi YARN developers,
>
> I have an interesting problem that I think is related to YARN Java client.
> I am trying to launch *two* application masters in one container. To be
> more specific, I am starting a Spark job on YARN, and launch an Apache REEF
> Unmanaged AM from the Spark Driver.
>
> Technically, YARN Resource Manager should not care which process each AM
> runs in. However, there is a problem with the YARN Java client
> implementation: there is a global UserGroupInformation object that holds
> the user credentials of the current RM session. This data structure is
> shared by all AMs, and when REEF application tries to register the second
> (unmanaged) AM, the client library presents to YARN RM all credentials,
> including the security token of the first (managed) AM. YARN rejects such
> registration request, throwing InvalidApplicationMasterRequestException
> "Application Master is already registered".
>
> I feel like this issue can be resolved by a relatively small update to the
> YARN Java client - e.g. by introducing a new variant of the
> AMRMClientAsync.registerApplicationMaster() that would take the required
> security token (instead of getting it implicitly from the
> UserGroupInformation.getCurrentUser().getCredentials() etc.), or having
> some sort of RM session class that would wrap all data that is currently
> global. I need to think about the elegant API for it.
>
> What do you guys think? I would love to work on this problem and send you a
> pull request for the upcoming 2.9 release.
>
> Cheers,
> Sergiy.
>
>


   

[jira] [Created] (YARN-6355) Interceptor framework for the YARN ApplicationMasterService

2017-03-16 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-6355:
-

 Summary: Interceptor framework for the YARN 
ApplicationMasterService
 Key: YARN-6355
 URL: https://issues.apache.org/jira/browse/YARN-6355
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Arun Suresh
Assignee: Arun Suresh


Currently on the NM, we have the {{AMRMProxy}} framework to intercept the AM 
<-> RM communication and enforce policies. This is used both by YARN federation 
(YARN-2915) and by Distributed Scheduling (YARN-2877).

This JIRA proposes to introduce a similar framework on the RM side, so that 
pluggable policies can be enforced on the ApplicationMasterService centrally as 
well.

This would be similar in spirit to a Java Servlet filter chain, where the order 
of the interceptors can be declared externally.

One possible use case: the {{OpportunisticContainerAllocatorAMService}} is 
currently implemented as a wrapper over the {{ApplicationMasterService}}; it 
would probably be better to implement it as an interceptor.
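
A minimal sketch of what such a chain could look like. The interface, type 
names and Object-typed request/response are illustrative only; the real 
ApplicationMasterService calls use AllocateRequest/AllocateResponse and friends.
{code}
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of an RM-side interceptor chain, in the spirit of a
// servlet filter chain.
interface AMServiceInterceptor {
  Object allocate(Object request, Chain chain) throws Exception;

  interface Chain {
    Object proceed(Object request) throws Exception;
  }
}

final class InterceptorChain implements AMServiceInterceptor.Chain {
  private final Iterator<AMServiceInterceptor> remaining;
  private final AMServiceInterceptor.Chain terminal;  // the real ApplicationMasterService logic

  InterceptorChain(List<AMServiceInterceptor> interceptors,
      AMServiceInterceptor.Chain terminal) {
    this.remaining = interceptors.iterator();  // order is declared externally, e.g. in config
    this.terminal = terminal;
  }

  @Override
  public Object proceed(Object request) throws Exception {
    if (remaining.hasNext()) {
      // The next interceptor may inspect or rewrite the request, then call
      // chain.proceed() to continue down the chain.
      return remaining.next().allocate(request, this);
    }
    return terminal.proceed(request);
  }
}
{code}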



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: Two AMs in one YARN container?

2017-03-16 Thread Sergiy Matusevych
Hi Jason,

Thanks a lot for your help again! Having two separate UserGroupInformation
instances is exactly what I had in mind. What I do not understand, though,
is how to make sure that our second call to .registerApplicationMaster()
will pick the right UserGroupInformation object. I would love to find a way
that does not involve any changes to the YARN client, but if we have to
patch it, of course, I agree that we need to have a generic yet minimally
invasive solution.

Thank you!
Sergiy.


On Thu, Mar 16, 2017 at 8:03 AM, Jason Lowe  wrote:
>
> I believe a cleaner way to solve this problem is to create two,
_separate_ UserGroupInformation objects and wrap each AM instance in a UGI
doAs so they aren't trying to share the same credentials.  This is one
example of a token bleeding over and causing problems. I suspect trying to
fix these one-by-one as they pop up is going to be frustrating compared to
just ensuring the credentials remain separate as if they really were
running in separate JVMs.
>
> Adding Daryn who knows a lot more about the UGI stuff so he can correct
any misunderstandings on my part.
>
> Jason
>
>
> On Wednesday, March 15, 2017 1:11 AM, Sergiy Matusevych <
sergiy.matusev...@gmail.com> wrote:
>
>
> Hi YARN developers,
>
> I have an interesting problem that I think is related to YARN Java client.
> I am trying to launch *two* application masters in one container. To be
> more specific, I am starting a Spark job on YARN, and launch an Apache
REEF
> Unmanaged AM from the Spark Driver.
>
> Technically, YARN Resource Manager should not care which process each AM
> runs in. However, there is a problem with the YARN Java client
> implementation: there is a global UserGroupInformation object that holds
> the user credentials of the current RM session. This data structure is
> shared by all AMs, and when REEF application tries to register the second
> (unmanaged) AM, the client library presents to YARN RM all credentials,
> including the security token of the first (managed) AM. YARN rejects such
> registration request, throwing InvalidApplicationMasterRequestException
> "Application Master is already registered".
>
> I feel like this issue can be resolved by a relatively small update to the
> YARN Java client - e.g. by introducing a new variant of the
> AMRMClientAsync.registerApplicationMaster() that would take the required
> security token (instead of getting it implicitly from the
> UserGroupInformation.getCurrentUser().getCredentials() etc.), or having
> some sort of RM session class that would wrap all data that is currently
> global. I need to think about the elegant API for it.
>
> What do you guys think? I would love to work on this problem and send you
a
> pull request for the upcoming 2.9 release.
>
> Cheers,
> Sergiy.
>
>


[jira] [Resolved] (YARN-4590) SLS(Scheduler Load Simulator) web pages can't load css and js resource

2017-03-16 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu resolved YARN-4590.

Resolution: Duplicate

> SLS(Scheduler Load Simulator) web pages can't load css and js resource 
> ---
>
> Key: YARN-4590
> URL: https://issues.apache.org/jira/browse/YARN-4590
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: xupeng
>Priority: Minor
>
> HadoopVersion : 2.6.0 / with patch YARN-4367-branch-2
> 1. run command "./slsrun.sh 
> --input-rumen=../sample-data/2jobs2min-rumen-jh.json 
> --output-dir=../sample-data/"
> success
> 2. open web page "http://10.6.128.88:10001/track"
> can not load css and js resource 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6354) RM fails to upgrade to 2.8 with leveldb state store

2017-03-16 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-6354:


 Summary: RM fails to upgrade to 2.8 with leveldb state store
 Key: YARN-6354
 URL: https://issues.apache.org/jira/browse/YARN-6354
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.8.0
Reporter: Jason Lowe
Priority: Critical


When trying to upgrade an RM to 2.8 it fails with a 
StringIndexOutOfBoundsException trying to load reservation state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6353) Clean up OrderingPolicy javadoc

2017-03-16 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6353:
--

 Summary: Clean up OrderingPolicy javadoc
 Key: YARN-6353
 URL: https://issues.apache.org/jira/browse/YARN-6353
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.8.0
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Apache Hadoop qbt Report: trunk+JDK8 on Linux/ppc64le

2017-03-16 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/

[Mar 15, 2017 6:48:17 PM] (junping_du) YARN-6294. ATS client should better 
handle Socket closed case.
[Mar 15, 2017 7:28:22 PM] (arp) HDFS-11419. DFSTopologyNodeImpl#chooseRandom 
optimizations. Contributed
[Mar 16, 2017 1:01:45 AM] (arp) HDFS-11511. Support Timeout when checking 
single disk. Contributed by
[Mar 16, 2017 4:59:55 AM] (jianhe) YARN-6332. Make RegistrySecurity use short 
user names for ZK ACLs.




-1 overall


The following subsystems voted -1:
compile unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc javac


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer 
   hadoop.hdfs.tools.TestDFSAdminWithHA 
   hadoop.hdfs.server.mover.TestMover 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting 
   hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality 
   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.yarn.server.timeline.TestRollingLevelDB 
   hadoop.yarn.server.timeline.TestTimelineDataManager 
   hadoop.yarn.server.timeline.TestLeveldbTimelineStore 
   hadoop.yarn.server.timeline.webapp.TestTimelineWebServices 
   hadoop.yarn.server.timeline.recovery.TestLeveldbTimelineStateStore 
   hadoop.yarn.server.timeline.TestRollingLevelDBTimelineStore 
   
hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer 
   hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore 
   
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption 
   hadoop.yarn.server.TestMiniYarnClusterNodeUtilization 
   hadoop.yarn.server.TestContainerManagerSecurity 
   hadoop.yarn.client.api.impl.TestAMRMClient 
   hadoop.yarn.server.timeline.TestLevelDBCacheTimelineStore 
   hadoop.yarn.server.timeline.TestOverrideTimelineStoreYarnClient 
   hadoop.yarn.server.timeline.TestEntityGroupFSTimelineStore 
   hadoop.yarn.applications.distributedshell.TestDistributedShell 
   hadoop.mapred.TestShuffleHandler 
   hadoop.mapreduce.v2.hs.TestHistoryServerLeveldbStateStoreService 

Timed out junit tests :

   org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean 
   org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache 
  

   compile:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-compile-root.txt
  [132K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-compile-root.txt
  [132K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-compile-root.txt
  [132K]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [240K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [16K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt
  [52K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
  [72K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
  [324K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timeline-pluginstorage.txt
  [28K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-applications-distributedshell.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-ui.txt
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-shuffle.txt
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/259/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs.txt
  [16K]
   

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-03-16 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/

[Mar 15, 2017 9:18:05 AM] (sunilg) YARN-6328. Fix a spelling mistake in 
CapacityScheduler. Contributed by
[Mar 15, 2017 10:05:03 AM] (yqlin) HDFS-11420. Edit file should not be 
processed by the same type processor
[Mar 15, 2017 10:24:09 AM] (rohithsharmaks) YARN-6336. Jenkins report YARN new 
UI build failure. Contributed by
[Mar 15, 2017 6:48:17 PM] (junping_du) YARN-6294. ATS client should better 
handle Socket closed case.
[Mar 15, 2017 7:28:22 PM] (arp) HDFS-11419. DFSTopologyNodeImpl#chooseRandom 
optimizations. Contributed
[Mar 16, 2017 1:01:45 AM] (arp) HDFS-11511. Support Timeout when checking 
single disk. Contributed by
[Mar 16, 2017 4:59:55 AM] (jianhe) YARN-6332. Make RegistrySecurity use short 
user names for ZK ACLs.




-1 overall


The following subsystems voted -1:
asflicense unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure 
   hadoop.yarn.server.nodemanager.containermanager.TestContainerManager 
   hadoop.yarn.server.timeline.webapp.TestTimelineWebServices 
   hadoop.yarn.server.TestContainerManagerSecurity 
   hadoop.yarn.server.TestMiniYarnClusterNodeUtilization 
   hadoop.yarn.server.TestDiskFailures 
   hadoop.yarn.client.api.impl.TestAMRMClient 
   hadoop.mapreduce.v2.app.job.impl.TestJobImpl 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/diff-compile-javac-root.txt
  [180K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/diff-checkstyle-root.txt
  [17M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/diff-patch-pylint.txt
  [20K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/diff-patch-shellcheck.txt
  [24K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/diff-patch-shelldocs.txt
  [12K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/whitespace-eol.txt
  [11M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/whitespace-tabs.txt
  [1.3M]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/diff-javadoc-javadoc-root.txt
  [2.2M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [272K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [36K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
  [324K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt
  [20K]

   asflicense:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/347/artifact/out/patch-asflicense-problems.txt
  [4.0K]

Powered by Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org



-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org

[jira] [Resolved] (YARN-3767) Yarn Scheduler Load Simulator does not work

2017-03-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino resolved YARN-3767.

Resolution: Won't Fix

From my read of the conversation, I think this is not an actual issue. I will 
close it for now; please re-open if you disagree.

> Yarn Scheduler Load Simulator does not work
> ---
>
> Key: YARN-3767
> URL: https://issues.apache.org/jira/browse/YARN-3767
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
> Environment: OS X 10.10.  JDK 1.7
>Reporter: David Kjerrumgaard
>
> Running the SLS, as per the instructions on the web results in a 
> NullPointerException being thrown.
> Steps followed to create error:
> 1) Download Apache Hadoop 2.7.0 tarball from Apache site
> 2) Untar 2.7.0 tarball into /opt directory
> 3) Execute the following command: 
> /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh 
> --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json
>  --output-dir=/tmp
> Results in the following error:
> 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned 
> from NEW to RUNNING
> 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
> a2118.smile.com:2 clusterResource: 
> 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to 
> /default-rack
> 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager 
> from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
> , assigned nodeId a2115.smile.com:3
> 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned 
> from NEW to RUNNING
> 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
> a2115.smile.com:3 clusterResource: 
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
>   at 
> org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398)
>   at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250)
>   at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
>   at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126)
>   ... 4 more



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6352) Header injections are possible in the application proxy servlet

2017-03-16 Thread Naganarasimha G R (JIRA)
Naganarasimha G R created YARN-6352:
---

 Summary: Header injections are possible in the application proxy 
servlet
 Key: YARN-6352
 URL: https://issues.apache.org/jira/browse/YARN-6352
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: headerInjection.png

This issue was found by the WVS security tool. 




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6351) Have RM match relaxedLocality request via time instead of missedOpportunities

2017-03-16 Thread Roni Burd (JIRA)
Roni Burd created YARN-6351:
---

 Summary: Have RM match relaxedLocality request via time instead of 
missedOpportunities 
 Key: YARN-6351
 URL: https://issues.apache.org/jira/browse/YARN-6351
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, yarn
Reporter: Roni Burd


When using relaxLocality=true, the current CapacityScheduler strategy is to 
wait a certain number of missedOpportunities before scheduling a request on a 
node, a rack, or off_switch. This means the missedOpportunities param is 
dependent on the number of nodes in the cluster and the duration of each 
container.

A different strategy would be to wait a configurable amount of time before 
deciding to go to a different location.

This JIRA proposes to extract the current behavior into a pluggable strategy 
pattern and create a new strategy that is simply based on time.
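
A minimal sketch of the pluggable strategy being proposed, with hypothetical 
names; the real hook would live in the CapacityScheduler and be fed the actual 
missed-opportunity count and request age.
{code}
// Hypothetical sketch of a pluggable locality-delay strategy.
interface LocalityDelayStrategy {
  // Decide whether a node-local request may now be relaxed to rack locality
  // (a similar check would exist for rack -> off_switch).
  boolean canRelaxToRack(long missedOpportunities, long millisSinceRequestArrived);
}

// Current behavior: count-based, so it depends on cluster size and container duration.
class MissedOpportunityStrategy implements LocalityDelayStrategy {
  private final long threshold;
  MissedOpportunityStrategy(long threshold) { this.threshold = threshold; }

  @Override
  public boolean canRelaxToRack(long missed, long waitedMillis) {
    return missed >= threshold;
  }
}

// Proposed behavior: purely time-based, independent of cluster size.
class TimeBasedStrategy implements LocalityDelayStrategy {
  private final long waitMillis;
  TimeBasedStrategy(long waitMillis) { this.waitMillis = waitMillis; }

  @Override
  public boolean canRelaxToRack(long missed, long waitedMillis) {
    return waitedMillis >= waitMillis;
  }
}
{code}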



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6350) Add JMX counters to track locality matching (node, rack, off_switch)

2017-03-16 Thread Roni Burd (JIRA)
Roni Burd created YARN-6350:
---

 Summary: Add JMX counters to track locality matching (node, rack, 
off_switch)
 Key: YARN-6350
 URL: https://issues.apache.org/jira/browse/YARN-6350
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: metrics, yarn
Reporter: Roni Burd
Priority: Minor


When using relaxLocality=true, it would be nice to have metrics to see how well 
the RM is fulfilling the requests. This helps to tune the relaxLocality params 
and compare the behavior.

The proposal is to have 3 metrics exposed via JMX:
- node matching %
- rack matching %
- off_switch matching %

Each one represents the matches that occurred compared to the total matches 
requested.

The metrics would have to take into account the type of request (e.g. node, 
ANY, etc.)
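
A minimal sketch of the bookkeeping behind such percentages, using plain atomic 
counters rather than any particular Hadoop metrics API; the class, method and 
level names are hypothetical.
{code}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; a real implementation would register these values with
// the RM's metrics system so they show up over JMX.
public class LocalityMatchingStats {
  private final AtomicLong nodeLocalRequested = new AtomicLong();
  private final AtomicLong nodeLocalMatched = new AtomicLong();
  private final AtomicLong rackLocalMatched = new AtomicLong();
  private final AtomicLong offSwitchMatched = new AtomicLong();

  public void recordNodeLocalRequest() { nodeLocalRequested.incrementAndGet(); }

  public void recordMatch(String level) {
    switch (level) {
      case "NODE_LOCAL":  nodeLocalMatched.incrementAndGet(); break;
      case "RACK_LOCAL":  rackLocalMatched.incrementAndGet(); break;
      default:            offSwitchMatched.incrementAndGet(); break;
    }
  }

  // Exposed as "node matching %": node-local matches over node-local requests.
  public double getNodeMatchingPercent() {
    long requested = nodeLocalRequested.get();
    return requested == 0 ? 0.0 : 100.0 * nodeLocalMatched.get() / requested;
  }
}
{code}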



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.8.0 (RC2)

2017-03-16 Thread Junping Du
Thanks Steve. That's awesome! I will kick off a new RC soon.
Shall we reopen HDFS-6200 given the issues here? Keeping it in the 2.8.0 release 
notes could confuse people, as it doesn't work in an HA deployment.

Thanks,

Junping

From: Steve Loughran
Sent: Thursday, March 16, 2017 7:27 AM
To: Junping Du
Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
yarn-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: Re: [VOTE] Release Apache Hadoop 2.8.0 (RC2)

> On 16 Mar 2017, at 00:25, Junping Du  wrote:
>
> bq. From my read of the poms, hadoop-client depends on hadoop-hdfs-client to 
> pull in HDFS-related code. It doesn't have its own dependency on hadoop-hdfs. 
> So I think this affects users of the hadoop-client artifact, which has 
> existed for a long time.
>
> I could miss that. Thanks for reminding! From my quick check: 
> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client/2.7.3?, it 
> sounds like 669 artifacts from other projects were depending on it.
>
>
> I think we should withdraw the current RC bits. Please stop the verification 
> & vote.
>
> I will kick off another RC immediately when HDFS-11431 get fixed.

is done. hadoop-hdfs without any server-side dependencies is now a 
hadoop-client dependency.

Release notes:

The hadoop-client POM now includes a leaner hdfs-client, stripping out all the 
transitive dependencies on JARs only needed for the Hadoop HDFS daemon itself. 
The specific jars now excluded are: leveldbjni-all, jetty-util, commons-daemon, 
xercesImpl, netty and servlet-api.

This should make downstream projects' dependent JARs smaller, and avoid version 
conflict problems with the specific JARs now excluded.

Applications may encounter build problems if they depended on these JARs but 
didn't explicitly include them. There are two fixes for this:

* explicitly include the JARs, stating which version of them you want.
* add a dependency on hadoop-hdfs. For Hadoop 2.8+, this will add the missing 
dependencies. For builds against older versions of Hadoop, this will be 
harmless, as hadoop-hdfs and all its dependencies are already pulled in by the 
hadoop-client POM.




-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: Two AMs in one YARN container?

2017-03-16 Thread Jason Lowe
I believe a cleaner way to solve this problem is to create two, _separate_ 
UserGroupInformation objects and wrap each AM instance in a UGI doAs so they 
aren't trying to share the same credentials.  This is one example of a token 
bleeding over and causing problems. I suspect trying to fix these one-by-one as 
they pop up is going to be frustrating compared to just ensuring the 
credentials remain separate as if they really were running in separate JVMs.
Adding Daryn who knows a lot more about the UGI stuff so he can correct any 
misunderstandings on my part.
Jason
 

On Wednesday, March 15, 2017 1:11 AM, Sergiy Matusevych 
 wrote:
 

 Hi YARN developers,

I have an interesting problem that I think is related to YARN Java client.
I am trying to launch *two* application masters in one container. To be
more specific, I am starting a Spark job on YARN, and launch an Apache REEF
Unmanaged AM from the Spark Driver.

Technically, YARN Resource Manager should not care which process each AM
runs in. However, there is a problem with the YARN Java client
implementation: there is a global UserGroupInformation object that holds
the user credentials of the current RM session. This data structure is
shared by all AMs, and when REEF application tries to register the second
(unmanaged) AM, the client library presents to YARN RM all credentials,
including the security token of the first (managed) AM. YARN rejects such
registration request, throwing InvalidApplicationMasterRequestException
"Application Master is already registered".

I feel like this issue can be resolved by a relatively small update to the
YARN Java client - e.g. by introducing a new variant of the
AMRMClientAsync.registerApplicationMaster() that would take the required
security token (instead of getting it implicitly from the
UserGroupInformation.getCurrentUser().getCredentials() etc.), or having
some sort of RM session class that would wrap all data that is currently
global. I need to think about the elegant API for it.

What do you guys think? I would love to work on this problem and send you a
pull request for the upcoming 2.9 release.

Cheers,
Sergiy.


   

[jira] [Created] (YARN-6349) Container kill request from AM can be lost if container is still recovering

2017-03-16 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-6349:


 Summary: Container kill request from AM can be lost if container 
is still recovering
 Key: YARN-6349
 URL: https://issues.apache.org/jira/browse/YARN-6349
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Jason Lowe


If container recovery takes an excessive amount of time (e.g.: HDFS is slow) 
then the NM could start servicing requests before all containers have 
recovered.  If an AM tries to kill a container while it is still recovering 
then this kill request could be lost.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.8.0 (RC2)

2017-03-16 Thread Steve Loughran

> On 16 Mar 2017, at 00:25, Junping Du  wrote:
> 
> bq. From my read of the poms, hadoop-client depends on hadoop-hdfs-client to 
> pull in HDFS-related code. It doesn't have its own dependency on hadoop-hdfs. 
> So I think this affects users of the hadoop-client artifact, which has 
> existed for a long time.
> 
> I could miss that. Thanks for reminding! From my quick check: 
> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client/2.7.3?, it 
> sounds like 669 artifacts from other projects were depending on it.
> 
> 
> I think we should withdraw the current RC bits. Please stop the verification 
> & vote.
> 
> I will kick off another RC immediately when HDFS-11431 get fixed.

is done. hadoop-hdfs without any server-side dependencies is now a 
hadoop-client dependency.

Release notes:

The hadoop-client POM now includes a leaner hdfs-client, stripping out all the 
transitive dependencies on JARs only needed for the Hadoop HDFS daemon itself. 
The specific jars now excluded are: leveldbjni-all, jetty-util, commons-daemon, 
xercesImpl, netty and servlet-api.

This should make downstream projects' dependent JARs smaller, and avoid version 
conflict problems with the specific JARs now excluded.

Applications may encounter build problems if they depended on these JARs but 
didn't explicitly include them. There are two fixes for this:

* explicitly include the JARs, stating which version of them you want.
* add a dependency on hadoop-hdfs. For Hadoop 2.8+, this will add the missing 
dependencies. For builds against older versions of Hadoop, this will be 
harmless, as hadoop-hdfs and all its dependencies are already pulled in by the 
hadoop-client POM.




-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.8.0 (RC2)

2017-03-16 Thread Kuhu Shukla
+1 (non-binding)
- Downloaded source.
- Verified signatures.
- Compiled the source.
- Ran sample jobs like MR sleep on pseudo distributed cluster. (Mac OS)

Thanks Junping and others!

Regards,
Kuhu

On Wednesday, March 15, 2017, 7:25:46 PM CDT, Junping Du  wrote:

bq. From my read of the poms, hadoop-client depends on hadoop-hdfs-client to 
pull in HDFS-related code. It doesn't have its own dependency on hadoop-hdfs. 
So I think this affects users of the hadoop-client artifact, which has existed 
for a long time.

I could miss that. Thanks for reminding! From my quick check: 
https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client/2.7.3?, it 
sounds like 669 artifacts from other projects were depending on it.


I think we should withdraw the current RC bits. Please stop the verification & 
vote.

I will kick off another RC immediately when HDFS-11431 get fixed.


Thanks,


Junping



From: Andrew Wang 
Sent: Wednesday, March 15, 2017 2:04 PM
To: Junping Du
Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
yarn-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: Re: [VOTE] Release Apache Hadoop 2.8.0 (RC2)

Hi Junping, inline,


>From my understanding, this issue is related to our previous improvements with 
>separating client and server jars in HDFS-6200. If we use the new "client" jar 
>in NN HA deployment, then we will hit the issue reported.

>From my read of the poms, hadoop-client depends on hadoop-hdfs-client to pull 
>in HDFS-related code. It doesn't have its own dependency on hadoop-hdfs. So I 
>think this affects users of the hadoop-client artifact, which has existed for 
>a long time.

Essentially all of our customer deployments run with NN HA, so this would 
affect a lot of users.

I can see two options here:

- Without any change in 2.8.0: if users hit the issue when they deploy an HA 
cluster using the new client jar, they can add back the hdfs jar, just like how 
things worked previously.

- Make the change now in 2.8.0, either moving ConfiguredFailoverProxyProvider 
to the client jar or adding a dependency between the client jar and the server 
jar. There will likely be some argument about which fix is better, especially 
since ConfiguredFailoverProxyProvider still has some server-side dependencies.


I would prefer the first option, given:

- The time to fix the issue is unpredictable, as there is still discussion on 
how to fix it. Our 2.8.0 release shouldn't be an endless journey that has 
already been deferred several times for more serious issues.

Looks like we have a patch being actively revved and reviewed to fix this by 
making hadoop-hdfs-client depend on hadoop-hdfs. Thanks to Steven and Steve for 
working on this.

Steve proposed doing a proper split in a later JIRA.

- We have a workaround for this improvement, and no regression happens due to 
this issue. People can still use the hdfs jar the old way. The worst case is 
that the HDFS improvement doesn't work in some cases - that shouldn't block the 
whole release.

Based on the above, I think there is a regression for users of the 
hadoop-client artifact.

If it actually only affects users of hadoop-hdfs-client, then I agree we can 
document it as a Known Issue and fix it later.

Best,
Andrew