[jira] [Commented] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6

2014-11-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218405#comment-14218405
 ] 

Zhijie Shen commented on YARN-2879:
---

In the following scenarios:

1. Either insecure or secure;
2. MR 2.2 with new shuffle on NM;
3. Submitting via old client.

We will see the following console exception:
{code}
Console Log:
14/11/17 14:56:19 INFO mapreduce.Job: Job job_1416264695865_0003 completed 
successfully
java.lang.IllegalArgumentException: No enum constant 
org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_REDUCES
at java.lang.Enum.valueOf(Enum.java:236)
at 
org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148)
at 
org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182)
at 
org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
at 
org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240)
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370)
at 
org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511)
at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756)
at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753)
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{code}

The problem was supposed to be fixed by MAPREDUCE-5831; however, it seems that 
we haven't covered all the problematic code paths. Will file another Jira for it.
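
To make the failure mode concrete, here is a minimal sketch (illustrative only, not 
the actual MAPREDUCE-5831 change) of how an old client's JobCounter enum lookup 
blows up on a counter name that only exists in newer releases, and how a lenient 
lookup could skip it instead:
{code}
// Assumes the Hadoop MR client library is on the classpath; JobCounter is the
// real org.apache.hadoop.mapreduce.JobCounter enum.
import org.apache.hadoop.mapreduce.JobCounter;

public class LenientCounterLookup {
  /** Returns the JobCounter for the given name, or null if this (older) enum
   *  does not define it, e.g. MB_MILLIS_REDUCES sent back by a 2.6 AM. */
  public static JobCounter lookup(String counterName) {
    try {
      return JobCounter.valueOf(counterName);
    } catch (IllegalArgumentException unknownCounter) {
      // Counter introduced in a newer release: skip it rather than fail the job client.
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(lookup("TOTAL_LAUNCHED_MAPS")); // known to a 2.2 client
    System.out.println(lookup("NOT_A_REAL_COUNTER"));  // null instead of an exception
  }
}
{code}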

> Compatibility validation between YARN 2.2/2.4 and 2.6
> -
>
> Key: YARN-2879
> URL: https://issues.apache.org/jira/browse/YARN-2879
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Recently, I did some simple backward compatibility experiments. Basically, 
> I've taken the following 2 steps:
> 1. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *new* Hadoop (2.6) client.
> 2. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *old* Hadoop (2.2/2.4) client that comes with the app.
> I've tried these 2 steps on both insecure and secure cluster. Here's a short 
> summary:
> || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle and RT 2.2 || MR 2.2 + Shuffle 
> and RT 2.6 || MR 2.4 + Shuffle and RT 2.4 || MR 2.4 + Shuffle and RT 2.6 ||
> | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible 
> | OK | OK |
> | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | 
> OK | OK |
> | Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
> OK | OK |
> | Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
> | OK |
> Note that I've tried to run NM with both old and new versions of the shuffle 
> handler plus the runtime libs.
> In general, the compatibility looks good overall. There are a few issues 
> related to MR, but they don't seem to be YARN issues. I'll post the individual 
> problems in the follow-up comments.

[jira] [Updated] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6

2014-11-19 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2879:
--
Description: 
Recently, I did some simple backward compatibility experiments. Basically, I've 
taken the following 2 steps:

1. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *new* Hadoop (2.6) client.

2. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *old* Hadoop (2.2/2.4) client that comes with the app.

I've tried these 2 steps on both insecure and secure cluster. Here's a short 
summary:

|| || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle and RT 2.2 || MR 2.2 + Shuffle 
and RT 2.6 || MR 2.4 + Shuffle and RT 2.4 || MR 2.4 + Shuffle and RT 2.6 ||
| Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
| OK |
| Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | 
OK |

Note that I've tried to run NM with both old and new versions of the shuffle 
handler plus the runtime libs.

In general, the compatibility looks good overall. There are a few issues related 
to MR, but they don't seem to be YARN issues. I'll post the individual problems 
in the follow-up comments.



  was:
Recently, I did some simple backward compatibility experiments. Basically, I've 
taken the following 2 steps:

1. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *new* Hadoop (2.6) client.

2. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *old* Hadoop (2.2/2.4) client that comes with the app.

I've tried these 2 steps on both insecure and secure cluster. Here's a short 
summary:

|| || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 
2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
| Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
| OK |
| Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | 
OK |

Note that I've tried to run NM with both old and new versions of the shuffle 
handler plus the runtime libs.

In general, the compatibility looks good overall. There are a few issues related 
to MR, but they don't seem to be YARN issues. I'll post the individual problems 
in the follow-up comments.




> Compatibility validation between YARN 2.2/2.4 and 2.6
> -
>
> Key: YARN-2879
> URL: https://issues.apache.org/jira/browse/YARN-2879
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Recently, I did some simple backward compatibility experiments. Basically, 
> I've taken the following 2 steps:
> 1. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *new* Hadoop (2.6) client.
> 2. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *old* Hadoop (2.2/2.4) client that comes with the app.
> I've tried these 2 steps on both insecure and secure cluster. Here's a short 
> summary:
> || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle and RT 2.2 || MR 2.2 + Shuffle 
> and RT 2.6 || MR 2.4 + Shuffle and RT 2.4 || MR 2.4 + Shuffle and RT 2.6 ||
> | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible 
> | OK | OK |
> | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | 
> OK | OK |
> | Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
> OK | OK |
> | Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
> | OK |
> Note that I've tried to run NM with both old and new versions of the shuffle 
> handler plus the runtime libs.
> In general, the compatibility looks good overall. There are a few issues 
> related to MR, but they don't seem to be YARN issues. I'll post the individual 
> problems in the follow-up comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6

2014-11-19 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2879:
--
Description: 
Recently, I did some simple backward compatibility experiments. Basically, I've 
taken the following 2 steps:

1. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *new* Hadoop (2.6) client.

2. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *old* Hadoop (2.2/2.4) client that comes with the app.

I've tried these 2 steps on both insecure and secure cluster. Here's a short 
summary:

|| || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 
2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
| Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
| OK |
| Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | 
OK |

Note that I've tried to run NM with both old and new versions of the shuffle 
handler plus the runtime libs.

In general, the compatibility looks good overall. There are a few issues related 
to MR, but they don't seem to be YARN issues. I'll post the individual problems 
in the follow-up comments.



  was:
Recently, I did some simple backward compatibility experiments. Basically, I've 
taken the following 2 steps:

1. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *new* Hadoop (2.6) client.

2. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *old* Hadoop (2.2/2.4) client that comes with the app.

I've tried these 2 steps on both insecure and secure cluster. Here's a short 
summary:

|| || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 
2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
| Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
| OK |
| secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | 
OK |

Note that I've tried to run NM with both old and new versions of the shuffle handler.

In general, the compatibility looks good overall. There are a few issues related 
to MR, but they don't seem to be YARN issues. I'll post the individual problems 
in the follow-up comments.




> Compatibility validation between YARN 2.2/2.4 and 2.6
> -
>
> Key: YARN-2879
> URL: https://issues.apache.org/jira/browse/YARN-2879
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Recently, I did some simple backward compatibility experiments. Basically, 
> I've taken the following 2 steps:
> 1. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *new* Hadoop (2.6) client.
> 2. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *old* Hadoop (2.2/2.4) client that comes with the app.
> I've tried these 2 steps on both insecure and secure cluster. Here's a short 
> summary:
> || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || 
> MR 2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
> | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible 
> | OK | OK |
> | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | 
> OK | OK |
> | Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
> OK | OK |
> | Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
> | OK |
> Note that I've tried to run NM with both old and new versions of the shuffle 
> handler plus the runtime libs.
> In general, the compatibility looks good overall. There are a few issues 
> related to MR, but they don't seem to be YARN issues. I'll post the individual 
> problems in the follow-up comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6

2014-11-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218388#comment-14218388
 ] 

Zhijie Shen edited comment on YARN-2879 at 11/19/14 7:50 PM:
-

a. In the following scenarios:

1. Either insecure or secure;
2. MR 2.2 with either old or new shuffle handler on NM;
3. Submitting via new client.

We will see the following console exception:
{code}
14/11/17 23:47:45 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
/user/zjshen/.staging/zjshen/.staging/job_1416270549965_0014
java.lang.NoSuchMethodError: 
org.apache.hadoop.http.HttpConfig.getSchemePrefix()Ljava/lang/String;
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:428)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:302)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)  
{code}

b. In the following scenarios:

1. Either insecure or secure;
2. MR 2.2 with old shuffle on NM;
3. Submitting via old client.

We will see the following exception in the AM Log:
{code}
2014-11-17 15:09:06,157 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
application appattempt_1416264695865_0007_01
2014-11-17 15:09:06,436 FATAL [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: 
org.apache.hadoop.http.HttpConfig.setPolicy(Lorg/apache/hadoop/http/HttpConfig$Policy;)V
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1364)
2014-11-17 15:09:06,439 INFO [Thread-1] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. 
Signaling RMCommunicator and JobHistoryEventHandler.
{code}

The two exceptions are actually the same problem, but using the old client 
prevents it from happening during app submission. Will file a separate Jira for it.
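
For reference, the root cause is that the MR 2.2 jars link directly against the old 
static HttpConfig.getSchemePrefix() method, which no longer exists with that 
signature in the 2.6 hadoop-common on the classpath, hence the NoSuchMethodError. 
A hypothetical compatibility shim (illustrative only, not the fix to be tracked in 
the separate Jira) could probe for the old method via reflection and fall back when 
it is gone:
{code}
import java.lang.reflect.Method;

public class SchemePrefixCompat {
  public static String schemePrefix() {
    try {
      Class<?> httpConfig = Class.forName("org.apache.hadoop.http.HttpConfig");
      Method m = httpConfig.getMethod("getSchemePrefix");
      return (String) m.invoke(null);   // old static API (2.2-era hadoop-common)
    } catch (ReflectiveOperationException e) {
      return "http://";                 // assumption: plain HTTP when the old API is absent
    }
  }

  public static void main(String[] args) {
    System.out.println("Using scheme prefix: " + schemePrefix());
  }
}
{code}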


was (Author: zjshen):
a. In the following scenarios:

1. Either insecure or secure;
2. MR 2.2 with either old or new shuffle handler on NM;
3. Submitting via new client.

We will see the following console exception:
{code}
14/11/17 23:47:45 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
/user/zjshen/.staging/zjshen/.staging/job_1416270549965_0014
java.lang.NoSuchMethodError: 
org.apache.hadoop.http.HttpConfig.getSchemePrefix()Ljava/lang/String;
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:428)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:302)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.

[jira] [Commented] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6

2014-11-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218388#comment-14218388
 ] 

Zhijie Shen commented on YARN-2879:
---

a. In the following scenarios:

1. Either insecure or secure;
2. MR 2.2 with either old or new shuffle handler on NM;
3. Submitting via new client.

We will see the following console exception:
{code}
14/11/17 23:47:45 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
/user/zjshen/.staging/zjshen/.staging/job_1416270549965_0014
java.lang.NoSuchMethodError: 
org.apache.hadoop.http.HttpConfig.getSchemePrefix()Ljava/lang/String;
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:428)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:302)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)  
{code}

b. In the following scenarios:
1. Either insecure or secure;
2. MR 2.2 with old on NM;
3. Submitting via old client.

We will see the following exception in the AM Log:
{code}
2014-11-17 15:09:06,157 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
application appattempt_1416264695865_0007_01
2014-11-17 15:09:06,436 FATAL [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: 
org.apache.hadoop.http.HttpConfig.setPolicy(Lorg/apache/hadoop/http/HttpConfig$Policy;)V
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1364)
2014-11-17 15:09:06,439 INFO [Thread-1] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. 
Signaling RMCommunicator and JobHistoryEventHandler.
{code}

The two exceptions are actually the same problem, but using the old client 
prevents it from happening during app submission. Will file a separate Jira for it.

> Compatibility validation between YARN 2.2/2.4 and 2.6
> -
>
> Key: YARN-2879
> URL: https://issues.apache.org/jira/browse/YARN-2879
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Recently, I did some simple backward compatibility experiments. Basically, 
> I've taken the following 2 steps:
> 1. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *new* Hadoop (2.6) client.
> 2. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *old* Hadoop (2.2/2.4) client that comes with the app.
> I've tried these 2 steps on both insecure and secure cluster. Here's a short 
> summary:
> || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || 
> MR 2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
> | Insecure | New Client | OK | OK | Client Incompatible | Client Incomp

[jira] [Created] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6

2014-11-19 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2879:
-

 Summary: Compatibility validation between YARN 2.2/2.4 and 2.6
 Key: YARN-2879
 URL: https://issues.apache.org/jira/browse/YARN-2879
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Recently, I did some simple backward compatibility experiments. Basically, I've 
taken the following 2 steps:

1. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *new* Hadoop (2.6) client.

2. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *old* Hadoop (2.2/2.4) client that comes with the app.

I've tried these 2 steps on both insecure and secure cluster. Here's a short 
summary:

|| || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 
2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
| Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
| OK |
| secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | 
OK |

Note that I've tried to run NM with both old and new versions of the shuffle handler.

In general, the compatibility looks good overall. There are a few issues related 
to MR, but they don't seem to be YARN issues. I'll post the individual problems 
in the follow-up comments.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2522) AHSClient may be not necessary

2014-11-18 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2522:
--
Target Version/s: 2.7.0

> AHSClient may be not necessary
> --
>
> Key: YARN-2522
> URL: https://issues.apache.org/jira/browse/YARN-2522
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Per discussion in 
> [YARN-2033|https://issues.apache.org/jira/browse/YARN-2033?focusedCommentId=14126073&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14126073],
>  it may not be necessary to have a separate AHSClient. The methods can be 
> incorporated into TimelineClient. APPLICATION_HISTORY_ENABLED is also useless 
> then.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server

2014-11-18 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217074#comment-14217074
 ] 

Zhijie Shen commented on YARN-2870:
---

It's better to completely update the document (YARN-2854). Anyway, the patch is 
ready now, let's commit it. Thanks for the contribution, [~iwasakims]!

> Update examples in document of Timeline Server
> --
>
> Key: YARN-2870
> URL: https://issues.apache.org/jira/browse/YARN-2870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, timelineserver
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Trivial
> Attachments: YARN-2870.1.patch
>
>
> YARN-1982 renamed historyserver to timelineserver but there is still 
> deprecated name in docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2870) Update examples in document of Timeline Server

2014-11-18 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2870:
--
Assignee: Masatake Iwasaki

> Update examples in document of Timeline Server
> --
>
> Key: YARN-2870
> URL: https://issues.apache.org/jira/browse/YARN-2870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, timelineserver
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Trivial
> Attachments: YARN-2870.1.patch
>
>
> YARN-1982 renamed historyserver to timelineserver but there is still 
> deprecated name in docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-18 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217049#comment-14217049
 ] 

Zhijie Shen commented on YARN-2375:
---

bq. Do you mean that we should not check for TIMELINE_SERVICE_ENABLED flag in 
the Application Master and rather have it work same way that it was doing 
before and only check that flag while sending data to timeline server?

I think the logic could be: when TIMELINE_SERVICE_ENABLED == true, read the 
domain env var and construct the timeline client. Only if the timeline client 
is not null will the AM send the data to the timeline server where it should.
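
Something like this sketch, i.e. only build the client when the service is enabled 
and only publish when a client was actually built (the env var name and publish 
method here are illustrative, not the actual DS AM code):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelinePublisherSketch {
  private TimelineClient timelineClient; // stays null when the timeline service is disabled

  void init(Configuration conf) {
    if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
      String domainId = System.getenv("TIMELINE_DOMAIN"); // hypothetical env var for the domain
      timelineClient = TimelineClient.createTimelineClient();
      timelineClient.init(conf);
      timelineClient.start();
      // domainId would be attached to the entities published below.
    }
  }

  void publishEvent() {
    if (timelineClient == null) {
      return; // timeline service disabled: the app keeps working, it just sends no ATS data
    }
    // timelineClient.putEntities(...);
  }
}
{code}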

> Allow enabling/disabling timeline server per framework
> --
>
> Key: YARN-2375
> URL: https://issues.apache.org/jira/browse/YARN-2375
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Mit Desai
> Attachments: YARN-2375.patch, YARN-2375.patch
>
>
> This JIRA is to remove the ats enabled flag check within the 
> TimelineClientImpl. Example where this fails is below.
> While running secure timeline server with ats flag set to disabled on 
> resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-18 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216973#comment-14216973
 ] 

Zhijie Shen commented on YARN-2375:
---

[~mitdesai], thanks for the patch. Two suggestions:

1. We should still let DS work when the timeline service is disabled; we just 
need to prevent sending the timeline data to the timeline server while the DS 
app is running.

2. In JobHistoryEventHandler we need to check both the global config and the 
MR-specific config to decide whether we emit MR history events.
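
For point 2, a rough sketch of the combined check (the per-job key name is 
illustrative of the shape of the check, not necessarily the exact constant the 
patch will add):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineEmitDecision {
  /** Emit MR history events to the timeline server only when both the
   *  cluster-wide flag and the (illustrative) per-job flag are on. */
  static boolean shouldEmitTimelineData(Configuration conf) {
    boolean serviceEnabled = conf.getBoolean(
        YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
    boolean jobOptedIn = conf.getBoolean("mapreduce.job.emit-timeline-data", false);
    return serviceEnabled && jobOptedIn;
  }
}
{code}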

> Allow enabling/disabling timeline server per framework
> --
>
> Key: YARN-2375
> URL: https://issues.apache.org/jira/browse/YARN-2375
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Mit Desai
> Attachments: YARN-2375.patch, YARN-2375.patch
>
>
> This JIRA is to remove the ats enabled flag check within the 
> TimelineClientImpl. Example where this fails is below.
> While running secure timeline server with ats flag set to disabled on 
> resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero

2014-11-18 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216914#comment-14216914
 ] 

Zhijie Shen commented on YARN-2165:
---

[~vasanthkumar], thanks for your contribution! Some comments about the patch.

1. TIMELINE_SERVICE_CLIENT_MAX_RETRIES can be -1 for endless retry. It's good 
to make it clear in yarn-default.xml too.

2. Instead of {{" property value should be positive and non-zero"}}, can we 
simply say {{" property value should be greater than zero"}}?

3. You can use {{com.google.common.base.Preconditions.checkArgument}}.

4. Multiple lines are longer than 80 chars.

5. TIMELINE_SERVICE_LEVELDB_READ_CACHE_SIZE can be zero.

6. TIMELINE_SERVICE_LEVELDB_START_TIME_READ_CACHE_SIZE and 
TIMELINE_SERVICE_LEVELDB_START_TIME_WRITE_CACHE_SIZE seem to need to be > 0 
because LRUMap requires it. However, ideally we should be able to disable the 
cache completely. Let's deal with that separately.
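
For points 2 and 3, the check could look roughly like this (the surrounding method 
is illustrative; only the config keys and the Preconditions call are the real APIs):
{code}
import com.google.common.base.Preconditions;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TtlValidation {
  static long validatedTtl(Configuration conf) {
    long ttl = conf.getLong(YarnConfiguration.TIMELINE_SERVICE_TTL_MS,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_TTL_MS);
    Preconditions.checkArgument(ttl > 0,
        "%s property value should be greater than zero",
        YarnConfiguration.TIMELINE_SERVICE_TTL_MS);
    return ttl;
  }
}
{code}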

> Timelineserver should validate that yarn.timeline-service.ttl-ms is greater 
> than zero
> -
>
> Key: YARN-2165
> URL: https://issues.apache.org/jira/browse/YARN-2165
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Karam Singh
>Assignee: Vasanth kumar RJ
> Attachments: YARN-2165.1.patch, YARN-2165.2.patch, YARN-2165.patch
>
>
> Timelineserver should validate that yarn.timeline-service.ttl-ms is greater 
> than zero
> Currently if set yarn.timeline-service.ttl-ms=0 
> Or yarn.timeline-service.ttl-ms=-86400 
> the timeline server starts successfully, only logging:
> {code}
> 2014-06-15 14:52:16,562 INFO  timeline.LeveldbTimelineStore 
> (LeveldbTimelineStore.java:<init>(247)) - Starting deletion thread with ttl 
> -60480 and cycle interval 30
> {code}
> At startup, the timeline server should validate that yarn.timeline-service.ttl-ms > 0;
> otherwise, especially for a negative value, the discard-old-entities timestamp will 
> be set to a future value, which may lead to inconsistency in behavior:
> {code}
> public void run() {
>   while (true) {
> long timestamp = System.currentTimeMillis() - ttl;
> try {
>   discardOldEntities(timestamp);
>   Thread.sleep(ttlInterval);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2838) Issues with TimeLineServer (Application History)

2014-11-17 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2838.
---
Resolution: Not a Problem

Close the ticket and work on separate jiras.

> Issues with TimeLineServer (Application History)
> 
>
> Key: YARN-2838
> URL: https://issues.apache.org/jira/browse/YARN-2838
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0, 2.5.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: IssuesInTimelineServer.pdf
>
>
> Few issues in usage of Timeline server for generic application history access



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2870) Update examples in document of Timeline Server

2014-11-16 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2870:
--
Component/s: timelineserver

> Update examples in document of Timeline Server
> --
>
> Key: YARN-2870
> URL: https://issues.apache.org/jira/browse/YARN-2870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, timelineserver
>Reporter: Masatake Iwasaki
>Priority: Trivial
> Attachments: YARN-2870.1.patch
>
>
> YARN-1982 renamed historyserver to timelineserver but there is still 
> deprecated name in docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2867) TimelineClient DT methods should check if the timeline service is enabled or not

2014-11-14 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2867.
---
Resolution: Invalid

Per discussion on 
[YARN-2375|https://issues.apache.org/jira/browse/YARN-2375?focusedCommentId=14213002&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14213002],
 close this Jira as invalid 

> TimelineClient DT methods should check if the timeline service is enabled or 
> not
> 
>
> Key: YARN-2867
> URL: https://issues.apache.org/jira/browse/YARN-2867
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>
> DT-related methods don't check whether isEnabled == true. On the other hand, the 
> internal state is only initialized when isEnabled == true. An NPE happens if users 
> call these methods when the timeline service config is not set to enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213002#comment-14213002
 ] 

Zhijie Shen commented on YARN-2375:
---

[~jeagles], thanks for the clarification.

bq. I am proposing to retain the flag. However, the responsibility of checking 
whether the ats is enabled needs to be outside of the TimelineClientImpl.

It makes sense to me. If we make this change, YARN-2867 is no longer necessary. 
I'll go ahead and close it.

> Allow enabling/disabling timeline server per framework
> --
>
> Key: YARN-2375
> URL: https://issues.apache.org/jira/browse/YARN-2375
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Mit Desai
>
> This JIRA is to remove the ats enabled flag check within the 
> TimelineClientImpl. Example where this fails is below.
> While running secure timeline server with ats flag set to disabled on 
> resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2862) RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used

2014-11-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212989#comment-14212989
 ] 

Zhijie Shen commented on YARN-2862:
---

It is likely that the assumption we made in 
[YARN-1776|https://issues.apache.org/jira/browse/YARN-1776?focusedCommentId=13942201&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13942201]
 is not fully correct.

When updating a state file, we (1) write the new content to .new, (2) delete the 
existing file, and (3) rename .new to the existing file name. If a crash happens 
before (2), we use .new to recover the state file when loading the state (see 
FileSystemRMStateStore#checkAndResumeUpdateOperation).

According to the description here, the RM can crash while (1) is in progress and 
leave a corrupted .new file. It seems that we have to do additional validation to 
check whether the .new file is corrupted or not, or simply ignore it.
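
A rough sketch (not the actual FileSystemRMStateStore code) of the update sequence 
plus the extra recovery-side guard being proposed:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UpdateFileSketch {
  static void updateFile(FileSystem fs, Path file, byte[] data) throws IOException {
    Path newFile = new Path(file.getParent(), file.getName() + ".new");
    try (FSDataOutputStream out = fs.create(newFile, true)) {
      out.write(data);                // (1) write the new content to .new
    }
    fs.delete(file, false);           // (2) delete the existing file
    fs.rename(newFile, file);         // (3) rename .new into place
  }

  /** Recovery-side guard: only resume from .new if it looks intact; a
   *  zero-length .new is assumed to be debris from an interrupted write. */
  static boolean newFileLooksUsable(FileSystem fs, Path newFile) throws IOException {
    return fs.exists(newFile) && fs.getFileStatus(newFile).getLen() > 0;
  }
}
{code}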

> RM might not start if the machine was hard shutdown and 
> FileSystemRMStateStore was used
> ---
>
> Key: YARN-2862
> URL: https://issues.apache.org/jira/browse/YARN-2862
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ming Ma
>
> This might be a known issue. Given FileSystemRMStateStore isn't used for the HA 
> scenario, it might not be that important, unless there is something we need 
> to fix at the RM layer to make it more tolerant to RMStore issues.
> When the RM was hard shutdown, the OS might not get a chance to persist blocks. 
> Some of the stored application data ends up with size zero after reboot, and the 
> RM didn't like that.
> {noformat}
> ls -al 
> /var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351
> total 156
> drwxr-xr-x.2 x y   4096 Nov 13 16:45 .
> drwxr-xr-x. 1524 x y 151552 Nov 13 16:45 ..
> -rw-r--r--.1 x y  0 Nov 13 16:45 
> appattempt_1412702189634_324351_01
> -rw-r--r--.1 x y  0 Nov 13 16:45 
> .appattempt_1412702189634_324351_01.crc
> -rw-r--r--.1 x y  0 Nov 13 16:45 application_1412702189634_324351
> -rw-r--r--.1 x y  0 Nov 13 16:45 .application_1412702189634_324351.crc
> {noformat}
> When RM starts up
> {noformat}
> 2014-11-13 16:55:25,844 WARN org.apache.hadoop.fs.FSInputChecker: Problem 
> opening checksum file: 
> file:/var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351/application_1412702189634_324351.
>   Ignoring exception:
> java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:197)
> at java.io.DataInputStream.readFully(DataInputStream.java:169)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:792)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.readFile(FileSystemRMStateStore.java:501)
> ...
> 2014-11-13 17:40:48,876 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
> load/recover state
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ApplicationState.getAppId(RMStateStore.java:184)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:306)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:425)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1027)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:484)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:834)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212932#comment-14212932
 ] 

Zhijie Shen commented on YARN-2375:
---

Filed YARN-2867

> Allow enabling/disabling timeline server per framework
> --
>
> Key: YARN-2375
> URL: https://issues.apache.org/jira/browse/YARN-2375
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Mit Desai
>
> This JIRA is to remove the ats enabled flag check within the 
> TimelineClientImpl. Example where this fails is below.
> While running secure timeline server with ats flag set to disabled on 
> resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2867) TimelineClient DT methods should check if the timeline service is enabled or not

2014-11-14 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2867:
-

 Summary: TimelineClient DT methods should check if the timeline 
service is enabled or not
 Key: YARN-2867
 URL: https://issues.apache.org/jira/browse/YARN-2867
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Zhijie Shen


DT-related methods don't check whether isEnabled == true. On the other hand, the 
internal state is only initialized when isEnabled == true. An NPE happens if users call 
these methods when the timeline service config is not set to enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212928#comment-14212928
 ] 

Zhijie Shen commented on YARN-2375:
---

bq. While running secure timeline server with ats flag set to disabled on 
resource manager, Timeline delegation token renewer throws an NPE.

This is a bug. DT-related API methods don't check whether isEnabled == true. On 
the other hand, the internal state is only initialized when isEnabled == true. This 
is why the NPE happens. Will file a separate Jira for it.

As to removing the global flag, I'm not sure we should do that. We still don't 
assume the timeline server is always up like the other components in a YARN 
cluster (RM and NM). If the timeline server is not set up but the YARN cluster 
assumes it is, problems result; for example, app submission fails at getting the 
timeline DT in a secure cluster.

Therefore, this config should be kept to serve as the flag indicating whether we 
have set up the timeline server for the YARN cluster, until we promote it to be an 
always-on daemon like the RM and NM. Thoughts?
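
To illustrate where the guard would live if the check moves outside 
TimelineClientImpl, here is a rough sketch of a caller-side check (illustrative 
only, not the committed fix):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineDtGuard {
  static Token<?> maybeGetTimelineToken(Configuration conf, String renewer)
      throws IOException {
    if (!conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
      return null; // timeline service not set up: nothing to fetch or renew, and no NPE
    }
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      return client.getDelegationToken(renewer);
    } catch (Exception e) {
      throw new IOException(e);
    } finally {
      client.stop();
    }
  }
}
{code}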

> Allow enabling/disabling timeline server per framework
> --
>
> Key: YARN-2375
> URL: https://issues.apache.org/jira/browse/YARN-2375
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Mit Desai
>
> This JIRA is to remove the ats enabled flag check within the 
> TimelineClientImpl. Example where this fails is below.
> While running secure timeline server with ats flag set to disabled on 
> resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2166) Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store

2014-11-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212600#comment-14212600
 ] 

Zhijie Shen commented on YARN-2166:
---

See the comments on 
[YARN-2165|https://issues.apache.org/jira/browse/YARN-2165?focusedCommentId=14212595&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14212595].
 How about having one pass to do a sanity check for all the numeric configs?

> Timelineserver should validate that 
> yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than 
> zero when level db is for timeline store
> -
>
> Key: YARN-2166
> URL: https://issues.apache.org/jira/browse/YARN-2166
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Karam Singh
>
> Timelineserver should validate that 
> yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than 
> zero when level db is for timeline store
> Otherwise, if we start the timeline server with
> yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000,
> the timeline server starts, but the Thread.sleep call in EntityDeletionThread.run 
> keeps throwing an uncaught exception for the negative value:
> {code}
> 2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler 
> (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread 
> Thread[Thread-4,5,main] threw an Exception.
> java.lang.IllegalArgumentException: timeout value is negative
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero

2014-11-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212595#comment-14212595
 ] 

Zhijie Shen commented on YARN-2165:
---

bq. should the check be (<= 0) instead of (< 0) ? Since 0 ttl and ttlinterval 
have no real meanings.

Agree.

To be more general, it's better to do the sanity check for all the numeric 
configurations while initializing the timeline server, making sure a valid 
number has been set. Here's the current list.

{code}
  <property>
    <description>Time to live for timeline store data in milliseconds.</description>
    <name>yarn.timeline-service.ttl-ms</name>
    <value>604800000</value>
  </property>

  <property>
    <description>Length of time to wait between deletion cycles of leveldb
    timeline store in milliseconds.</description>
    <name>yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms</name>
    <value>300000</value>
  </property>

  <property>
    <description>Size of read cache for uncompressed blocks for leveldb
    timeline store in bytes.</description>
    <name>yarn.timeline-service.leveldb-timeline-store.read-cache-size</name>
    <value>104857600</value>
  </property>

  <property>
    <description>Size of cache for recently read entity start times for leveldb
    timeline store in number of entities.</description>
    <name>yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size</name>
    <value>10000</value>
  </property>

  <property>
    <description>Size of cache for recently written entity start times for
    leveldb timeline store in number of entities.</description>
    <name>yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size</name>
    <value>10000</value>
  </property>

  <property>
    <description>Handler thread count to serve the client RPC requests.</description>
    <name>yarn.timeline-service.handler-thread-count</name>
    <value>10</value>
  </property>

  <property>
    <description>Default maximum number of retries for the timeline service client.</description>
    <name>yarn.timeline-service.client.max-retries</name>
    <value>30</value>
  </property>

  <property>
    <description>Default retry time interval for the timeline service client.</description>
    <name>yarn.timeline-service.client.retry-interval-ms</name>
    <value>1000</value>
  </property>
{code}
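
A sketch of what a single sanity pass could look like (the helper and the exact set 
of keys and allowed ranges are illustrative; the real defaults live in 
yarn-default.xml as listed above):
{code}
import com.google.common.base.Preconditions;
import org.apache.hadoop.conf.Configuration;

public class TimelineConfigSanityCheck {
  /** Fail fast at timeline server startup if any of the given keys is not positive.
   *  Real code would pass each key's yarn-default.xml default instead of 0. */
  static void checkPositive(Configuration conf, String... keys) {
    for (String key : keys) {
      long value = conf.getLong(key, 0L);
      Preconditions.checkArgument(value > 0,
          "%s property value should be greater than zero, but was %s", key, value);
    }
  }

  static void checkAll(Configuration conf) {
    checkPositive(conf,
        "yarn.timeline-service.ttl-ms",
        "yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms",
        "yarn.timeline-service.handler-thread-count");
    // Cache sizes and client retry settings need different rules (the read cache
    // may be 0, max-retries may be -1 for endless retry), so they would get their
    // own checks rather than checkPositive.
  }
}
{code}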

> Timelineserver should validate that yarn.timeline-service.ttl-ms is greater 
> than zero
> -
>
> Key: YARN-2165
> URL: https://issues.apache.org/jira/browse/YARN-2165
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Karam Singh
> Attachments: YARN-2165.patch
>
>
> Timelineserver should validate that yarn.timeline-service.ttl-ms is greater 
> than zero
> Currently if set yarn.timeline-service.ttl-ms=0 
> Or yarn.timeline-service.ttl-ms=-86400 
> the timeline server starts successfully, only logging:
> {code}
> 2014-06-15 14:52:16,562 INFO  timeline.LeveldbTimelineStore 
> (LeveldbTimelineStore.java:<init>(247)) - Starting deletion thread with ttl 
> -60480 and cycle interval 30
> {code}
> At startup, the timeline server should validate that yarn.timeline-service.ttl-ms > 0;
> otherwise, especially for a negative value, the discard-old-entities timestamp will 
> be set to a future value, which may lead to inconsistency in behavior:
> {code}
> public void run() {
>   while (true) {
> long timestamp = System.currentTimeMillis() - ttl;
> try {
>   discardOldEntities(timestamp);
>   Thread.sleep(ttlInterval);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.

2014-11-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2861:
--
Attachment: YARN-2861.1.patch

Straightforward change: creating a separate set of configs for the timeline DT.
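
The shape of the change is roughly the following (the timeline-specific key names 
and defaults here are illustrative; the exact ones are defined in the patch):
{code}
import org.apache.hadoop.conf.Configuration;

public class TimelineDtConfigSketch {
  // Illustrative timeline-specific keys, distinct from the RM DT ones:
  static final String KEY_UPDATE_INTERVAL =
      "yarn.timeline-service.delegation.key.update-interval";
  static final String TOKEN_MAX_LIFETIME =
      "yarn.timeline-service.delegation.token.max-lifetime";
  static final String TOKEN_RENEW_INTERVAL =
      "yarn.timeline-service.delegation.token.renew-interval";

  static long[] readDtSettings(Configuration conf) {
    return new long[] {
        conf.getLong(KEY_UPDATE_INTERVAL, 24 * 60 * 60 * 1000L),    // 1 day
        conf.getLong(TOKEN_MAX_LIFETIME, 7 * 24 * 60 * 60 * 1000L), // 7 days
        conf.getLong(TOKEN_RENEW_INTERVAL, 24 * 60 * 60 * 1000L)    // 1 day
    };
  }
}
{code}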

> Timeline DT secret manager should not reuse the RM's configs.
> -
>
> Key: YARN-2861
> URL: https://issues.apache.org/jira/browse/YARN-2861
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2861.1.patch
>
>
> These are the configs for the RM DT secret manager. We should create separate 
> ones for the timeline DT only.
> {code}
>   @Override
>   protected void serviceInit(Configuration conf) throws Exception {
> long secretKeyInterval =
> conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY,
> YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT);
> long tokenMaxLifetime =
> conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY,
> YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT);
> long tokenRenewInterval =
> conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY,
> YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT);
> secretManager = new 
> TimelineDelegationTokenSecretManager(secretKeyInterval,
> tokenMaxLifetime, tokenRenewInterval,
> 360);
> secretManager.startThreads();
> serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig());
> super.init(conf);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.

2014-11-13 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2861:
-

 Summary: Timeline DT secret manager should not reuse the RM's 
configs.
 Key: YARN-2861
 URL: https://issues.apache.org/jira/browse/YARN-2861
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


These are the configs for the RM DT secret manager. We should create separate 
ones for the timeline DT only.
{code}
  @Override
  protected void serviceInit(Configuration conf) throws Exception {
long secretKeyInterval =
conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY,
YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT);
long tokenMaxLifetime =
conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY,
YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT);
long tokenRenewInterval =
conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY,
YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT);
secretManager = new TimelineDelegationTokenSecretManager(secretKeyInterval,
tokenMaxLifetime, tokenRenewInterval,
360);
secretManager.startThreads();

serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig());
super.init(conf);
  }
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers

2014-11-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211337#comment-14211337
 ] 

Zhijie Shen commented on YARN-2766:
---

+1. Will commit the patch.

>  ApplicationHistoryManager is expected to return a sorted list of 
> apps/attempts/containers
> --
>
> Key: YARN-2766
> URL: https://issues.apache.org/jira/browse/YARN-2766
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch, 
> YARN-2766.patch
>
>
> {{TestApplicationHistoryClientService.testContainers}} and 
> {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
> because the test assertions are assuming a returned Collection is in a 
> certain order.  The collection comes from a HashMap, so the order is not 
> guaranteed, plus, according to [this 
> page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
>  there are situations where the iteration order of a HashMap will be 
> different between Java 7 and 8.
> We should fix the test code to not assume a specific ordering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2014-11-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211328#comment-14211328
 ] 

Zhijie Shen commented on YARN-2859:
---

Binding to the default port is not right for MiniYARNCluster. Will fix the problem.
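
The usual MiniYARNCluster approach would be something like the following sketch of 
the pattern (not the exact patch): configure port 0 so the OS picks an ephemeral 
port, then write the actually bound address back into the Configuration after 
serviceStart() so test clients can find it.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class EphemeralTimelinePorts {
  /** Ask the OS for ephemeral ports before starting the ApplicationHistoryServer. */
  static void useRandomPorts(Configuration conf) {
    conf.set(YarnConfiguration.TIMELINE_SERVICE_ADDRESS, "localhost:0");
    conf.set(YarnConfiguration.TIMELINE_SERVICE_WEBAPP_ADDRESS, "localhost:0");
  }

  // After serviceStart(), the mini cluster would do something like:
  //   conf.set(YarnConfiguration.TIMELINE_SERVICE_WEBAPP_ADDRESS,
  //            hostname + ":" + actualBoundPort);
  // so tests see the real port rather than the 0 they asked for.
}
{code}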

> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Priority: Critical
>
> In mini cluster, a random port should be used. 
> Also, the config is not updated to the host that the process got bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2014-11-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-2859:
-

Assignee: Zhijie Shen

> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Zhijie Shen
>Priority: Critical
>
> In mini cluster, a random port should be used. 
> Also, the config is not updated to the host that the process got bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-2838) Issues with TimeLineServer (Application History)

2014-11-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210199#comment-14210199
 ] 

Zhijie Shen edited comment on YARN-2838 at 11/13/14 7:34 PM:
-

bq. 1. Whatever the CLI command user executes is historyserver or 
timelineserver it looks like ApplicationHistoryServer only run. So can we 
modify the name of the class  ApplicationHistoryServer to TimelineHistoryServer 
(or any other suitable name as it seems like any command user runs 
ApplicationHistoryServer is started)

Yes, not just the main entry point class, but the whole sub-module needs to be 
refactored somehow to reflect the generalized concept (YARN-2043).

bq. 2. Instead of the "Starting the History Server anyway..." deprecated msg, 
can we have "Starting the Timeline History Server anyway...".

bq. 3. Based on start or stop, deprecated message should get modified to 
"Starting the Timeline History Server anyway..." or "Stopping the Timeline 
History Server anyway..."

See the comment before.

bq. But any way we need to fix this issue also right ? so already any jira is 
raised or shall i work as part of this jira ?

See YARN-2522. We can work on this issue there.

bq. And also please inform if this issue needs to be split into mulitple jiras 
(apart from documentation which you have already raised) would like to split 
and work on them.

If you agree, we can close this Jira and work on separate Jiras that focus on 
each individual issue.

bq. As already i have started looking into these issues, was also planning to 
work on documentation. If you don't mind can you assign the issue (YARN-2854) 
to me ?

No problem, assigned it to you.






was (Author: zjshen):
bq. 1. Whatever the CLI command user executes is historyserver or 
timelineserver it looks like ApplicationHistoryServer only run. So can we 
modify the name of the class  ApplicationHistoryServer to TimelineHistoryServer 
(or any other suitable name as it seems like any command user runs 
ApplicationHistoryServer is started)

Yes, not just the the main entry point class, but the whole sub-module needs to 
be refactor somehow to reflect the generalized conception (YARN-2043).

bq. 2. Instead of the "Starting the History Server anyway..." deprecated msg, 
can we have
"Starting the Timeline History Server anyway...".

bq. 3. Based on start or stop, deprecated message should get modified to 
"Starting the
Timeline History Server anyway..." or "Stopping the Timeline History Server 
anyway..."

See the comment before.

bq. But any way we need to fix this issue also right ? so already any jira is 
raised or shall i work as part of this jira ?

See YARN-2522. We can work this issue there.

bq. And also please inform if this issue needs to be split into mulitple jiras 
(apart from documentation which you have already raised) would like to split 
and work on them.

If you agree, we can close this Jira, and work on separate Jiras that focus on 
each individual issues.

bq. As already i have started looking into these issues, was also planning to 
work on documentation. If you don't mind can you assign the issue (YARN-2854) 
to me ?

No problem, assigned it to you.





> Issues with TimeLineServer (Application History)
> 
>
> Key: YARN-2838
> URL: https://issues.apache.org/jira/browse/YARN-2838
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0, 2.5.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: IssuesInTimelineServer.pdf
>
>
> Few issues in usage of Timeline server for generic application history access



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2838) Issues with TimeLineServer (Application History)

2014-11-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2838:
--
Affects Version/s: 2.6.0

> Issues with TimeLineServer (Application History)
> 
>
> Key: YARN-2838
> URL: https://issues.apache.org/jira/browse/YARN-2838
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0, 2.5.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: IssuesInTimelineServer.pdf
>
>
> Few issues in usage of Timeline server for generic application history access



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2838) Issues with TimeLineServer (Application History)

2014-11-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210199#comment-14210199
 ] 

Zhijie Shen commented on YARN-2838:
---

bq. 1. Whatever the CLI command user executes is historyserver or 
timelineserver it looks like ApplicationHistoryServer only run. So can we 
modify the name of the class  ApplicationHistoryServer to TimelineHistoryServer 
(or any other suitable name as it seems like any command user runs 
ApplicationHistoryServer is started)

Yes, not just the main entry point class, but the whole sub-module needs to be 
refactored somehow to reflect the generalized concept (YARN-2043).

bq. 2. Instead of the "Starting the History Server anyway..." deprecated msg, 
can we have
"Starting the Timeline History Server anyway...".

bq. 3. Based on start or stop, deprecated message should get modified to 
"Starting the
Timeline History Server anyway..." or "Stopping the Timeline History Server 
anyway..."

See the comment before.

bq. But any way we need to fix this issue also right ? so already any jira is 
raised or shall i work as part of this jira ?

See YARN-2522. We can work on this issue there.

bq. And also please inform if this issue needs to be split into mulitple jiras 
(apart from documentation which you have already raised) would like to split 
and work on them.

If you agree, we can close this Jira and work on separate Jiras that focus on 
each individual issue.

bq. As already i have started looking into these issues, was also planning to 
work on documentation. If you don't mind can you assign the issue (YARN-2854) 
to me ?

No problem, assigned it to you.





> Issues with TimeLineServer (Application History)
> 
>
> Key: YARN-2838
> URL: https://issues.apache.org/jira/browse/YARN-2838
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.5.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: IssuesInTimelineServer.pdf
>
>
> Few issues in usage of Timeline server for generic application history access



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2854) The document about timeline service and generic service needs to be updated

2014-11-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2854:
--
Assignee: Naganarasimha G R  (was: Zhijie Shen)

> The document about timeline service and generic service needs to be updated
> ---
>
> Key: YARN-2854
> URL: https://issues.apache.org/jira/browse/YARN-2854
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Naganarasimha G R
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2794) Fix log msgs about distributing system-credentials

2014-11-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208455#comment-14208455
 ] 

Zhijie Shen commented on YARN-2794:
---

Please ignore the previous comment. I missed that the existing code already has 
{{if (LOG.isDebugEnabled()) {}}.

+1 Will commit this patch.

> Fix log msgs about distributing system-credentials 
> ---
>
> Key: YARN-2794
> URL: https://issues.apache.org/jira/browse/YARN-2794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2794.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2794) Fix log msgs about distributing system-credentials

2014-11-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208431#comment-14208431
 ] 

Zhijie Shen commented on YARN-2794:
---

Put this code in {{if (LOG.isDebugEnabled()) {}}?
{code}
+  for (Map.Entry entry : map.entrySet()) {
+LOG.debug("Retrieved credentials form RM for " + entry.getKey() + ": "
++ entry.getValue().getAllTokens());
+  }
{code}
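
For reference, a sketch of the guarded form being asked about, assuming the map holds
per-application Credentials (the types are inferred, not copied from the patch):
{code}
if (LOG.isDebugEnabled()) {
  // The guard avoids building the token strings at all unless debug logging is on.
  for (Map.Entry<ApplicationId, Credentials> entry : map.entrySet()) {
    LOG.debug("Retrieved credentials from RM for " + entry.getKey() + ": "
        + entry.getValue().getAllTokens());
  }
}
{code}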

> Fix log msgs about distributing system-credentials 
> ---
>
> Key: YARN-2794
> URL: https://issues.apache.org/jira/browse/YARN-2794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2794.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-2838) Issues with TimeLineServer (Application History)

2014-11-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206671#comment-14206671
 ] 

Zhijie Shen edited comment on YARN-2838 at 11/12/14 12:44 AM:
--

[~Naganarasimha], sorry for not responding to you immediately; I've been busy 
finalizing 2.6. I did a quick scan through your issue document. Here's my 
clarification:

1. While the entry point of this sub-module is still called 
ApplicationHistoryServer, it has actually been generalized into the TimelineServer 
now (we definitely need to refactor the code at some point). The baseline service 
provided by the timeline server is to allow the cluster and its apps to store 
their history information, metrics and so on, complying with the defined timeline 
data model. Later on, users and admins can query this information for analysis.

2. Application history (or, as we prefer to call it, the generic history service) 
is now a built-in service in the timeline server that records the generic history 
information of YARN apps. It used to live in a separate store (on FS), but after 
YARN-2033 it has been moved into the timeline store too, as a payload. We replaced 
the old storage layer but kept the existing interfaces (web UI, services, CLI) 
unchanged, to be the analog of what the RM provides for running apps. We still 
haven't integrated TimelineClient and AHSClient, the latter of which is the RPC 
interface for getting the generic history information. APPLICATION_HISTORY_ENABLED 
is the only remaining old config; it controls whether we also want to pull the app 
info from the generic history service inside the timeline server (see the config 
sketch after this list). You may want to take a look at YARN-2033 to get more 
context about the change. Moreover, because of a number of limitations of the old 
history store, we're no longer going to support it.

3. The document is definitely stale. I'll file a separate documentation Jira; 
however, it's too late for 2.6. Let's target 2.7 for an up-to-date document about 
the timeline service and its built-in generic history service (YARN-2854). Does 
that sound good?
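
As a side note on point 2, a minimal client-side sketch of how the two pieces are
switched on, assuming the 2.6 YarnConfiguration keys; this is only an illustration,
not part of any patch here:
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineConfigSketch {
  static YarnConfiguration enableTimelineAndGenericHistory() {
    YarnConfiguration conf = new YarnConfiguration();
    // Turn on the timeline service itself.
    conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true);
    // APPLICATION_HISTORY_ENABLED controls whether clients also pull app info
    // from the generic history service inside the timeline server.
    conf.setBoolean(YarnConfiguration.APPLICATION_HISTORY_ENABLED, true);
    return conf;
  }
}
{code}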


was (Author: zjshen):
[~Naganarasimha], sorry for not responding you immediately as being busy on 
finalizing 2.6. A quick scan through your issue document. Here's my 
clarification:

1. While the entry point of the this sub-module is still called 
ApplicationHistoryServer, it is actually generalized to be TimelineServer right 
now (definitely we need to refactor the code at some point). The baseline 
service provided the the timeline server is to allow the cluster and its apps 
to store their history information, metrics and so on by complying with the 
defined timeline data model. Later on, users and admins can query this 
information to do the analysis.

2. Application history (or we prefer to call it generic history service) is now 
a built-in service in the timeline server to record the generic history 
information of YARN apps. It was on a separate store (on FS), but after 
YARN-2033, it has been moved to the timeline store too, as a payload. We 
replace the old storage layer, but keep the existing interfaces (web UI, 
services, CLI) not changed to be the analog of what RM provides for running 
apps. We still didn't integrate TimelineClient and AHSClient, the latter of 
which is RPC interface of getting generic history information via RPC 
interface. APPLICATION_HISTORY_ENABLED is the only remaining old config to 
control whether we also want to pull the app info from the generic history 
service inside the timeline server. You may want to take a look at YARN-2033 to 
get more context about the change. Moreover, as a number of limitation of the 
old history store, we're no longer going to support it.

3. The document is definitely staled. I'll file separate document Jira, 
however, it's too late for 2.6. Let's target 2.7 for an up-to-date document 
about timeline service and its built-in generic history service. Does it sound 
good?

> Issues with TimeLineServer (Application History)
> 
>
> Key: YARN-2838
> URL: https://issues.apache.org/jira/browse/YARN-2838
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.5.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: IssuesInTimelineServer.pdf
>
>
> Few issues in usage of Timeline server for generic application history access



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2854) The document about timeline service and generic service needs to be updated

2014-11-11 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2854:
-

 Summary: The document about timeline service and generic service 
needs to be updated
 Key: YARN-2854
 URL: https://issues.apache.org/jira/browse/YARN-2854
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2838) Issues with TimeLineServer (Application History)

2014-11-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206671#comment-14206671
 ] 

Zhijie Shen commented on YARN-2838:
---

[~Naganarasimha], sorry for not responding to you immediately; I've been busy 
finalizing 2.6. I did a quick scan through your issue document. Here's my 
clarification:

1. While the entry point of this sub-module is still called 
ApplicationHistoryServer, it has actually been generalized into the TimelineServer 
now (we definitely need to refactor the code at some point). The baseline service 
provided by the timeline server is to allow the cluster and its apps to store 
their history information, metrics and so on, complying with the defined timeline 
data model. Later on, users and admins can query this information for analysis.

2. Application history (or, as we prefer to call it, the generic history service) 
is now a built-in service in the timeline server that records the generic history 
information of YARN apps. It used to live in a separate store (on FS), but after 
YARN-2033 it has been moved into the timeline store too, as a payload. We replaced 
the old storage layer but kept the existing interfaces (web UI, services, CLI) 
unchanged, to be the analog of what the RM provides for running apps. We still 
haven't integrated TimelineClient and AHSClient, the latter of which is the RPC 
interface for getting the generic history information. APPLICATION_HISTORY_ENABLED 
is the only remaining old config; it controls whether we also want to pull the app 
info from the generic history service inside the timeline server. You may want to 
take a look at YARN-2033 to get more context about the change. Moreover, because 
of a number of limitations of the old history store, we're no longer going to 
support it.

3. The document is definitely stale. I'll file a separate documentation Jira; 
however, it's too late for 2.6. Let's target 2.7 for an up-to-date document about 
the timeline service and its built-in generic history service. Does that sound 
good?

> Issues with TimeLineServer (Application History)
> 
>
> Key: YARN-2838
> URL: https://issues.apache.org/jira/browse/YARN-2838
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.5.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: IssuesInTimelineServer.pdf
>
>
> Few issues in usage of Timeline server for generic application history access



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2837) Timeline server needs to recover the timeline DT when restarting

2014-11-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205768#comment-14205768
 ] 

Zhijie Shen commented on YARN-2837:
---

Tested the patch on a single-node secure cluster:

1. Started and restarted the timeline server, and the DT information was recovered 
properly.
2. A DT generated before the timeline server restart can still be renewed properly 
afterwards.

Another issue I observed while testing:

In the first few seconds after the HTTP server is started, the MR job, which tries 
to emit the timeline data, gets a number of 404 errors. I guess the server is not 
fully ready before it starts taking incoming requests.

> Timeline server needs to recover the timeline DT when restarting
> 
>
> Key: YARN-2837
> URL: https://issues.apache.org/jira/browse/YARN-2837
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2837.1.patch
>
>
> Timeline server needs to recover the stateful information when restarting as 
> RM/NM/JHS does now. So far the stateful information only includes the 
> timeline DT. Without recovery, the timeline DT of the existing YARN apps is 
> not long valid, and cannot be renewed any more after the timeline server is 
> restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2837) Timeline server needs to recover the timeline DT when restarting

2014-11-10 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2837:
--
Attachment: (was: YARN-2834.1.patch)

> Timeline server needs to recover the timeline DT when restarting
> 
>
> Key: YARN-2837
> URL: https://issues.apache.org/jira/browse/YARN-2837
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2837.1.patch
>
>
> Timeline server needs to recover the stateful information when restarting as 
> RM/NM/JHS does now. So far the stateful information only includes the 
> timeline DT. Without recovery, the timeline DT of the existing YARN apps is 
> not long valid, and cannot be renewed any more after the timeline server is 
> restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2837) Timeline server needs to recover the timeline DT when restarting

2014-11-10 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2837:
--
Attachment: YARN-2837.1.patch

> Timeline server needs to recover the timeline DT when restarting
> 
>
> Key: YARN-2837
> URL: https://issues.apache.org/jira/browse/YARN-2837
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2837.1.patch
>
>
> Timeline server needs to recover the stateful information when restarting as 
> RM/NM/JHS does now. So far the stateful information only includes the 
> timeline DT. Without recovery, the timeline DT of the existing YARN apps is 
> not long valid, and cannot be renewed any more after the timeline server is 
> restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2837) Timeline server needs to recover the timeline DT when restarting

2014-11-10 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2837:
--
Attachment: YARN-2834.1.patch

Created a patch that adds the timeline state store. I chose a LevelDB 
implementation because:

1. The timeline server already uses leveldb.
2. It provides atomic operations and isolates us from system-dependent FS behavior.
3. It is less heavyweight and complex than using HDFS (particularly in secure mode).
4. The operations are easy to implement.
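
A minimal sketch of what a leveldb-backed state store amounts to, using the same
leveldbjni factory the timeline store already uses; the class and key layout are
hypothetical, not the actual patch:
{code}
import java.io.File;
import java.io.IOException;
import org.fusesource.leveldbjni.JniDBFactory;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

public class TimelineStateStoreSketch {
  private DB db;

  void open(String path) throws IOException {
    Options options = new Options();
    options.createIfMissing(true);
    // One leveldb instance holds the recoverable state (e.g. the timeline DTs).
    db = JniDBFactory.factory.open(new File(path), options);
  }

  void storeToken(String tokenKey, byte[] tokenBytes) {
    // put() is atomic for a single key, which is all the DT bookkeeping needs.
    db.put(JniDBFactory.bytes("dt/" + tokenKey), tokenBytes);
  }

  byte[] loadToken(String tokenKey) {
    return db.get(JniDBFactory.bytes("dt/" + tokenKey));
  }
}
{code}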

> Timeline server needs to recover the timeline DT when restarting
> 
>
> Key: YARN-2837
> URL: https://issues.apache.org/jira/browse/YARN-2837
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2834.1.patch
>
>
> Timeline server needs to recover the stateful information when restarting as 
> RM/NM/JHS does now. So far the stateful information only includes the 
> timeline DT. Without recovery, the timeline DT of the existing YARN apps is 
> not long valid, and cannot be renewed any more after the timeline server is 
> restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2837) Timeline server needs to recover the timeline DT when restarting

2014-11-09 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2837:
-

 Summary: Timeline server needs to recover the timeline DT when 
restarting
 Key: YARN-2837
 URL: https://issues.apache.org/jira/browse/YARN-2837
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker


The timeline server needs to recover its stateful information when restarting, as 
the RM/NM/JHS do now. So far the stateful information only includes the timeline 
DT. Without recovery, the timeline DT of existing YARN apps is no longer valid and 
cannot be renewed any more after the timeline server is restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2834) Resource manager crashed with Null Pointer Exception

2014-11-09 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204132#comment-14204132
 ] 

Zhijie Shen commented on YARN-2834:
---

bq. Anyways, treating renewal failures is broken today. I am okay ignoring 
renewal failures during recovery in this ticket. But let's file a blocker for 
handling them correctly in 2.7.

Thanks for your comments. +1 for this proposal.

> Resource manager crashed with Null Pointer Exception
> 
>
> Key: YARN-2834
> URL: https://issues.apache.org/jira/browse/YARN-2834
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-2834.1.patch
>
>
> Resource manager failed after restart. 
> {noformat}
> 2014-11-09 04:12:53,013 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initializeQueues(467)) - Initialized root queue root: 
> numChildQueue= 2, capacity=1.0, absoluteCapacity=1.0, 
> usedResources=usedCapacity=0.0, numApps=0, numContainers=0
> 2014-11-09 04:12:53,013 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initializeQueueMappings(436)) - Initialized queue 
> mappings, override: false
> 2014-11-09 04:12:53,013 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(305)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, maximumAllocation=< vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms
> 2014-11-09 04:12:53,015 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(272)) - Service ResourceManager failed in 
> state STARTED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1041)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1005)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:821)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:843)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:826)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:701)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
> at 
> org.apache.hadoop.service.Abstra

[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203215#comment-14203215
 ] 

Zhijie Shen commented on YARN-2505:
---

Committed to trunk, branch-2 and branch-2.6. Thanks Craig for the patch, and 
Wangda and Xuan for the review.

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Fix For: 2.6.0
>
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.15.patch, YARN-2505.16.patch, YARN-2505.16.patch, 
> YARN-2505.16.patch, YARN-2505.18.patch, YARN-2505.19.patch, 
> YARN-2505.20.patch, YARN-2505.21.patch, YARN-2505.21.patch, 
> YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, 
> YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, 
> YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203195#comment-14203195
 ] 

Zhijie Shen commented on YARN-2505:
---

Kicking Jenkins again.

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.15.patch, YARN-2505.16.patch, YARN-2505.16.patch, 
> YARN-2505.16.patch, YARN-2505.18.patch, YARN-2505.19.patch, 
> YARN-2505.20.patch, YARN-2505.21.patch, YARN-2505.21.patch, 
> YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, 
> YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, 
> YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203116#comment-14203116
 ] 

Zhijie Shen commented on YARN-2505:
---

+1 for the latest patch. I'll commit it once Jenkins gives a +1 too.

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.15.patch, YARN-2505.16.patch, YARN-2505.16.patch, 
> YARN-2505.16.patch, YARN-2505.18.patch, YARN-2505.19.patch, 
> YARN-2505.20.patch, YARN-2505.21.patch, YARN-2505.3.patch, YARN-2505.4.patch, 
> YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, 
> YARN-2505.9.patch, YARN-2505.9.patch, YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202992#comment-14202992
 ] 

Zhijie Shen commented on YARN-2505:
---

I just have one concern about ConverterUtils#toNodeId(). Its behavior changes when 
the nodeId string argument is invalid. That may affect NodeCLI and 
AggregatedLogsBlock when people pass an invalid nodeId string, or when the webapp 
generates a URL with an invalid nodeId string.

BTW, while ConverterUtils is marked \@Private, it lives in yarn-common. I'm not 
sure whether other components have already made use of these actually useful 
"APIs".

Any thoughts?
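
To make the concern concrete, a hypothetical caller-side sketch; ConverterUtils is
the real class, but the wrapper and its error handling are only an illustration of
what NodeCLI-style callers may need to keep doing:
{code}
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class NodeIdParsingSketch {
  // Parse a "host:port" string, reporting malformed input instead of letting the
  // exception propagate and break the CLI or the rendered web page.
  static NodeId parseOrNull(String nodeIdStr) {
    try {
      return ConverterUtils.toNodeId(nodeIdStr);
    } catch (IllegalArgumentException e) {
      System.err.println("Invalid NodeId string: " + nodeIdStr);
      return null;
    }
  }
}
{code}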

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.15.patch, YARN-2505.16.patch, YARN-2505.16.patch, 
> YARN-2505.16.patch, YARN-2505.18.patch, YARN-2505.19.patch, 
> YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, 
> YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, 
> YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2819) NPE in ATS Timeline Domains when upgrading from 2.4 to 2.6

2014-11-06 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201602#comment-14201602
 ] 

Zhijie Shen commented on YARN-2819:
---

I've done the following experiments locally:

1. Run timeline server 2.5 to generate the old timeline data without domainId 
field in leveldb.

2. Run timeline server 2.6 (current trunk actually), and try to update the old 
entity and relate to it. I can reproduce the same NPE as is mentioned in the 
description.

3. Run timeline server 2.6 with the attached patch, and try to update the old 
entity and relate to it. The problem is gone.

> NPE in ATS Timeline Domains when upgrading from 2.4 to 2.6
> --
>
> Key: YARN-2819
> URL: https://issues.apache.org/jira/browse/YARN-2819
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Gopal V
>Assignee: Zhijie Shen
>Priority: Critical
>  Labels: Upgrade
> Attachments: YARN-2819.1.patch
>
>
> {code}
> Caused by: java.lang.NullPointerException
> at java.lang.String.(String.java:554)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.put(LeveldbTimelineStore.java:873)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.put(LeveldbTimelineStore.java:1014)
> at 
> org.apache.hadoop.yarn.server.timeline.TimelineDataManager.postEntities(TimelineDataManager.java:330)
> at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.postEntities(TimelineWebServices.java:260)
> {code}
> triggered by 
> {code}
> entity.getRelatedEntities();
> ...
> } else {
>   byte[] domainIdBytes = db.get(createDomainIdKey(
>   relatedEntityId, relatedEntityType, 
> relatedEntityStartTime));
>   // This is the existing entity
>   String domainId = new String(domainIdBytes);
>   if (!domainId.equals(entity.getDomainId())) {
> {code}
> The new String(domainIdBytes); throws an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2819) NPE in ATS Timeline Domains when upgrading from 2.4 to 2.6

2014-11-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2819:
--
Attachment: YARN-2819.1.patch

Created a patch to make the leveldb store compatible with existing data. 
Basically, we're going to treat an entity without a domain ID as one belonging to 
the DEFAULT domain.
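
A minimal sketch of the backward-compatibility handling, assuming a DEFAULT domain
constant; the helper is hypothetical and not the actual patch:
{code}
import java.nio.charset.Charset;

public class DomainIdFallbackSketch {
  static final String DEFAULT_DOMAIN_ID = "DEFAULT";

  // Entities written by a pre-2.6 timeline server have no domain id stored, so
  // the leveldb lookup returns null; fall back to the default domain instead of
  // dereferencing the null byte array (which is what caused the NPE).
  static String resolveDomainId(byte[] domainIdBytes) {
    return domainIdBytes == null
        ? DEFAULT_DOMAIN_ID
        : new String(domainIdBytes, Charset.forName("UTF-8"));
  }
}
{code}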

> NPE in ATS Timeline Domains when upgrading from 2.4 to 2.6
> --
>
> Key: YARN-2819
> URL: https://issues.apache.org/jira/browse/YARN-2819
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Gopal V
>Assignee: Zhijie Shen
>Priority: Critical
>  Labels: Upgrade
> Attachments: YARN-2819.1.patch
>
>
> {code}
> Caused by: java.lang.NullPointerException
> at java.lang.String.(String.java:554)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.put(LeveldbTimelineStore.java:873)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.put(LeveldbTimelineStore.java:1014)
> at 
> org.apache.hadoop.yarn.server.timeline.TimelineDataManager.postEntities(TimelineDataManager.java:330)
> at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.postEntities(TimelineWebServices.java:260)
> {code}
> triggered by 
> {code}
> entity.getRelatedEntities();
> ...
> } else {
>   byte[] domainIdBytes = db.get(createDomainIdKey(
>   relatedEntityId, relatedEntityType, 
> relatedEntityStartTime));
>   // This is the existing entity
>   String domainId = new String(domainIdBytes);
>   if (!domainId.equals(entity.getDomainId())) {
> {code}
> The new String(domainIdBytes); throws an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2819) NPE in ATS Timeline Domains when upgrading from 2.4 to 2.6

2014-11-06 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200687#comment-14200687
 ] 

Zhijie Shen commented on YARN-2819:
---

The NPE happens because the data-integrity check assumes that no entity has a null 
domainId. However, if leveldb already contains timeline data generated by a 
pre-2.6 timeline server, that assumption is broken: previously, entities didn't 
carry the domain information. I will work on a fix that is compatible with the 
existing store.

> NPE in ATS Timeline Domains when upgrading from 2.4 to 2.6
> --
>
> Key: YARN-2819
> URL: https://issues.apache.org/jira/browse/YARN-2819
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Gopal V
>Assignee: Zhijie Shen
>  Labels: Upgrade
>
> {code}
> Caused by: java.lang.NullPointerException
> at java.lang.String.(String.java:554)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.put(LeveldbTimelineStore.java:873)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.put(LeveldbTimelineStore.java:1014)
> at 
> org.apache.hadoop.yarn.server.timeline.TimelineDataManager.postEntities(TimelineDataManager.java:330)
> at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.postEntities(TimelineWebServices.java:260)
> {code}
> triggered by 
> {code}
> entity.getRelatedEntities();
> ...
> } else {
>   byte[] domainIdBytes = db.get(createDomainIdKey(
>   relatedEntityId, relatedEntityType, 
> relatedEntityStartTime));
>   // This is the existing entity
>   String domainId = new String(domainIdBytes);
>   if (!domainId.equals(entity.getDomainId())) {
> {code}
> The new String(domainIdBytes); throws an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2819) NPE in ATS Timeline Domains when upgrading from 2.4 to 2.6

2014-11-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2819:
--
Priority: Critical  (was: Major)

> NPE in ATS Timeline Domains when upgrading from 2.4 to 2.6
> --
>
> Key: YARN-2819
> URL: https://issues.apache.org/jira/browse/YARN-2819
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Gopal V
>Assignee: Zhijie Shen
>Priority: Critical
>  Labels: Upgrade
>
> {code}
> Caused by: java.lang.NullPointerException
> at java.lang.String.(String.java:554)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.put(LeveldbTimelineStore.java:873)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.put(LeveldbTimelineStore.java:1014)
> at 
> org.apache.hadoop.yarn.server.timeline.TimelineDataManager.postEntities(TimelineDataManager.java:330)
> at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.postEntities(TimelineWebServices.java:260)
> {code}
> triggered by 
> {code}
> entity.getRelatedEntities();
> ...
> } else {
>   byte[] domainIdBytes = db.get(createDomainIdKey(
>   relatedEntityId, relatedEntityType, 
> relatedEntityStartTime));
>   // This is the existing entity
>   String domainId = new String(domainIdBytes);
>   if (!domainId.equals(entity.getDomainId())) {
> {code}
> The new String(domainIdBytes); throws an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2819) NPE in ATS Timeline Domains when upgrading from 2.4 to 2.6

2014-11-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-2819:
-

Assignee: Zhijie Shen

> NPE in ATS Timeline Domains when upgrading from 2.4 to 2.6
> --
>
> Key: YARN-2819
> URL: https://issues.apache.org/jira/browse/YARN-2819
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Gopal V
>Assignee: Zhijie Shen
>  Labels: Upgrade
>
> {code}
> Caused by: java.lang.NullPointerException
> at java.lang.String.(String.java:554)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.put(LeveldbTimelineStore.java:873)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.put(LeveldbTimelineStore.java:1014)
> at 
> org.apache.hadoop.yarn.server.timeline.TimelineDataManager.postEntities(TimelineDataManager.java:330)
> at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.postEntities(TimelineWebServices.java:260)
> {code}
> triggered by 
> {code}
> entity.getRelatedEntities();
> ...
> } else {
>   byte[] domainIdBytes = db.get(createDomainIdKey(
>   relatedEntityId, relatedEntityType, 
> relatedEntityStartTime));
>   // This is the existing entity
>   String domainId = new String(domainIdBytes);
>   if (!domainId.equals(entity.getDomainId())) {
> {code}
> The new String(domainIdBytes); throws an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2818) Remove the logic to inject entity owner as the primary filter

2014-11-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2818:
--
Attachment: YARN-2818.2.patch

Remove one more unnecessary method.

> Remove the logic to inject entity owner as the primary filter
> -
>
> Key: YARN-2818
> URL: https://issues.apache.org/jira/browse/YARN-2818
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-2818.1.patch, YARN-2818.2.patch
>
>
> In 2.5, we inject owner info as a primary filter to support entity-level 
> acls. Since 2.6, we have a different acls solution (YARN-2102). Therefore, 
> there's no need to inject owner info. There're two motivations:
> 1. For leveldb timeline store, the primary filter is expensive. When we have 
> a primary filter, we need to make a complete copy of the entity on the logic 
> index table.
> 2. Owner info is incomplete. Say we want to put E1 (owner = "tester", 
> relatedEntity = "E2"). If E2 doesn't exist before, leveldb timeline store 
> will create an empty E2 without owner info (at the db point of view, it 
> doesn't know owner is a "special" primary filter). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2818) Remove the logic to inject entity owner as the primary filter

2014-11-05 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2818:
--
Attachment: YARN-2818.1.patch

Posted a patch to remove this logic. The change should be mostly compatible: a 2.6 
server can still read the data created by 2.5, but it treats the owner as a normal 
primary filter, and a 2.5 server can also read 2.6 data. The only drawback is that 
no owner info is available for entity-level ACL control. However, as mentioned in 
the description, the owner info would be incomplete anyway, so that path already 
has a bug.

> Remove the logic to inject entity owner as the primary filter
> -
>
> Key: YARN-2818
> URL: https://issues.apache.org/jira/browse/YARN-2818
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-2818.1.patch
>
>
> In 2.5, we inject owner info as a primary filter to support entity-level 
> acls. Since 2.6, we have a different acls solution (YARN-2102). Therefore, 
> there's no need to inject owner info. There're two motivations:
> 1. For leveldb timeline store, the primary filter is expensive. When we have 
> a primary filter, we need to make a complete copy of the entity on the logic 
> index table.
> 2. Owner info is incomplete. Say we want to put E1 (owner = "tester", 
> relatedEntity = "E2"). If E2 doesn't exist before, leveldb timeline store 
> will create an empty E2 without owner info (at the db point of view, it 
> doesn't know owner is a "special" primary filter). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2818) Remove the logic to inject entity owner as the primary filter

2014-11-05 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2818:
-

 Summary: Remove the logic to inject entity owner as the primary 
filter
 Key: YARN-2818
 URL: https://issues.apache.org/jira/browse/YARN-2818
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical


In 2.5, we injected the owner info as a primary filter to support entity-level 
ACLs. Since 2.6, we have a different ACLs solution (YARN-2102), so there's no 
longer a need to inject the owner info. There are two motivations:

1. For the leveldb timeline store, a primary filter is expensive: whenever we have 
a primary filter, we need to make a complete copy of the entity in the logical 
index table.

2. The owner info is incomplete. Say we want to put E1 (owner = "tester", 
relatedEntity = "E2"). If E2 doesn't exist beforehand, the leveldb timeline store 
will create an empty E2 without owner info (from the db's point of view, it 
doesn't know the owner is a "special" primary filter). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2813) NPE from MemoryTimelineStore.getDomains

2014-11-05 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2813:
--
Attachment: YARN-2813.1.patch

Uploaded a patch to fix the NPE.

> NPE from MemoryTimelineStore.getDomains
> ---
>
> Key: YARN-2813
> URL: https://issues.apache.org/jira/browse/YARN-2813
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2813.1.patch
>
>
> {code}
> 2014-11-04 20:50:05,146 WARN 
> org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
> javax.ws.rs.WebApplicationException: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getDomains(TimelineWebServices.java:356)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
> at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
> at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
> at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
> at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
> at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
> at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
> at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
> at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
> at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
> at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
> at 
> com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:269)
> at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1204)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at 
> org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>

[jira] [Created] (YARN-2813) NPE from MemoryTimelineStore.getDomains

2014-11-05 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2813:
-

 Summary: NPE from MemoryTimelineStore.getDomains
 Key: YARN-2813
 URL: https://issues.apache.org/jira/browse/YARN-2813
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


{code}
2014-11-04 20:50:05,146 WARN 
org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getDomains(TimelineWebServices.java:356)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:269)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1204)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle

[jira] [Updated] (YARN-2812) TestApplicationHistoryServer is likely to fail on less powerful machine

2014-11-05 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2812:
--
Attachment: YARN-2812.1.patch

Uploaded a patch to fix the issue.

> TestApplicationHistoryServer is likely to fail on less powerful machine
> ---
>
> Key: YARN-2812
> URL: https://issues.apache.org/jira/browse/YARN-2812
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2812.1.patch
>
>
> {code:title=testFilteOverrides}
> java.lang.Exception: test timed out after 5 milliseconds
>   at java.net.Inet4AddressImpl.getHostByAddr(Native Method)
>   at java.net.InetAddress$1.getHostByAddr(InetAddress.java:898)
>   at java.net.InetAddress.getHostFromNameService(InetAddress.java:583)
>   at java.net.InetAddress.getHostName(InetAddress.java:525)
>   at java.net.InetAddress.getHostName(InetAddress.java:497)
>   at 
> java.net.InetSocketAddress$InetSocketAddressHolder.getHostName(InetSocketAddress.java:82)
>   at 
> java.net.InetSocketAddress$InetSocketAddressHolder.access$600(InetSocketAddress.java:56)
>   at java.net.InetSocketAddress.getHostName(InetSocketAddress.java:345)
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.serviceStart(ApplicationHistoryClientService.java:87)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceStart(ApplicationHistoryServer.java:111)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer.testFilteOverrides(TestApplicationHistoryServer.java:104)
> {code}
> {code:title=testStartStopServer, testLaunch}
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock 
> /grid/0/jenkins/workspace/UT-hadoop-champlain-chunks/workspace/UT-hadoop-champlain-chunks/commonarea/hdp-BUILDS/hadoop-2.6.0.2.2.0.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/build/test/yarn/timeline/leveldb-timeline-store.ldb/LOCK:
>  already held by process
>   at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
>   at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
>   at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
>   at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:219)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:99)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer.testStartStopServer(TestApplicationHistoryServer.java:48)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2812) TestApplicationHistoryServer is likely to fail on less powerful machine

2014-11-05 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199075#comment-14199075
 ] 

Zhijie Shen commented on YARN-2812:
---

The root causes of the test failures are:

1. testFilteOverrides actually starts and stops the server 4 times, but the timeout 
allowance given to it is similar to the other cases that only do it once. It 
seems to be too short for a slow machine.

2. When testFilteOverrides times out, the lock on the leveldb dir is still 
not released, so the other two cases, which want to access the same dir (by 
default), encounter the lock exception.

Will fix the test failures; a rough sketch of the adjustments is below.
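
For illustration, a rough sketch of both adjustments, assuming a per-test leveldb 
directory and a roughly 4x timeout. This is illustrative only, not the attached 
YARN-2812.1.patch, and the test body is simplified to the start/stop cycles:
{code}
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer;
import org.junit.Test;

public class TestApplicationHistoryServerSketch {

  private Configuration newConfig(String testName) {
    Configuration config = new YarnConfiguration();
    // Isolate the leveldb store per test so a timed-out case cannot keep
    // holding the lock that the other cases need.
    config.set(YarnConfiguration.TIMELINE_SERVICE_LEVELDB_PATH,
        new File("target", testName + "-leveldb").getAbsolutePath());
    return config;
  }

  // Four start/stop cycles, so roughly 4x the allowance of the single-run cases.
  @Test(timeout = 240000)
  public void testFilteOverrides() throws Exception {
    for (int i = 0; i < 4; i++) {
      ApplicationHistoryServer historyServer = new ApplicationHistoryServer();
      historyServer.init(newConfig("testFilteOverrides"));
      historyServer.start();
      historyServer.stop();
    }
  }
}
{code}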

> TestApplicationHistoryServer is likely to fail on less powerful machine
> ---
>
> Key: YARN-2812
> URL: https://issues.apache.org/jira/browse/YARN-2812
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> {code:title=testFilteOverrides}
> java.lang.Exception: test timed out after 5 milliseconds
>   at java.net.Inet4AddressImpl.getHostByAddr(Native Method)
>   at java.net.InetAddress$1.getHostByAddr(InetAddress.java:898)
>   at java.net.InetAddress.getHostFromNameService(InetAddress.java:583)
>   at java.net.InetAddress.getHostName(InetAddress.java:525)
>   at java.net.InetAddress.getHostName(InetAddress.java:497)
>   at 
> java.net.InetSocketAddress$InetSocketAddressHolder.getHostName(InetSocketAddress.java:82)
>   at 
> java.net.InetSocketAddress$InetSocketAddressHolder.access$600(InetSocketAddress.java:56)
>   at java.net.InetSocketAddress.getHostName(InetSocketAddress.java:345)
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.serviceStart(ApplicationHistoryClientService.java:87)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceStart(ApplicationHistoryServer.java:111)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer.testFilteOverrides(TestApplicationHistoryServer.java:104)
> {code}
> {code:title=testStartStopServer, testLaunch}
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock 
> /grid/0/jenkins/workspace/UT-hadoop-champlain-chunks/workspace/UT-hadoop-champlain-chunks/commonarea/hdp-BUILDS/hadoop-2.6.0.2.2.0.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/build/test/yarn/timeline/leveldb-timeline-store.ldb/LOCK:
>  already held by process
>   at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
>   at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
>   at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
>   at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:219)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:99)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer.testStartStopServer(TestApplicationHistoryServer.java:48)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2812) TestApplicationHistoryServer is likely to fail on less powerful machine

2014-11-05 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2812:
-

 Summary: TestApplicationHistoryServer is likely to fail on less 
powerful machine
 Key: YARN-2812
 URL: https://issues.apache.org/jira/browse/YARN-2812
 Project: Hadoop YARN
  Issue Type: Test
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


{code:title=testFilteOverrides}
java.lang.Exception: test timed out after 5 milliseconds
at java.net.Inet4AddressImpl.getHostByAddr(Native Method)
at java.net.InetAddress$1.getHostByAddr(InetAddress.java:898)
at java.net.InetAddress.getHostFromNameService(InetAddress.java:583)
at java.net.InetAddress.getHostName(InetAddress.java:525)
at java.net.InetAddress.getHostName(InetAddress.java:497)
at 
java.net.InetSocketAddress$InetSocketAddressHolder.getHostName(InetSocketAddress.java:82)
at 
java.net.InetSocketAddress$InetSocketAddressHolder.access$600(InetSocketAddress.java:56)
at java.net.InetSocketAddress.getHostName(InetSocketAddress.java:345)
at 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
at 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
at 
org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.serviceStart(ApplicationHistoryClientService.java:87)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceStart(ApplicationHistoryServer.java:111)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer.testFilteOverrides(TestApplicationHistoryServer.java:104)
{code}

{code:title=testStartStopServer, testLaunch}
org.apache.hadoop.service.ServiceStateException: 
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock 
/grid/0/jenkins/workspace/UT-hadoop-champlain-chunks/workspace/UT-hadoop-champlain-chunks/commonarea/hdp-BUILDS/hadoop-2.6.0.2.2.0.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/build/test/yarn/timeline/leveldb-timeline-store.ldb/LOCK:
 already held by process
at 
org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at 
org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:219)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:99)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer.testStartStopServer(TestApplicationHistoryServer.java:48)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2767) RM web services - add test case to ensure the http static user cannot kill or submit apps in secure mode

2014-11-05 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198846#comment-14198846
 ] 

Zhijie Shen commented on YARN-2767:
---

+1, will commit the patch.

> RM web services - add test case to ensure the http static user cannot kill or 
> submit apps in secure mode
> 
>
> Key: YARN-2767
> URL: https://issues.apache.org/jira/browse/YARN-2767
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2767.0.patch, apache-yarn-2767.1.patch, 
> apache-yarn-2767.2.patch, apache-yarn-2767.3.patch
>
>
> We should add a test to ensure that the http static user used to access the 
> RM web interface can't submit or kill apps if the cluster is running in 
> secure mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2506) TimelineClient should NOT be in yarn-common project

2014-11-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197442#comment-14197442
 ] 

Zhijie Shen commented on YARN-2506:
---

Sure, I can take care of it.

> TimelineClient should NOT be in yarn-common project
> ---
>
> Key: YARN-2506
> URL: https://issues.apache.org/jira/browse/YARN-2506
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
>Priority: Critical
>
> YARN-2298 incorrectly moved TimelineClient to yarn-common project. It doesn't 
> belong there, we should move it back to yarn-client module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2752) ContainerExecutor always append "nice -n" in command on branch-2

2014-11-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2752:
--
Summary: ContainerExecutor always append "nice -n" in command on branch-2  
(was: TestContainerExecutor.testRunCommandNoPriority fails in branch-2)

> ContainerExecutor always append "nice -n" in command on branch-2
> 
>
> Key: YARN-2752
> URL: https://issues.apache.org/jira/browse/YARN-2752
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Fix For: 2.6.0
>
> Attachments: YARN-2752.1-branch-2.patch, YARN-2752.2-branch-2.patch
>
>
> TestContainerExecutor.testRunCommandNoPriority fails in branch-2. But it 
> passed in trunk. 
> The function code ContainerExecutor.getRunCommand() in trunk is different 
> from that in branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2752) TestContainerExecutor.testRunCommandNoPriority fails in branch-2

2014-11-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197143#comment-14197143
 ] 

Zhijie Shen commented on YARN-2752:
---

+1. Jenkins can't verify it because the patch only applies to branch-2. I've 
verified it locally: it compiles and fixes the test failure. Will commit the 
patch.
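
For reference, a rough illustration of the difference described here, under the 
assumption that trunk guards the "nice -n" prefix behind a priority check while 
the branch-2 code appended it unconditionally. Field and method names are 
hypothetical; this is not the actual patch:
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunCommandSketch {

  private final boolean containerSchedPriorityIsSet;
  private final int containerSchedPriorityAdjustment;

  public RunCommandSketch(boolean prioritySet, int adjustment) {
    this.containerSchedPriorityIsSet = prioritySet;
    this.containerSchedPriorityAdjustment = adjustment;
  }

  public String[] getRunCommand(String command) {
    List<String> retCommand = new ArrayList<String>();
    // Only wrap the command in "nice -n <adjustment>" when a scheduling
    // priority has actually been configured; appending it unconditionally is
    // the behavior this issue describes on branch-2.
    if (containerSchedPriorityIsSet) {
      retCommand.addAll(Arrays.asList("nice", "-n",
          Integer.toString(containerSchedPriorityAdjustment)));
    }
    retCommand.addAll(Arrays.asList("bash", command));
    return retCommand.toArray(new String[retCommand.size()]);
  }
}
{code}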

> TestContainerExecutor.testRunCommandNoPriority fails in branch-2
> 
>
> Key: YARN-2752
> URL: https://issues.apache.org/jira/browse/YARN-2752
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-2752.1-branch-2.patch, YARN-2752.2-branch-2.patch
>
>
> TestContainerExecutor.testRunCommandNoPriority fails in branch-2. But it 
> passed in trunk. 
> The function code ContainerExecutor.getRunCommand() in trunk is different 
> from that in branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2804) Timeline server .out log have JAXB binding exceptions and warnings.

2014-11-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197087#comment-14197087
 ] 

Zhijie Shen commented on YARN-2804:
---

In case folks want to know the .out output afterwards, I posted it here:

{code}
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
file size   (blocks, -f) unlimited
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 256
pipe size(512 bytes, -p) 1
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 709
virtual memory  (kbytes, -v) unlimited
Nov 04, 2014 2:32:55 PM 
com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider
 get
WARNING: You are attempting to use a deprecated API (specifically, attempting 
to @Inject ServletContext inside an eagerly created singleton. While we allow 
this for backwards compatibility, be warned that this MAY have unexpected 
behavior if you have more than one injector (with ServletModule) running in the 
same JVM. Please consult the Guice documentation at 
http://code.google.com/p/google-guice/wiki/Servlets for more information.
Nov 04, 2014 2:32:55 PM 
com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider
 get
WARNING: You are attempting to use a deprecated API (specifically, attempting 
to @Inject ServletContext inside an eagerly created singleton. While we allow 
this for backwards compatibility, be warned that this MAY have unexpected 
behavior if you have more than one injector (with ServletModule) running in the 
same JVM. Please consult the Guice documentation at 
http://code.google.com/p/google-guice/wiki/Servlets for more information.
Nov 04, 2014 2:32:55 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider as 
a provider class
Nov 04, 2014 2:32:55 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering 
org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSWebServices 
as a root resource class
Nov 04, 2014 2:32:55 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering 
org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices as a root 
resource class
Nov 04, 2014 2:32:55 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a 
provider class
Nov 04, 2014 2:32:55 PM 
com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Nov 04, 2014 2:32:56 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to 
GuiceManagedComponentProvider with the scope "Singleton"
Nov 04, 2014 2:32:56 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider to 
GuiceManagedComponentProvider with the scope "Singleton"
Nov 04, 2014 2:32:56 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
INFO: Binding 
org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSWebServices 
to GuiceManagedComponentProvider with the scope "Singleton"
Nov 04, 2014 2:32:56 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
INFO: Binding org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices 
to GuiceManagedComponentProvider with the scope "Singleton"
{code}

It WON'T increase with the number of RESTful requests.

> Timeline server .out log have JAXB binding exceptions and warnings.
> ---
>
> Key: YARN-2804
> URL: https://issues.apache.org/jira/browse/YARN-2804
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-2804.1.patch, YARN-2804.2.patch
>
>
> Unlike other daemon, timeline server binds JacksonJaxbJsonProvider to resolve 
> the resources. However, there are noises in .out log:
> {code}
> SEVERE: Failed to generate the schema for the JAX-B elements
> com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of 
> IllegalAnnotationExceptions
> java.util.Map is an interface, and JAXB can't handle interfaces.
>   this problem is related to the following location:
>   at java.util.Map
>   at public java.util.Map 
> org.apache.hadoop

[jira] [Updated] (YARN-2804) Timeline server .out log have JAXB binding exceptions and warnings.

2014-11-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2804:
--
Attachment: YARN-2804.2.patch

Thanks for the comments. I've moved the logic to the setters and validated it on 
my local cluster again; it still suppresses all the exceptions and warning 
logs in the .out file. In addition, I added a test case to verify that the changed 
POJO setters/getters work properly.
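
For illustration, here is a hedged sketch of the kind of arrangement described 
above; the authoritative change is in YARN-2804.2.patch, and the *JAXB accessor 
name and annotations below are assumptions. The public API keeps its Map-typed 
view, the stored field is a concrete HashMap that the Jackson-based 
YarnJacksonJaxbJsonProvider can bind, and the setter normalizes whatever Map the 
caller passes in:
{code}
import java.util.HashMap;
import java.util.Map;

import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "event")
@XmlAccessorType(XmlAccessType.NONE)
public class TimelineEventSketch {

  private HashMap<String, Object> eventInfo = new HashMap<String, Object>();

  // Unannotated, Map-typed view: existing Java callers are unaffected.
  public Map<String, Object> getEventInfo() {
    return eventInfo;
  }

  // Annotated, HashMap-typed view: this is what the JAXB-annotation-driven
  // provider binds, so it no longer sees the java.util.Map interface it
  // complains about.
  @XmlElement(name = "eventinfo")
  public HashMap<String, Object> getEventInfoJAXB() {
    return eventInfo;
  }

  // The "logic moved to setters": normalize any Map implementation into a
  // concrete HashMap before storing it.
  public void setEventInfo(Map<String, Object> eventInfo) {
    if (eventInfo == null || eventInfo instanceof HashMap) {
      this.eventInfo = (HashMap<String, Object>) eventInfo;
    } else {
      this.eventInfo = new HashMap<String, Object>(eventInfo);
    }
  }
}
{code}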

> Timeline server .out log have JAXB binding exceptions and warnings.
> ---
>
> Key: YARN-2804
> URL: https://issues.apache.org/jira/browse/YARN-2804
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-2804.1.patch, YARN-2804.2.patch
>
>
> Unlike other daemon, timeline server binds JacksonJaxbJsonProvider to resolve 
> the resources. However, there are noises in .out log:
> {code}
> SEVERE: Failed to generate the schema for the JAX-B elements
> com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of 
> IllegalAnnotationExceptions
> java.util.Map is an interface, and JAXB can't handle interfaces.
>   this problem is related to the following location:
>   at java.util.Map
>   at public java.util.Map 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
> java.util.Map does not have a no-arg default constructor.
>   this problem is related to the following location:
>   at java.util.Map
>   at public java.util.Map 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
>   at 
> com.sun.xml.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:106)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:489)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:319)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1170)
>   at 
> com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:145)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:248)
>   at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:235)
>   at javax.xml.bind.ContextFinder.find(ContextFinder.java:432)
>   at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:637)
>   at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:584)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.buildModelAndSchemas(WadlGeneratorJAXBGrammarGenerator.java:412)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.createExternalGrammar(WadlGeneratorJAXBGrammarGenerator.java:352)
>   at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:115)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
>   at 
> com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>   at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.acc

[jira] [Commented] (YARN-2767) RM web services - add test case to ensure the http static user cannot kill or submit apps in secure mode

2014-11-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196717#comment-14196717
 ] 

Zhijie Shen commented on YARN-2767:
---

Sorry for not raising it earlier, but I just noticed a nit. It is using another 
class's name as the dir, which may cause conflicts if the two test cases run 
simultaneously.
{code}
+  private static final File testRootDir = new File("target",
+TestRMWebServicesDelegationTokenAuthentication.class.getName() + "-root");
{code}
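
A sketch of the obvious remedy, under the assumption that the new test keys the 
directory on its own class; the class name below is illustrative:
{code}
import java.io.File;

public class TestHttpStaticUserPermissionsSketch {
  // Hypothetical fix: key the scratch dir on the test's own class so that a
  // simultaneous run of TestRMWebServicesDelegationTokenAuthentication cannot
  // collide on the same directory under target/.
  private static final File testRootDir = new File("target",
      TestHttpStaticUserPermissionsSketch.class.getName() + "-root");
}
{code}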

> RM web services - add test case to ensure the http static user cannot kill or 
> submit apps in secure mode
> 
>
> Key: YARN-2767
> URL: https://issues.apache.org/jira/browse/YARN-2767
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2767.0.patch, apache-yarn-2767.1.patch, 
> apache-yarn-2767.2.patch
>
>
> We should add a test to ensure that the http static user used to access the 
> RM web interface can't submit or kill apps if the cluster is running in 
> secure mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2804) Timeline server .out log have JAXB binding exceptions and warnings.

2014-11-03 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2804:
--
Attachment: YARN-2804.1.patch

In the patch, I made a compromise when changing TimelineEntity and 
TimelineEvent, to keep the Java API compatible as well as satisfy JAXB. For the 
put domain response, I changed it to return an empty TimelinePutResponse instead 
of using the Jersey Response.

After these changes, the exceptions and the warnings are gone from .out.
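
As a hedged sketch of the put domain part only (the resource path and method 
name below are illustrative, not the patch itself): the resource method returns 
a JAXB-friendly TimelinePutResponse rather than a javax.ws.rs.core.Response, 
whose protected constructor the WADL generator cannot access:
{code}
import javax.ws.rs.Consumes;
import javax.ws.rs.PUT;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;

@Path("/ws/v1/timeline")
public class TimelineDomainResourceSketch {

  @PUT
  @Path("/domain")
  @Consumes(MediaType.APPLICATION_JSON)
  @Produces(MediaType.APPLICATION_JSON)
  public TimelinePutResponse putDomain(/* domain payload omitted */) {
    // Returning a concrete, JAXB-friendly object keeps the WADL grammar
    // generation away from Response's protected constructor.
    return new TimelinePutResponse();
  }
}
{code}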

> Timeline server .out log have JAXB binding exceptions and warnings.
> ---
>
> Key: YARN-2804
> URL: https://issues.apache.org/jira/browse/YARN-2804
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-2804.1.patch
>
>
> Unlike other daemon, timeline server binds JacksonJaxbJsonProvider to resolve 
> the resources. However, there are noises in .out log:
> {code}
> SEVERE: Failed to generate the schema for the JAX-B elements
> com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of 
> IllegalAnnotationExceptions
> java.util.Map is an interface, and JAXB can't handle interfaces.
>   this problem is related to the following location:
>   at java.util.Map
>   at public java.util.Map 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
> java.util.Map does not have a no-arg default constructor.
>   this problem is related to the following location:
>   at java.util.Map
>   at public java.util.Map 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
>   at 
> com.sun.xml.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:106)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:489)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:319)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1170)
>   at 
> com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:145)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:248)
>   at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:235)
>   at javax.xml.bind.ContextFinder.find(ContextFinder.java:432)
>   at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:637)
>   at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:584)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.buildModelAndSchemas(WadlGeneratorJAXBGrammarGenerator.java:412)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.createExternalGrammar(WadlGeneratorJAXBGrammarGenerator.java:352)
>   at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:115)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
>   at 
> com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>   at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>   at 
> com.sun.jersey.server.impl.uri.rules

[jira] [Commented] (YARN-2804) Timeline server .out log have JAXB binding exceptions and warnings.

2014-11-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195475#comment-14195475
 ] 

Zhijie Shen commented on YARN-2804:
---

If the map interface issue is resolved, another issue which didn't occur before 
will show up too:
{code}
java.lang.IllegalAccessException: Class
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8
can not access a member of class javax.ws.rs.core.Response with
modifiers "protected"
 at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:65)
 at java.lang.Class.newInstance0(Class.java:349)
 at java.lang.Class.newInstance(Class.java:308)
 at 
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467)
 at 
com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181)
 at 
com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81)
 at 
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518)
 at 
com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124)
 at 
com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
 at 
com.sun.jersey.server.impl.wadl.WadlResource.getWadl(WadlResource.java:89)
{code}

This needs to be fixed as well to completely avoid the excessive logging, though 
it seems it wouldn't be necessary if we upgraded Jersey (see 
[here|https://java.net/projects/jersey/lists/users/archive/2011-10/message/117]).

> Timeline server .out log have JAXB binding exceptions and warnings.
> ---
>
> Key: YARN-2804
> URL: https://issues.apache.org/jira/browse/YARN-2804
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
>
> Unlike other daemon, timeline server binds JacksonJaxbJsonProvider to resolve 
> the resources. However, there are noises in .out log:
> {code}
> SEVERE: Failed to generate the schema for the JAX-B elements
> com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of 
> IllegalAnnotationExceptions
> java.util.Map is an interface, and JAXB can't handle interfaces.
>   this problem is related to the following location:
>   at java.util.Map
>   at public java.util.Map 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
> java.util.Map does not have a no-arg default constructor.
>   this problem is related to the following location:
>   at java.util.Map
>   at public java.util.Map 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
>   at 
> com.sun.xml.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:106)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:489)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:319)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1170)
>   at 
> com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:145)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:248)
>   at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:235)
>   at javax.xml.bind.ContextFinder.find(ContextFinder.jav

[jira] [Created] (YARN-2804) Timeline server .out log have JAXB binding exceptions and warnings.

2014-11-03 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2804:
-

 Summary: Timeline server .out log have JAXB binding exceptions and 
warnings.
 Key: YARN-2804
 URL: https://issues.apache.org/jira/browse/YARN-2804
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical


Unlike other daemon, timeline server binds JacksonJaxbJsonProvider to resolve 
the resources. However, there are noises in .out log:

{code}
SEVERE: Failed to generate the schema for the JAX-B elements
com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of 
IllegalAnnotationExceptions
java.util.Map is an interface, and JAXB can't handle interfaces.
this problem is related to the following location:
at java.util.Map
at public java.util.Map 
org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
at public java.util.List 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
at public java.util.List 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
java.util.Map does not have a no-arg default constructor.
this problem is related to the following location:
at java.util.Map
at public java.util.Map 
org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
at public java.util.List 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
at public java.util.List 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities

at 
com.sun.xml.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:106)
at 
com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:489)
at 
com.sun.xml.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:319)
at 
com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1170)
at 
com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:248)
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:235)
at javax.xml.bind.ContextFinder.find(ContextFinder.java:432)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:637)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:584)
at 
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.buildModelAndSchemas(WadlGeneratorJAXBGrammarGenerator.java:412)
at 
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.createExternalGrammar(WadlGeneratorJAXBGrammarGenerator.java:352)
at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:115)
at 
com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
at 
com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
at 
com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98)
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at 

[jira] [Comment Edited] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194982#comment-14194982
 ] 

Zhijie Shen edited comment on YARN-2798 at 11/3/14 8:03 PM:


I don't have a quick setup for an RM HA and secure cluster, but since the mapping 
rule is applied everywhere in such a cluster, I think it should work fine.

In fact, this issue is not an HA-related problem. However, in general, if we want 
DT renewal to work across RMs, we have to run these RMs under the same operating 
system user name. Otherwise, if the DT renewer is set to the user yarn of RM1, and 
RM2 is run by yarn', RM2 can no longer renew the DT. This applies not just to the 
timeline DT, but to all the DTs that we assign the RM to renew. Correct me if I'm 
wrong.


was (Author: zjshen):
I don't have a quick setup for RM HA and secure cluster, but the mapping rule 
is applied every where in this cluster, I think it should work fine.

> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2798.1.patch, YARN-2798.2.patch
>
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get 
> the RM daemon operating system user. However, the RM principal and 
> auth_to_local may not be properly presented to the client, and the client 
> cannot translate the principal to the daemon user properly. On the other 
> hand, AbstractDelegationTokenIdentifier will do this translation when create 
> the token. However, since the client has already translated the full 
> principal into a short user name (which may not be correct), the server can 
> no longer apply the translation any more, where RM principal and 
> auth_to_local are always correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194982#comment-14194982
 ] 

Zhijie Shen commented on YARN-2798:
---

I don't have a quick setup for an RM HA and secure cluster, but since the mapping 
rule is applied everywhere in such a cluster, I think it should work fine.

> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2798.1.patch, YARN-2798.2.patch
>
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get 
> the RM daemon operating system user. However, the RM principal and 
> auth_to_local may not be properly presented to the client, and the client 
> cannot translate the principal to the daemon user properly. On the other 
> hand, AbstractDelegationTokenIdentifier will do this translation when create 
> the token. However, since the client has already translated the full 
> principal into a short user name (which may not be correct), the server can 
> no longer apply the translation any more, where RM principal and 
> auth_to_local are always correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194225#comment-14194225
 ] 

Zhijie Shen commented on YARN-2798:
---

Test failures are not related.

> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2798.1.patch, YARN-2798.2.patch
>
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get 
> the RM daemon operating system user. However, the RM principal and 
> auth_to_local may not be properly presented to the client, and the client 
> cannot translate the principal to the daemon user properly. On the other 
> hand, AbstractDelegationTokenIdentifier will do this translation when create 
> the token. However, since the client has already translated the full 
> principal into a short user name (which may not be correct), the server can 
> no longer apply the translation any more, where RM principal and 
> auth_to_local are always correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2798:
--
Attachment: YARN-2798.2.patch

bq. I don't understand why you are using timelineHost to resolve the renewer to 
be the ResourceManager.

Good catch! We should use rmHost.

In addition, it's not necessary to parse the RM principal every time we request 
a timeline DT, so I moved the logic of constructing the renewer to serviceInit.
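
A hedged sketch of what that might look like; this is illustrative, not the 
attached patch, and the class and field names are assumptions. The renewer is 
built once in serviceInit from the configured RM principal, with _HOST 
substituted by the RM host, and the full Kerberos name is kept so that the 
server side can apply auth_to_local itself:
{code}
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineDTRenewerSketch {

  private String timelineDTRenewer;

  protected void serviceInit(Configuration conf) throws Exception {
    InetSocketAddress rmAddress = conf.getSocketAddr(
        YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_PORT);
    String rmHost = rmAddress.getHostName();
    // Substitute _HOST in the configured RM principal with the RM host and
    // keep the full Kerberos name; the server-side auth_to_local rules do the
    // translation to a short name when the token is created.
    timelineDTRenewer = SecurityUtil.getServerPrincipal(
        conf.get(YarnConfiguration.RM_PRINCIPAL), rmHost);
  }
}
{code}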

> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2798.1.patch, YARN-2798.2.patch
>
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get 
> the RM daemon operating system user. However, the RM principal and 
> auth_to_local may not be properly presented to the client, and the client 
> cannot translate the principal to the daemon user properly. On the other 
> hand, AbstractDelegationTokenIdentifier will do this translation when create 
> the token. However, since the client has already translated the full 
> principal into a short user name (which may not be correct), the server can 
> no longer apply the translation any more, where RM principal and 
> auth_to_local are always correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194071#comment-14194071
 ] 

Zhijie Shen edited comment on YARN-2798 at 11/2/14 11:50 PM:
-

Created a patch to remove the translation logic from the client; on the client 
side we just need to ensure _HOST is mapped to the right timeline server. Added 
test cases to verify the responsibility of both the client-side and server-side 
DT creation.

Please note that to make this work, the core-site.xml that is presented to the 
timeline server should have a proper auth_to_local configuration.


was (Author: zjshen):
Created patch to remove the translation logic from the client, and at the 
client side we just need to ensure _HOST is going to be mapped to the right 
timeline server. Add the test cases to verify the responsibility at both the 
client and server-side DT creating.

Please note that to make this work, core-site.xml and yarn-site.xml that are 
presented to the timeline server should have proper auth_to_local and rm 
principal configurations.

> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2798.1.patch
>
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get 
> the RM daemon operating system user. However, the RM principal and 
> auth_to_local may not be properly presented to the client, and the client 
> cannot translate the principal to the daemon user properly. On the other 
> hand, AbstractDelegationTokenIdentifier will do this translation when create 
> the token. However, since the client has already translated the full 
> principal into a short user name (which may not be correct), the server can 
> no longer apply the translation any more, where RM principal and 
> auth_to_local are always correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2785) TestContainerResourceUsage fails intermittently

2014-11-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194083#comment-14194083
 ] 

Zhijie Shen commented on YARN-2785:
---

Committed the patch to trunk, branch-2 and branch-2.6. Thanks Varun!

> TestContainerResourceUsage fails intermittently
> ---
>
> Key: YARN-2785
> URL: https://issues.apache.org/jira/browse/YARN-2785
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.6.0
>
> Attachments: apache-yarn-2785.0.patch, apache-yarn-2785.1.patch, 
> apache-yarn-2785.2.patch
>
>
> TestContainerResourceUsage fails sometimes due to the timeout values being 
> low.
> From the test failures - 
> {noformat}
> --
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
> Tests run: 4, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 71.264 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResource
> testUsageWithMultipleContainersAndRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage)
>   Time elapsed: 60.032 sec  <<< ERROR!
> java.lang.Exception: test timed out after 6 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:209)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:198)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage.testUsageWithMultipleContainersAndRMRestart(TestContainerResourceUsage.java:
> testUsageWithOneAttemptAndOneContainer(org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage)
>   Time elapsed: 3.375 sec  <<< FAILURE!
> java.lang.AssertionError: While app is running, memory seconds should be >0 
> but is 0
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage.testUsageWithOneAttemptAndOneContainer(TestContainerResourceUsage.java:108)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2785) TestContainerResourceUsage fails intermittently

2014-11-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194074#comment-14194074
 ] 

Zhijie Shen commented on YARN-2785:
---

+1, will commit the patch.

> TestContainerResourceUsage fails intermittently
> ---
>
> Key: YARN-2785
> URL: https://issues.apache.org/jira/browse/YARN-2785
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2785.0.patch, apache-yarn-2785.1.patch, 
> apache-yarn-2785.2.patch
>
>
> TestContainerResourceUsage fails sometimes due to the timeout values being 
> low.
> From the test failures - 
> {noformat}
> --
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
> Tests run: 4, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 71.264 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResource
> testUsageWithMultipleContainersAndRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage)
>   Time elapsed: 60.032 sec  <<< ERROR!
> java.lang.Exception: test timed out after 6 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:209)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:198)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage.testUsageWithMultipleContainersAndRMRestart(TestContainerResourceUsage.java:
> testUsageWithOneAttemptAndOneContainer(org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage)
>   Time elapsed: 3.375 sec  <<< FAILURE!
> java.lang.AssertionError: While app is running, memory seconds should be >0 
> but is 0
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage.testUsageWithOneAttemptAndOneContainer(TestContainerResourceUsage.java:108)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2798:
--
Attachment: YARN-2798.1.patch

Created a patch to remove the translation logic from the client; on the client 
side we just need to ensure _HOST is mapped to the right timeline server. Added 
test cases to verify the responsibility of both the client-side and server-side 
DT creation.

Please note that to make this work, the core-site.xml and yarn-site.xml that are 
presented to the timeline server should have proper auth_to_local and RM 
principal configurations.

> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2798.1.patch
>
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get 
> the RM daemon operating system user. However, the RM principal and 
> auth_to_local may not be properly presented to the client, and the client 
> cannot translate the principal to the daemon user properly. On the other 
> hand, AbstractDelegationTokenIdentifier will do this translation when create 
> the token. However, since the client has already translated the full 
> principal into a short user name (which may not be correct), the server can 
> no longer apply the translation any more, where RM principal and 
> auth_to_local are always correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193993#comment-14193993
 ] 

Zhijie Shen commented on YARN-2798:
---

Reporting the issue on behalf of [~arpitgupta].

> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get 
> the RM daemon operating system user. However, the RM principal and 
> auth_to_local may not be properly presented to the client, and the client 
> cannot translate the principal to the daemon user properly. On the other 
> hand, AbstractDelegationTokenIdentifier will do this translation when create 
> the token. However, since the client has already translated the full 
> principal into a short user name (which may not be correct), the server can 
> no longer apply the translation any more, where RM principal and 
> auth_to_local are always correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2798:
--
Reporter: Arpit Gupta  (was: Zhijie Shen)

> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get 
> the RM daemon operating system user. However, the RM principal and 
> auth_to_local may not be properly presented to the client, and the client 
> cannot translate the principal to the daemon user properly. On the other 
> hand, AbstractDelegationTokenIdentifier will do this translation when create 
> the token. However, since the client has already translated the full 
> principal into a short user name (which may not be correct), the server can 
> no longer apply the translation any more, where RM principal and 
> auth_to_local are always correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2798:
--
Component/s: timelineserver

> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get 
> the RM daemon operating system user. However, the RM principal and 
> auth_to_local may not be properly presented to the client, and the client 
> cannot translate the principal to the daemon user properly. On the other 
> hand, AbstractDelegationTokenIdentifier will do this translation when create 
> the token. However, since the client has already translated the full 
> principal into a short user name (which may not be correct), the server can 
> no longer apply the translation any more, where RM principal and 
> auth_to_local are always correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-02 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2798:
-

 Summary: YarnClient doesn't need to translate Kerberos name of 
timeline DT renewer
 Key: YARN-2798
 URL: https://issues.apache.org/jira/browse/YARN-2798
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker


Now YarnClient will automatically get a timeline DT when submitting an app in 
secure mode. It will try to parse yarn-site.xml/core-site.xml to get the RM 
daemon operating system user. However, the RM principal and auth_to_local may 
not be properly presented to the client, so the client cannot translate the 
principal to the daemon user properly. On the other hand, 
AbstractDelegationTokenIdentifier will do this translation when creating the 
token. However, since the client has already translated the full principal into 
a short user name (which may not be correct), the server, where the RM principal 
and auth_to_local are always correct, can no longer apply the translation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2752) TestContainerExecutor.testRunCommandNoPriority fails in branch-2

2014-11-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193494#comment-14193494
 ] 

Zhijie Shen commented on YARN-2752:
---

The fix makes sense. Currently the branch-2 code will always add the "nice -n" 
argument, whether the priority is the default 0 or a user-customized value.
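
For reference, here is a hedged sketch of the trunk-side behavior being ported 
(constant names are assumed, and this is not a verbatim copy of 
ContainerExecutor.getRunCommand()): the nice prefix is only added when a 
priority is explicitly configured.
{code}
// Hedged sketch: only prepend "nice -n <priority>" when the scheduling priority is
// explicitly configured, instead of always adding it with the default of 0.
List<String> retCommand = new ArrayList<String>();
boolean priorityIsSet =
    conf.get(YarnConfiguration.NM_CONTAINER_EXECUTOR_SCHED_PRIORITY) != null;
if (priorityIsSet) {
  int priority = conf.getInt(
      YarnConfiguration.NM_CONTAINER_EXECUTOR_SCHED_PRIORITY,
      YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_SCHED_PRIORITY);
  retCommand.addAll(Arrays.asList("nice", "-n", Integer.toString(priority)));
}
retCommand.addAll(Arrays.asList("bash", command));
{code}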

One suggestion: maybe it's better to apply the diff between trunk and branch-2 
here. That would prevent merge failures if we modify this code on trunk and 
cherry-pick it to branch-2 in the future.

> TestContainerExecutor.testRunCommandNoPriority fails in branch-2
> 
>
> Key: YARN-2752
> URL: https://issues.apache.org/jira/browse/YARN-2752
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2752.1-branch-2.patch
>
>
> TestContainerExecutor.testRunCommandNoPriority fails in branch-2. But it 
> passed in trunk. 
> The function code ContainerExecutor.getRunCommand() in trunk is different 
> from that in branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2785) TestContainerResourceUsage fails intermittently

2014-11-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193449#comment-14193449
 ] 

Zhijie Shen commented on YARN-2785:
---

Why don't we need to prolong the timeout for testUsageWithOneAttemptAndOneContainer 
too?
{code}
-  @Test (timeout = 60000)
+  @Test (timeout = 120000)
   public void testUsageWithMultipleContainersAndRMRestart() throws Exception {
{code}

> TestContainerResourceUsage fails intermittently
> ---
>
> Key: YARN-2785
> URL: https://issues.apache.org/jira/browse/YARN-2785
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2785.0.patch, apache-yarn-2785.1.patch
>
>
> TestContainerResourceUsage fails sometimes due to the timeout values being 
> low.
> From the test failures - 
> {noformat}
> --
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
> Tests run: 4, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 71.264 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResource
> testUsageWithMultipleContainersAndRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage)
>   Time elapsed: 60.032 sec  <<< ERROR!
> java.lang.Exception: test timed out after 60000 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:209)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:198)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage.testUsageWithMultipleContainersAndRMRestart(TestContainerResourceUsage.java:
> testUsageWithOneAttemptAndOneContainer(org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage)
>   Time elapsed: 3.375 sec  <<< FAILURE!
> java.lang.AssertionError: While app is running, memory seconds should be >0 
> but is 0
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage.testUsageWithOneAttemptAndOneContainer(TestContainerResourceUsage.java:108)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2783) TestApplicationClientProtocolOnHA fails on trunk intermittently

2014-11-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2783:
--
Summary: TestApplicationClientProtocolOnHA fails on trunk intermittently  
(was: TestApplicationClientProtocolOnHA)

> TestApplicationClientProtocolOnHA fails on trunk intermittently
> ---
>
> Key: YARN-2783
> URL: https://issues.apache.org/jira/browse/YARN-2783
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Zhijie Shen
>
> {code}
> Running org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
> Tests run: 17, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 147.881 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
> testGetContainersOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA)
>   Time elapsed: 12.928 sec  <<< ERROR!
> java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 
> to asf905.gq1.ygridcore.net:28032 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
>   at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1438)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>   at com.sun.proxy.$Proxy17.getContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getContainers(ApplicationClientProtocolPBClientImpl.java:400)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
>   at com.sun.proxy.$Proxy18.getContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getContainers(YarnClientImpl.java:639)
>   at 
> org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetContainersOnHA(TestApplicationClientProtocolOnHA.java:154)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2785) TestContainerResourceUsage fails intermittently

2014-10-31 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192892#comment-14192892
 ] 

Zhijie Shen commented on YARN-2785:
---

Adding more time should be a solution for slow machines. My question is whether 
all test cases in TestContainerResourceUsage are subject to the timeout. It seems 
that both testUsageWithOneAttemptAndOneContainer and 
testUsageWithMultipleContainersAndRMRestart need to sleep to let the metrics 
accumulate, so should the fix be applied to all the test cases here?

> TestContainerResourceUsage fails intermittently
> ---
>
> Key: YARN-2785
> URL: https://issues.apache.org/jira/browse/YARN-2785
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2785.0.patch
>
>
> TestContainerResourceUsage fails sometimes due to the timeout values being 
> low.
> From the test failures - 
> {noformat}
> --
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
> Tests run: 4, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 71.264 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResource
> testUsageWithMultipleContainersAndRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage)
>   Time elapsed: 60.032 sec  <<< ERROR!
> java.lang.Exception: test timed out after 60000 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:209)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:198)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage.testUsageWithMultipleContainersAndRMRestart(TestContainerResourceUsage.java:
> testUsageWithOneAttemptAndOneContainer(org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage)
>   Time elapsed: 3.375 sec  <<< FAILURE!
> java.lang.AssertionError: While app is running, memory seconds should be >0 
> but is 0
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage.testUsageWithOneAttemptAndOneContainer(TestContainerResourceUsage.java:108)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2711) TestDefaultContainerExecutor#testContainerLaunchError fails on Windows

2014-10-31 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2711:
--
Issue Type: Test  (was: Bug)

> TestDefaultContainerExecutor#testContainerLaunchError fails on Windows
> --
>
> Key: YARN-2711
> URL: https://issues.apache.org/jira/browse/YARN-2711
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.6.0
>
> Attachments: apache-yarn-2711.0.patch
>
>
> The testContainerLaunchError test fails on Windows with the following error -
> {noformat}
> java.io.FileNotFoundException: File file:/bin/echo does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at org.apache.hadoop.fs.FilterFs.getFileStatus(FilterFs.java:120)
>   at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1117)
>   at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1113)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1113)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2019)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1978)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:145)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor.testContainerLaunchError(TestDefaultContainerExecutor.java:289)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2711) TestDefaultContainerExecutor#testContainerLaunchError fails on Windows

2014-10-31 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192829#comment-14192829
 ] 

Zhijie Shen commented on YARN-2711:
---

Junping is offline and has a network issue with the git repository. I'll go ahead 
and commit the patch.

> TestDefaultContainerExecutor#testContainerLaunchError fails on Windows
> --
>
> Key: YARN-2711
> URL: https://issues.apache.org/jira/browse/YARN-2711
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2711.0.patch
>
>
> The testContainerLaunchError test fails on Windows with the following error -
> {noformat}
> java.io.FileNotFoundException: File file:/bin/echo does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at org.apache.hadoop.fs.FilterFs.getFileStatus(FilterFs.java:120)
>   at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1117)
>   at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1113)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1113)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2019)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1978)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:145)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor.testContainerLaunchError(TestDefaultContainerExecutor.java:289)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode

2014-10-31 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192761#comment-14192761
 ] 

Zhijie Shen commented on YARN-2767:
---

[~vvasudev], thanks for the patch. The test cases look good. Just some minor 
comments for code refactoring:

1. Use Assert.fail()?
{code}
  assertTrue("Couldn't create MiniKDC", false);
{code}

2. miniKDCStarted is not necessary.
{code}
  miniKDCStarted = true;
{code}

3. This getter doesn't seem necessary. Maybe we can refactor the setUp() code instead.
{code}
  private static MiniKdc getKdc() {
return testMiniKDC;
  }
{code}
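
Putting points 1 and 2 together, here is a hedged sketch of how setUp() could 
look after the refactoring (field and directory names such as testMiniKDC and 
testRootDir are assumed from the patch under review):
{code}
// Hedged sketch: call Assert.fail() directly instead of assertTrue("...", false),
// and drop the miniKDCStarted flag entirely.
@BeforeClass
public static void setUp() {
  try {
    testMiniKDC = new MiniKdc(MiniKdc.createConf(), testRootDir);
    testMiniKDC.start();
  } catch (Exception e) {
    Assert.fail("Couldn't create MiniKDC");
  }
}
{code}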

> RM web services - add test case to ensure the http static user can kill or 
> submit apps in secure mode
> -
>
> Key: YARN-2767
> URL: https://issues.apache.org/jira/browse/YARN-2767
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2767.0.patch, apache-yarn-2767.1.patch
>
>
> We should add a test to ensure that the http static user used to access the 
> RM web interface can't submit or kill apps if the cluster is running in 
> secure mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM

2014-10-30 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191308#comment-14191308
 ] 

Zhijie Shen commented on YARN-2770:
---

The two test failures are not related and happen on other Jiras, too. Filed two 
tickets for them: YARN-2782 and YARN-2783.

> Timeline delegation tokens need to be automatically renewed by the RM
> -
>
> Key: YARN-2770
> URL: https://issues.apache.org/jira/browse/YARN-2770
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.5.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-2770.1.patch, YARN-2770.2.patch
>
>
> YarnClient will automatically grab a timeline DT for the application and pass 
> it to the app AM. Now the timeline DT renew is still dummy. If an app is 
> running for more than 24h (default DT expiry time), the app AM is no longer 
> able to use the expired DT to communicate with the timeline server. Since RM 
> will cache the credentials of each app, and renew the DTs for the running 
> app. We should provider renew hooks similar to what HDFS DT has for RM, and 
> set RM user as the renewer when grabbing the timeline DT.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2783) TestApplicationClientProtocolOnHA

2014-10-30 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2783:
-

 Summary: TestApplicationClientProtocolOnHA
 Key: YARN-2783
 URL: https://issues.apache.org/jira/browse/YARN-2783
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen


{code}
Running org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
Tests run: 17, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 147.881 sec 
<<< FAILURE! - in 
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
testGetContainersOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA)
  Time elapsed: 12.928 sec  <<< ERROR!
java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 to 
asf905.gq1.ygridcore.net:28032 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy17.getContainers(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getContainers(ApplicationClientProtocolPBClientImpl.java:400)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at com.sun.proxy.$Proxy18.getContainers(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getContainers(YarnClientImpl.java:639)
at 
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetContainersOnHA(TestApplicationClientProtocolOnHA.java:154)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2782) TestResourceTrackerOnHA fails on trunk

2014-10-30 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2782:
-

 Summary: TestResourceTrackerOnHA fails on trunk
 Key: YARN-2782
 URL: https://issues.apache.org/jira/browse/YARN-2782
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen


{code}
Running org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 12.684 sec <<< 
FAILURE! - in org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
testResourceTrackerOnHA(org.apache.hadoop.yarn.client.TestResourceTrackerOnHA)  
Time elapsed: 12.518 sec  <<< ERROR!
java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 to 
asf905.gq1.ygridcore.net:28031 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy87.registerNodeManager(Unknown Source)
at 
org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at com.sun.proxy.$Proxy88.registerNodeManager(Unknown Source)
at 
org.apache.hadoop.yarn.client.TestResourceTrackerOnHA.testResourceTrackerOnHA(TestResourceTrackerOnHA.java:64)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2779) SystemMetricsPublisher can use Kerberos directly instead of timeline DT

2014-10-30 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191177#comment-14191177
 ] 

Zhijie Shen commented on YARN-2779:
---

I've verified it in a secure cluster, and SystemMetricsPublisher works fine 
with Kerberos directly.

> SystemMetricsPublisher can use Kerberos directly instead of timeline DT
> ---
>
> Key: YARN-2779
> URL: https://issues.apache.org/jira/browse/YARN-2779
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-2779.1.patch
>
>
> SystemMetricsPublisher is going to grab a timeline DT in secure mode as well. 
> The timeline DT will expiry after 24h. No DT renewer will handle renewing 
> work for SystemMetricsPublisher, but this has to been handled by itself. In 
> addition, SystemMetricsPublisher should cancel the timeline DT when it is 
> stopped, too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM

2014-10-30 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2770:
--
Attachment: YARN-2770.2.patch

> Timeline delegation tokens need to be automatically renewed by the RM
> -
>
> Key: YARN-2770
> URL: https://issues.apache.org/jira/browse/YARN-2770
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.5.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-2770.1.patch, YARN-2770.2.patch
>
>
> YarnClient will automatically grab a timeline DT for the application and pass 
> it to the app AM. Now the timeline DT renew is still dummy. If an app is 
> running for more than 24h (default DT expiry time), the app AM is no longer 
> able to use the expired DT to communicate with the timeline server. Since RM 
> will cache the credentials of each app, and renew the DTs for the running 
> app. We should provider renew hooks similar to what HDFS DT has for RM, and 
> set RM user as the renewer when grabbing the timeline DT.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM

2014-10-30 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191174#comment-14191174
 ] 

Zhijie Shen commented on YARN-2770:
---

bq. SecurityUtil#getServerPrincipal may be useful.
bq. Let's make sure the renewer name mangling imitates MR JobClient, it is easy 
to get this wrong.

I think we should use HadoopKerberosName#getShortName 
(AbstractDelegationTokenSecretManager uses it as well) together with RM_PRINCIPAL 
(which should be set in secure mode) to get the RM daemon user. 
HadoopKerberosName will automatically apply auth_to_local if we need to map the 
auth name to the real operating system name.
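
A minimal sketch of that idea (not the patch itself), assuming the configuration 
carries yarn.resourcemanager.principal and the server's auth_to_local rules:
{code}
// Hedged sketch: derive the RM daemon user from RM_PRINCIPAL using the
// auth_to_local rules loaded from the given configuration.
static String getRmDaemonUser(Configuration conf) throws IOException {
  HadoopKerberosName.setConfiguration(conf);   // loads hadoop.security.auth_to_local
  String rmPrincipal = conf.get(YarnConfiguration.RM_PRINCIPAL);
  return new HadoopKerberosName(rmPrincipal).getShortName();
}
{code}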

bq. It'll be great to also test separately that renewal can work fine when 
https is enabled.

I've verified it will work with SSL. BTW, SystemMetricsPublisher works fine 
with SSL too. To make it work, we must make sure RM has seen the proper SSL and 
truststore configuration.

bq. the same DelegationTokenAuthenticatedURL is instantiated multiple times, is 
it possible to store it as a variable ?

It's probably okay to reuse DelegationTokenAuthenticatedURL. However, I'd like 
to construct one per request to avoid sharing state across requests and to 
prevent introducing potential bugs. Actually, the Jersey client also constructs 
a new URL for each request. It won't be a big overhead, as the constructor 
doesn't do any deep construction.

bq. similarly for the timeline client instantiation.

I'm not sure, but I guess you're talking about the TokenRenewer. Actually, I'm 
following what RMDelegationTokenIdentifier does. If we don't construct the 
client per call, we need to make it a service with separate init/start and stop 
stages. It may complicate the change. Please let me know if you want this change.

bq. We may replace the token after renew is really succeeded.

According to the design of DelegationTokenAuthenticatedURL, I need to put the 
DT into the current DelegationTokenAuthenticatedURL.Token, which will be 
fetched internally to do the corresponding operations. So to renew a given DT, 
I need to set the DT there. However, if it is already cached there, the client 
can skip the set step.
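
A hedged sketch of that renew path (timelineDT and timelineServiceURL are 
assumed placeholders, not the actual patch):
{code}
// Hedged sketch: the DT to renew is placed into DelegationTokenAuthenticatedURL.Token
// first, because the URL implementation reads it from there when issuing RENEW.
// timelineDT is the Token<?> handed to the renewer; timelineServiceURL is the
// timeline server's DT endpoint (both assumed here).
DelegationTokenAuthenticatedURL.Token token = new DelegationTokenAuthenticatedURL.Token();
token.setDelegationToken((Token<AbstractDelegationTokenIdentifier>) timelineDT);
DelegationTokenAuthenticatedURL authUrl =
    new DelegationTokenAuthenticatedURL(new KerberosDelegationTokenAuthenticator());
long newExpirationTime = authUrl.renewDelegationToken(timelineServiceURL, token);
{code}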

Otherwise, I've addressed the remaining comments. Thanks Jian and Vinod!

> Timeline delegation tokens need to be automatically renewed by the RM
> -
>
> Key: YARN-2770
> URL: https://issues.apache.org/jira/browse/YARN-2770
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.5.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-2770.1.patch
>
>
> YarnClient will automatically grab a timeline DT for the application and pass 
> it to the app AM. Now the timeline DT renew is still dummy. If an app is 
> running for more than 24h (default DT expiry time), the app AM is no longer 
> able to use the expired DT to communicate with the timeline server. Since RM 
> will cache the credentials of each app, and renew the DTs for the running 
> app. We should provider renew hooks similar to what HDFS DT has for RM, and 
> set RM user as the renewer when grabbing the timeline DT.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2779) SystemMetricsPublisher can use Kerberos directly instead of timeline DT

2014-10-30 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2779:
--
Attachment: YARN-2779.1.patch

Uploaded a patch to remove the code that gets the timeline DT in 
SystemMetricsPublisher.

> SystemMetricsPublisher can use Kerberos directly instead of timeline DT
> ---
>
> Key: YARN-2779
> URL: https://issues.apache.org/jira/browse/YARN-2779
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-2779.1.patch
>
>
> SystemMetricsPublisher is going to grab a timeline DT in secure mode as well. 
> The timeline DT will expiry after 24h. No DT renewer will handle renewing 
> work for SystemMetricsPublisher, but this has to been handled by itself. In 
> addition, SystemMetricsPublisher should cancel the timeline DT when it is 
> stopped, too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2771) DistributedShell's DSConstants are badly named

2014-10-30 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2771:
--
Attachment: YARN-2771.3.patch

Fix the test failure

> DistributedShell's DSConstants are badly named
> --
>
> Key: YARN-2771
> URL: https://issues.apache.org/jira/browse/YARN-2771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Attachments: YARN-2771.1.patch, YARN-2771.2.patch, YARN-2771.3.patch
>
>
> I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of 
> DISTRIBUTEDSHELLTIMELINEDOMAIN).
> DISTRIBUTEDSHELLTIMELINEDOMAIN is added in this release, can we rename it to 
> be DISTRIBUTED_SHELL_TIMELINE_DOMAIN?
> For the old envs, we can just add new envs that point to the old-one and 
> deprecate the old ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2779) SystemMetricsPublisher can use Kerberos directly instead of timeline DT

2014-10-30 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2779:
--
Summary: SystemMetricsPublisher can use Kerberos directly instead of 
timeline DT  (was: SystemMetricsPublisher needs to renew and cancel timeline DT 
too)

> SystemMetricsPublisher can use Kerberos directly instead of timeline DT
> ---
>
> Key: YARN-2779
> URL: https://issues.apache.org/jira/browse/YARN-2779
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
>
> SystemMetricsPublisher is going to grab a timeline DT in secure mode as well. 
> The timeline DT will expiry after 24h. No DT renewer will handle renewing 
> work for SystemMetricsPublisher, but this has to been handled by itself. In 
> addition, SystemMetricsPublisher should cancel the timeline DT when it is 
> stopped, too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2779) SystemMetricsPublisher needs to renew and cancel timeline DT too

2014-10-30 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190955#comment-14190955
 ] 

Zhijie Shen commented on YARN-2779:
---

[~vinodkv], in the current code base, we're having SystemMetricsPublisher grab a 
timeline DT to talk to the timeline server in secure mode. That's why we need 
this Jira to add the renew and cancel work.

But thinking about this issue again, it should be okay to let RM talk to the 
timeline server with Kerberos directly. As this is the only such process, it 
will not add too much workload to the Kerberos server. So let's remove the 
DT-getting logic instead and let RM use Kerberos directly.

> SystemMetricsPublisher needs to renew and cancel timeline DT too
> 
>
> Key: YARN-2779
> URL: https://issues.apache.org/jira/browse/YARN-2779
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
>
> SystemMetricsPublisher is going to grab a timeline DT in secure mode as well. 
> The timeline DT will expiry after 24h. No DT renewer will handle renewing 
> work for SystemMetricsPublisher, but this has to been handled by itself. In 
> addition, SystemMetricsPublisher should cancel the timeline DT when it is 
> stopped, too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-2779) SystemMetricsPublisher needs to renew and cancel timeline DT too

2014-10-30 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reopened YARN-2779:
---

> SystemMetricsPublisher needs to renew and cancel timeline DT too
> 
>
> Key: YARN-2779
> URL: https://issues.apache.org/jira/browse/YARN-2779
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
>
> SystemMetricsPublisher is going to grab a timeline DT in secure mode as well. 
> The timeline DT will expiry after 24h. No DT renewer will handle renewing 
> work for SystemMetricsPublisher, but this has to been handled by itself. In 
> addition, SystemMetricsPublisher should cancel the timeline DT when it is 
> stopped, too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2779) SystemMetricsPublisher needs to renew and cancel timeline DT too

2014-10-30 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2779:
-

 Summary: SystemMetricsPublisher needs to renew and cancel timeline 
DT too
 Key: YARN-2779
 URL: https://issues.apache.org/jira/browse/YARN-2779
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, timelineserver
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical


SystemMetricsPublisher is going to grab a timeline DT in secure mode as well. 
The timeline DT will expire after 24h. No DT renewer will handle the renewing 
work for SystemMetricsPublisher, so this has to be handled by itself. In 
addition, SystemMetricsPublisher should cancel the timeline DT when it is 
stopped, too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

