[jira] [Commented] (YARN-11010) YARN ui2 hangs on the Queues page when the scheduler response contains NaN values
[ https://issues.apache.org/jira/browse/YARN-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845558#comment-17845558 ] Xie YiFan commented on YARN-11010: -- [~tdomok] [~bkosztolnik] hi, I tried to reproduce this issue on the trunk branch and found that it is already impossible to use percentage mode on a leaf queue when the parent is set to absolute mode. But I can reproduce this issue in another, simpler way. Reproduction steps: # configure DominantResourceCalculator # set one child queue to [memory=0,vcores=0] in absolute mode The root cause is a zero-divided-by-zero, which returns NaN. YARN-9019 fixed the NaN/Infinity issue in the ratio function of DefaultResourceCalculator and DominantResourceCalculator. DefaultResourceCalculator.divide is implemented via the ratio function, but DominantResourceCalculator.divide is not, so DominantResourceCalculator.divide may still return a NaN result. > YARN ui2 hangs on the Queues page when the scheduler response contains NaN > values > - > > Key: YARN-11010 > URL: https://issues.apache.org/jira/browse/YARN-11010 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.4.0 >Reporter: Tamas Domok >Assignee: Xie YiFan >Priority: Major > Attachments: capacity-scheduler.xml, shresponse.json > > > When the scheduler response contains NaN values for capacity and maxCapacity > the UI2 hangs on the Queues page. The console log shows the following error: > {code:java} > SyntaxError: Unexpected token N in JSON at position 666 {code} > The scheduler response: > {code:java} > "maxCapacity": NaN, > "absoluteMaxCapacity": NaN, {code} > NaN, infinity, -infinity are not valid in JSON syntax: > https://www.json.org/json-en.html > This might be related as well: YARN-10452 > > I managed to reproduce this with AQCv1, where I set the parent queue's > capacity in absolute mode, then I used percentage mode on the > leaf-queue-template. 
I'm not sure if this is a valid configuration, however > there is no error or warning in RM logs about any configuration error. To > trigger the issue the DominantResourceCalculator must be used. (When using > absolute mode on the leaf-queue-template this issue is not reproducible, > further details in: YARN-10922). > > Reproduction steps: > # Start the cluster with the attached configuration > # Check the Queues page on UI2 (it should work at this point) > # Send an example job (yarn jar hadoop-mapreduce-examples-3.4.0-SNAPSHOT.jar > pi 1 10) > # Check the Queues page on UI2 (it should not be working at this point) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
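The zero-divided-by-zero root cause described in the comment above can be illustrated in isolation. This is a minimal standalone sketch, not the actual Hadoop calculator code: the class and method names are made up, and the guard only mirrors the spirit of the YARN-9019 ratio fix.

```java
// Minimal standalone sketch (hypothetical names, not the Hadoop classes):
// dividing a zero resource by a zero resource with plain floating-point
// math yields NaN, which then leaks into the scheduler's JSON response.
public class NaNRatioSketch {

    // Naive division, as in a divide() path that bypasses the guarded ratio.
    static float naiveRatio(long used, long total) {
        return (float) used / total; // 0/0 -> NaN, x/0 -> Infinity
    }

    // Guarded ratio in the spirit of the YARN-9019 fix: treat a zero
    // denominator as "no capacity" instead of producing NaN/Infinity.
    static float safeRatio(long used, long total) {
        if (total == 0) {
            return 0.0f;
        }
        return (float) used / total;
    }

    public static void main(String[] args) {
        System.out.println(naiveRatio(0, 0)); // prints NaN
        System.out.println(safeRatio(0, 0));  // prints 0.0
    }
}
```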
[jira] [Commented] (YARN-11010) YARN ui2 hangs on the Queues page when the scheduler response contains NaN values
[ https://issues.apache.org/jira/browse/YARN-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844087#comment-17844087 ] Xie YiFan commented on YARN-11010: -- [~tdomok] hi, are you working on this bug? If not, would you mind if I take it over? > YARN ui2 hangs on the Queues page when the scheduler response contains NaN > values
[jira] [Updated] (YARN-11644) LogAggregationService can't upload log in time when application finished
[ https://issues.apache.org/jira/browse/YARN-11644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-11644: - Affects Version/s: 3.3.6 > LogAggregationService can't upload log in time when application finished > > > Key: YARN-11644 > URL: https://issues.apache.org/jira/browse/YARN-11644 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation >Affects Versions: 3.3.6 >Reporter: Xie YiFan >Assignee: Xie YiFan >Priority: Minor > Attachments: image-2024-01-10-11-03-57-553.png > > > LogAggregationService is responsible for uploading logs to HDFS. It uses a > thread pool to execute upload tasks. > The workflow of log upload is as follows: > # The NM constructs an Application object when the first container of an > application launches, then notifies LogAggregationService to init > AppLogAggregationImpl. > # LogAggregationService submits AppLogAggregationImpl to the task queue. > # An idle worker of the thread pool pulls AppLogAggregationImpl from the task > queue. > # AppLogAggregationImpl loops to check the application state and uploads > when the application has finished. > Suppose the following scenario: > * LogAggregationService initializes the thread pool with 4 threads. > * 4 long-running applications start on this NM, so all threads are occupied > by aggregators. > * The next, short application starts on this NM and quickly finishes, but no > idle thread is available for this app to upload its log. > As a result, subsequent applications have to wait for the previous > applications to finish before their logs can be uploaded. > !image-2024-01-10-11-03-57-553.png|width=599,height=195! > h4. Solution > Change the spin behavior of AppLogAggregationImpl: if the application has not > finished, return immediately to yield the current thread and resubmit itself > to the executor service. The LogAggregationService can then roll through the > task queue and the logs of finished applications can be uploaded immediately. 
[jira] [Updated] (YARN-11644) LogAggregationService can't upload log in time when application finished
[ https://issues.apache.org/jira/browse/YARN-11644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-11644: - Description: LogAggregationService is responsible for uploading logs to HDFS. It uses a thread pool to execute upload tasks. The workflow of log upload is as follows: # The NM constructs an Application object when the first container of an application launches, then notifies LogAggregationService to init AppLogAggregationImpl. # LogAggregationService submits AppLogAggregationImpl to the task queue. # An idle worker of the thread pool pulls AppLogAggregationImpl from the task queue. # AppLogAggregationImpl loops to check the application state and uploads when the application has finished. Suppose the following scenario: * LogAggregationService initializes the thread pool with 4 threads. * 4 long-running applications start on this NM, so all threads are occupied by aggregators. * The next, short application starts on this NM and quickly finishes, but no idle thread is available for this app to upload its log. As a result, subsequent applications have to wait for the previous applications to finish before their logs can be uploaded. !image-2024-01-10-11-03-57-553.png|width=599,height=195! h4. Solution Change the spin behavior of AppLogAggregationImpl: if the application has not finished, return immediately to yield the current thread and resubmit itself to the executor service. The LogAggregationService can then roll through the task queue and the logs of finished applications can be uploaded immediately.
[jira] [Created] (YARN-11644) LogAggregationService can't upload log in time when application finished
Xie YiFan created YARN-11644: Summary: LogAggregationService can't upload log in time when application finished Key: YARN-11644 URL: https://issues.apache.org/jira/browse/YARN-11644 Project: Hadoop YARN Issue Type: Improvement Components: log-aggregation Reporter: Xie YiFan Assignee: Xie YiFan Attachments: image-2024-01-10-11-03-57-553.png LogAggregationService is responsible for uploading logs to HDFS. It uses a thread pool to execute upload tasks. The workflow of log upload is as follows: # The NM constructs an Application object when the first container of an application launches, then notifies LogAggregationService to init AppLogAggregationImpl. # LogAggregationService submits AppLogAggregationImpl to the task queue. # An idle worker of the thread pool pulls AppLogAggregationImpl from the task queue. # AppLogAggregationImpl loops to check the application state and uploads when the application has finished. Suppose the following scenario: * LogAggregationService initializes the thread pool with 4 threads. * 4 long-running applications start on this NM, so all threads are occupied by aggregators. * The next, short application starts on this NM and quickly finishes, but no idle thread is available for this app to upload its log. As a result, subsequent applications have to wait for the previous applications to finish before their logs can be uploaded. !image-2024-01-10-11-03-57-553.png|width=599,height=195! h4. Solution Change the spin behavior of AppLogAggregationImpl: if the application has not finished, return immediately to yield the current thread and resubmit itself to the executor service. The LogAggregationService can then roll through the task queue and the logs of finished applications can be uploaded immediately.
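The resubmit-instead-of-spin idea in the Solution section can be sketched standalone. This is a simplified illustration with hypothetical names, not the actual AppLogAggregationImpl: the task yields its worker thread by returning and resubmitting itself whenever the (simulated) application is still running.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Standalone sketch (hypothetical, simplified names) of the proposed fix:
// instead of holding a worker thread in a while-loop until the application
// finishes, the aggregation task returns immediately and resubmits itself,
// so the executor keeps rolling through its task queue.
public class ResubmitSketch {

    static class AppLogTask implements Runnable {
        final ExecutorService pool;
        final AtomicBoolean appFinished;              // application state
        final AtomicBoolean uploaded = new AtomicBoolean(false);

        AppLogTask(ExecutorService pool, AtomicBoolean appFinished) {
            this.pool = pool;
            this.appFinished = appFinished;
        }

        @Override
        public void run() {
            if (!appFinished.get()) {
                pool.submit(this);   // yield this worker; try again later
                return;
            }
            uploaded.set(true);      // stand-in for the real HDFS upload
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        AtomicBoolean finished = new AtomicBoolean(false);
        AppLogTask task = new AppLogTask(pool, finished);
        pool.submit(task);           // app still running: task keeps yielding
        finished.set(true);          // the application finishes...
        Thread.sleep(100);           // ...and the next pass does the upload
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
        System.out.println("uploaded: " + task.uploaded.get());
    }
}
```

Note the trade-off: a not-yet-finished task that immediately resubmits itself effectively polls the queue, so the real change would pair this with a delay or an event-driven trigger.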
[jira] [Created] (YARN-11643) Skip unnecessary pre-check in Multi Node Placement
Xie YiFan created YARN-11643: Summary: Skip unnecessary pre-check in Multi Node Placement Key: YARN-11643 URL: https://issues.apache.org/jira/browse/YARN-11643 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Xie YiFan Assignee: Xie YiFan When Multi Node Placement is enabled, RegularContainerAllocator loops to find one node from the candidate set to allocate for a given scheduler key. Before allocating, a pre-check is called to verify that the current node qualifies. If this node does not pass all checks, the loop just continues to the next node. {code:java} if (reservedContainer == null) { result = preCheckForNodeCandidateSet(node, schedulingMode, resourceLimits, schedulerKey); if (null != result) { continue; } } {code} But some checks are related to the scheduler key or the application and return PRIORITY_SKIPPED or APP_SKIPPED. This means that if the first node does not pass such a check, the following nodes will not pass it either. If the cluster has 5000 nodes in the default partition, the scheduler wastes 5000 loop iterations for just one scheduler key.
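The proposed optimisation can be sketched standalone. This is an illustrative model, not the RegularContainerAllocator itself: the enum, interface, and method names are made up, and the point is only that key/app-level skip reasons justify breaking out of the node loop while node-level reasons justify continuing.

```java
import java.util.Collections;
import java.util.List;

// Standalone sketch (hypothetical names) of the optimisation: when the
// pre-check fails for a reason tied to the scheduler key or the application
// rather than to the node, every remaining candidate node will fail the
// same way, so the allocator can break out of the loop instead of
// re-running the check once per node.
public class PreCheckSketch {

    enum SkipReason { NONE, NODE_SKIPPED, PRIORITY_SKIPPED, APP_SKIPPED }

    // Hypothetical stand-in for preCheckForNodeCandidateSet().
    interface PreCheck { SkipReason check(String node); }

    // Returns the number of nodes actually examined.
    static int allocate(List<String> nodes, PreCheck preCheck) {
        int examined = 0;
        for (String node : nodes) {
            examined++;
            SkipReason r = preCheck.check(node);
            if (r == SkipReason.PRIORITY_SKIPPED || r == SkipReason.APP_SKIPPED) {
                break;    // key/app-level failure: no later node can pass
            }
            if (r == SkipReason.NODE_SKIPPED) {
                continue; // node-level failure: the next node may still pass
            }
            // ... try the actual allocation on this node ...
        }
        return examined;
    }

    public static void main(String[] args) {
        List<String> nodes = Collections.nCopies(5000, "node");
        // App-level skip: 1 iteration instead of 5000.
        System.out.println(allocate(nodes, n -> SkipReason.APP_SKIPPED));
        // No skip: all 5000 candidates are examined.
        System.out.println(allocate(nodes, n -> SkipReason.NONE));
    }
}
```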
[jira] [Commented] (YARN-2082) Support for alternative log aggregation mechanism
[ https://issues.apache.org/jira/browse/YARN-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796555#comment-17796555 ] Xie YiFan commented on YARN-2082: - Hi, this is a long-standing ticket. We are now suffering from the small-files problem. We have 200,000+ jobs per day on one cluster. Suppose a job runs on 25 NodeManagers on average; then the file count should be 200,000 * 25 = 5,000,000 for only one cluster per day. HDFS can't handle so many small files. When this ticket was created, Timeline V2, which uses HBase as its backend storage, had not yet been introduced. Now Timeline V2 has good scalability and usability, so I think we can use HBase to store log files. [~slfan1989] [~inigoiri] What do you think about this? > Support for alternative log aggregation mechanism > - > > Key: YARN-2082 > URL: https://issues.apache.org/jira/browse/YARN-2082 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Ming Ma >Priority: Major > > I will post a more detailed design later. Here is the brief summary and would > like to get early feedback. > Problem Statement: > Current implementation of log aggregation creates one HDFS file for each > {application, nodemanager}. These files are relatively small, in the range of > 1-2 MB. In a large cluster with lots of applications and many nodemanagers, it > ends up creating lots of small files in HDFS. This creates pressure on HDFS > NN in the following ways. > 1. It increases NN memory size. It is mitigated by having the history server > delete old log files in HDFS. > 2. Runtime RPC hit on HDFS. Each log aggregation file introduces several NN > RPCs such as create, getAdditionalBlock, complete, rename. When the cluster > is busy, such RPC hits have an impact on NN performance. > In addition, to support non-MR applications on YARN, we might need to support > aggregation for long running applications. > Design choices: > 1. Don't aggregate all the logs, as in YARN-221. > 2. Create a dedicated HDFS namespace used only for log aggregation. > 3. Write logs to some key-value store like HBase. HBase's RPC hit on NN will > be much less. > 4. Decentralize the application level log aggregation to NMs. All logs for a > given application are aggregated first by a dedicated NM before being pushed > to HDFS. > 5. Have the NM aggregate logs on a regular basis; each of these log files will > have data from different applications and there needs to be some index for > quick lookup. > Proposal: > 1. Make yarn log aggregation pluggable for both the read and write path. Note > that Hadoop FileSystem provides an abstraction and we could ask alternative > log aggregators to implement a compatible FileSystem, but that seems to be > overkill. > 2. Provide a log aggregation plugin that writes to HBase. The schema design > needs to support efficient reads on a per application as well as per > application+container basis; in addition, it shouldn't create hotspots in a > cluster where certain users might create more jobs than others. For example, > we can use hash($user+$applicationId) + containerid as the row key.
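The row-key scheme floated at the end of the proposal above can be sketched standalone. The hash choice, separator, and field order here are illustrative assumptions, not a committed design: a real scheme would likely use a stable hash (e.g. murmur) and fixed-width fields.

```java
// Standalone sketch of the suggested row key: prefix with a hash of
// user+applicationId so one heavy user's rows spread across HBase regions
// instead of hotspotting, then append the application and container ids so
// per-application and per-container reads become prefix scans.
public class LogRowKeySketch {

    static String rowKey(String user, String applicationId, String containerId) {
        int salt = (user + "/" + applicationId).hashCode();
        // %08x renders the salt as fixed-width unsigned hex, so keys sort
        // uniformly by the salted prefix.
        return String.format("%08x", salt) + "!" + applicationId + "!" + containerId;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("recommend",
                "application_1561534175896_4102", "container_000001"));
    }
}
```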
[jira] [Created] (YARN-10537) Change type of LogAggregationService threadPool
Xie YiFan created YARN-10537: Summary: Change type of LogAggregationService threadPool Key: YARN-10537 URL: https://issues.apache.org/jira/browse/YARN-10537 Project: Hadoop YARN Issue Type: Improvement Reporter: Xie YiFan Currently, the LogAggregationService threadPool is a FixedThreadPool whose default threadPoolSize is 100. LogAggregationService constructs an AppLogAggregator for each newly arrived application and submits it to the threadPool. The AppLogAggregator loops until the application finishes. Some applications may run for a very long time, for reasons such as insufficient resources, and as a result each occupies one thread of the threadPool. When the number of such applications is greater than threadPoolSize, later short-lived applications can't upload logs until the previous long-lived applications finish. So I think we should replace the FixedThreadPool with a CachedThreadPool.
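The starvation argument above can be demonstrated with plain java.util.concurrent, independent of any Hadoop code. This sketch shows the behaviour the ticket wants from a CachedThreadPool: the pool grows a thread per pending task, so a short-lived app's upload is never queued behind long-lived ones.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Standalone sketch of the suggestion: a FixedThreadPool caps the number of
// concurrent aggregators, so long-running apps can occupy every worker; a
// CachedThreadPool creates a thread per pending task, so a short-lived app
// is never stuck behind long-lived ones.
public class PoolChoiceSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService cached = Executors.newCachedThreadPool();
        CountDownLatch longApps = new CountDownLatch(1);

        // Several "long-running" aggregators park on their worker threads...
        for (int i = 0; i < 4; i++) {
            cached.submit(() -> { longApps.await(); return null; });
        }

        // ...yet a short-lived app's upload still gets a thread immediately.
        // A FixedThreadPool of size 4 would leave this task queued instead.
        Future<String> shortApp = cached.submit(() -> "uploaded");
        System.out.println(shortApp.get(1, TimeUnit.SECONDS)); // prints uploaded

        longApps.countDown();   // let the long-running tasks finish
        cached.shutdown();
    }
}
```

The trade-off (and likely why the later YARN-11644 proposal resubmits tasks instead): an unbounded CachedThreadPool can create one thread per long-running application, which has its own cost under load.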
[jira] [Updated] (YARN-10537) Change type of LogAggregationService threadPool
[ https://issues.apache.org/jira/browse/YARN-10537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-10537: - Priority: Minor (was: Major) > Change type of LogAggregationService threadPool > --- > > Key: YARN-10537 > URL: https://issues.apache.org/jira/browse/YARN-10537 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Xie YiFan >Priority: Minor > > Currently, the LogAggregationService threadPool is a FixedThreadPool whose > default threadPoolSize is 100. LogAggregationService constructs an > AppLogAggregator for each newly arrived application and submits it to the > threadPool. The AppLogAggregator loops until the application finishes. Some > applications may run for a very long time, for reasons such as insufficient > resources, and as a result each occupies one thread of the threadPool. When > the number of such applications is greater than threadPoolSize, later > short-lived applications can't upload logs until the previous long-lived > applications finish. So I think we should replace the FixedThreadPool with a > CachedThreadPool.
[jira] [Updated] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-6539: Attachment: YARN-6539.008.patch > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, > YARN-6539-branch-3.1.0.004.patch, YARN-6539-branch-3.1.0.005.patch, > YARN-6539.006.patch, YARN-6539.007.patch, YARN-6539.008.patch, > YARN-6539_3.patch, YARN-6539_4.patch > >
[jira] [Assigned] (YARN-10315) Avoid sending RMNodeResoureupdate event if resource is same
[ https://issues.apache.org/jira/browse/YARN-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan reassigned YARN-10315: Assignee: (was: Xie YiFan) > Avoid sending RMNodeResoureupdate event if resource is same > --- > > Key: YARN-10315 > URL: https://issues.apache.org/jira/browse/YARN-10315 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin Chundatt >Priority: Major > > When the node is in DECOMMISSIONING state, the RMNodeResourceUpdateEvent is > sent for every heartbeat, which results in a scheduler resource update. Avoid > sending the same. > Scheduler node resource update iterates through all the queues for resource > update, which is costly.
[jira] [Assigned] (YARN-10315) Avoid sending RMNodeResoureupdate event if resource is same
[ https://issues.apache.org/jira/browse/YARN-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan reassigned YARN-10315: Assignee: Xie YiFan (was: Sushil Ks) > Avoid sending RMNodeResoureupdate event if resource is same > --- > > Key: YARN-10315 > URL: https://issues.apache.org/jira/browse/YARN-10315 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin Chundatt >Assignee: Xie YiFan >Priority: Major > > When the node is in DECOMMISSIONING state, the RMNodeResourceUpdateEvent is > sent for every heartbeat, which results in a scheduler resource update. Avoid > sending the same. > Scheduler node resource update iterates through all the queues for resource > update, which is costly.
[jira] [Updated] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-6539: Attachment: YARN-6539.007.patch > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, > YARN-6539-branch-3.1.0.004.patch, YARN-6539-branch-3.1.0.005.patch, > YARN-6539.006.patch, YARN-6539.007.patch, YARN-6539_3.patch, YARN-6539_4.patch > >
[jira] [Updated] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-6539: Attachment: YARN-6539.006.patch > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, > YARN-6539-branch-3.1.0.004.patch, YARN-6539-branch-3.1.0.005.patch, > YARN-6539.006.patch, YARN-6539_3.patch, YARN-6539_4.patch > >
[jira] [Updated] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-6539: Attachment: YARN-6539-branch-3.1.0.005.patch > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, > YARN-6539-branch-3.1.0.004.patch, YARN-6539-branch-3.1.0.005.patch, > YARN-6539_3.patch, YARN-6539_4.patch > >
[jira] [Updated] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-6539: Attachment: YARN-6539-branch-3.1.0.004.patch > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, > YARN-6539-branch-3.1.0.004.patch, YARN-6539_3.patch, YARN-6539_4.patch > >
[jira] [Updated] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-6539: Attachment: YARN-6539_4.patch > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch, > YARN-6539_4.patch > >
[jira] [Commented] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113693#comment-17113693 ] Xie YiFan commented on YARN-6539: - [~BilwaST] ok. I will try it. > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch > >
[jira] [Created] (YARN-9811) FederationInterceptor fails to recover in Kerberos environment
Xie YiFan created YARN-9811: --- Summary: FederationInterceptor fails to recover in Kerberos environment Key: YARN-9811 URL: https://issues.apache.org/jira/browse/YARN-9811 Project: Hadoop YARN Issue Type: Bug Components: amrmproxy Reporter: Xie YiFan Assignee: Xie YiFan *scenario*: Start up the cluster in a Kerberos environment with recovery & AMRMProxy enabled in the NM. Submit one application to the cluster, and restart the NM that holds the master container. The NM will block in FederationInterceptor recovery. *LOG* {code:java} INFO org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor: Recovering data for FederationInterceptor INFO org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor: Found 0 existing UAMs for application application_1561534175896_4102 in NMStateStore INFO org.apache.hadoop.yarn.server.utils.AMRMClientUtils: Creating RMProxy to RM online-bx for protocol ApplicationClientProtocol for user recommend (auth:SIMPLE) INFO org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider: Initialized Federation proxy for user: recommend INFO org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider: Failing over to the ResourceManager for SubClusterId: online-bx INFO org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider: Connecting to /10.88.86.142:8032 subClusterId online-bx with protocol ApplicationClientProtocol as user recommend (auth:SIMPLE) WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] INFO org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider: Failing over to the ResourceManager for SubClusterId: online-bx INFO org.apache.hadoop.yarn.server.federation.utils.FederationStateStoreFacade: Flushing subClusters from cache and rehydrating from store, most likely on account of RM failover. 
INFO org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider: Connecting to /10.88.86.142:8032 subClusterId online-bx with protocol ApplicationClientProtocol as user recommend (auth:SIMPLE) WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] INFO org.apache.hadoop.io.retry.RetryInvocationHandler: java.io.IOException: DestHost:destPort hadoop1684.bx.momo.com:8032 , LocalHost:localPort hadoop999.bx.momo.com/10.88.64.186:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS], while invoking ApplicationClientProtocolPBClientImpl.getContainers over online-bx after 1 failover attempts. Trying to failover after sleeping for 3244ms.{code} *Analysis* rmClient.getContainers is called, but the AuthMethod of appSubmitter is SIMPLE. We should use createProxyUser instead of createRemoteUser when security is enabled. {code:java} UserGroupInformation appSubmitter = UserGroupInformation .createRemoteUser(getApplicationContext().getUser()); ApplicationClientProtocol rmClient = createHomeRMProxy(getApplicationContext(), ApplicationClientProtocol.class, appSubmitter); GetContainersResponse response = rmClient .getContainers(GetContainersRequest.newInstance(this.attemptId)); {code}
[jira] [Updated] (YARN-9803) NPE while accessing Scheduler UI
[ https://issues.apache.org/jira/browse/YARN-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-9803: Attachment: YARN-9803-branch-3.1.1.001.patch > NPE while accessing Scheduler UI > > > Key: YARN-9803 > URL: https://issues.apache.org/jira/browse/YARN-9803 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Xie YiFan >Assignee: Xie YiFan >Priority: Major > Attachments: YARN-9803-branch-3.1.1.001.patch > > > The same with what described in YARN-4624 > Scenario: > === > if not configure all queue's capacity to nodelabel even the value is 0, start > cluster and access capacityscheduler page. > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:97) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:243) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) > at > org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) > at > org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:342) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at 
org.apache.hadoop.yarn.webapp.View.render(View.java:243) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) > at > org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) > at > org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:513) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:243) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) > at > org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848) > at > org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) > at > org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.scheduler(RmController.java:86) > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9803) NPE while accessing Scheduler UI
[ https://issues.apache.org/jira/browse/YARN-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919146#comment-16919146 ] Xie YiFan commented on YARN-9803: - This happens because the variable configuredMinResource is not initialized in PartitionQueueCapacitiesInfo. > NPE while accessing Scheduler UI > > > Key: YARN-9803 > URL: https://issues.apache.org/jira/browse/YARN-9803 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Xie YiFan >Assignee: Xie YiFan >Priority: Major > > > The same with what described in YARN-4624 > Scenario: > === > if not configure all queue's capacity to nodelabel even the value is 0, start > cluster and access capacityscheduler page. > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:97) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:243) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) > at > org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) > at > org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:342) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at 
org.apache.hadoop.yarn.webapp.View.render(View.java:243) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) > at > org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) > at > org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:513) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:243) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) > at > org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848) > at > org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) > at > org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.scheduler(RmController.java:86) > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9803) NPE while accessing Scheduler UI
[ https://issues.apache.org/jira/browse/YARN-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-9803: Description: The same with what described in YARN-4624 Scenario: === if not configure all queue's capacity to nodelabel even the value is 0, start cluster and access capacityscheduler page. Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163) at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:108) at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:97) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:243) at org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709) at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:342) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:243) at org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709) at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:513) at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:243) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.scheduler(RmController.java:86) was: The same with what described in YARN-4624 Scenario: === if not configure all queue's capacity to nodelabel even the value is 0, start cluster and access capacityscheduler page. org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /cluster/scheduler java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor124.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) at com.google.inject.servlet.GuiceFilter$1.call(GuiceF
[jira] [Created] (YARN-9803) NPE while accessing Scheduler UI
Xie YiFan created YARN-9803: --- Summary: NPE while accessing Scheduler UI Key: YARN-9803 URL: https://issues.apache.org/jira/browse/YARN-9803 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.1 Reporter: Xie YiFan Assignee: Xie YiFan The same as described in YARN-4624. Scenario: === If the capacity for a node label is not configured on every queue (it must be set explicitly, even when the value is 0), starting the cluster and accessing the Capacity Scheduler page fails: org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /cluster/scheduler java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor124.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) at 
com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:110) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.security.http.MOMOHttpAuthenticationFilter.doFilter(MOMOHttpAuthenticationFilter.java:160) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1613) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.S
[jira] [Commented] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908885#comment-16908885 ] Xie YiFan commented on YARN-6539: - Hi [~yzzjjyy], you should set hadoop.security.authorization to false in core-site.xml. > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch > >
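For reference, the suggested property is set in core-site.xml with the standard Hadoop configuration syntax:

```xml
<!-- core-site.xml: disable service-level authorization checks -->
<property>
  <name>hadoop.security.authorization</name>
  <value>false</value>
</property>
```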
[jira] [Commented] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904992#comment-16904992 ] Xie YiFan commented on YARN-6539: - [~subru], I can't find any test related to RM and NM secureLogin. Also, I think it is hard to add one, because testing it requires a Kerberos environment. My implementation: 1. Call SecurityUtil#login in secureLogin so the Router logs in with Kerberos the way the RM and NM do. 2. RouterClientRMService receives the request from the YarnClient, creates the FederationClientInterceptor, and initializes the UGI based on the user. Next, FederationClientInterceptor forwards the request to the RM: it constructs a clientRMProxy to send RPC requests to the RM using the previously initialized UGI. AbstractClientRequestInterceptor calls UserGroupInformation#createProxyUser to construct the UGI in setupUser; in other words, it uses the Router's Kerberos identity to proxy the current user. > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch > >
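Step 1 could look roughly like the sketch below (hedged: the configuration key names `yarn.router.keytab-file` and `yarn.router.principal` are placeholders chosen for illustration, not necessarily the keys the patch defines):

```java
// Router secure-login sketch, mirroring what the RM and NM do at startup:
// SecurityUtil.login reads the keytab path and principal name from the given
// configuration keys and performs the Kerberos login for this process.
protected void doSecureLogin(Configuration conf) throws IOException {
  SecurityUtil.login(conf, "yarn.router.keytab-file", "yarn.router.principal");
}
```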
[jira] [Commented] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901638#comment-16901638 ] Xie YiFan commented on YARN-6539: - [~subru] Could you review this patch for me? > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch > >
[jira] [Updated] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-6539: Attachment: YARN-6539_3.patch > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch > >
[jira] [Updated] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-6539: Attachment: YARN-6359_2.patch > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch > >
[jira] [Commented] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894859#comment-16894859 ] Xie YiFan commented on YARN-6539: - This patch doesn't work. I have completed this function. [~shenyinjie], would you mind letting me take over this issue? > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Shen Yinjie >Priority: Minor > Attachments: YARN-6359_1.patch > >
[jira] [Created] (YARN-9708) Add Yarnclient#getDelegationToken API implementation and SecureLogin in router
Xie YiFan created YARN-9708: --- Summary: Add Yarnclient#getDelegationToken API implementation and SecureLogin in router Key: YARN-9708 URL: https://issues.apache.org/jira/browse/YARN-9708 Project: Hadoop YARN Issue Type: New Feature Components: router Affects Versions: 3.1.1 Reporter: Xie YiFan Attachments: Add_getDelegationToken_and_SecureLogin_in_router.patch 1. We use the Router as a proxy to manage multiple clusters that are independent of each other, in order to present a unified client. Thus, we implemented a customized AMRMProxyPolicy that doesn't broadcast ResourceRequests to other clusters. 2. Our production environment requires Kerberos, but the Router doesn't support SecureLogin yet. https://issues.apache.org/jira/browse/YARN-6539 doesn't work, so we improved it. 3. Some frameworks, such as Oozie, obtain a token via YarnClient#getDelegationToken, which the Router doesn't support. Our solution is to add homeCluster to ApplicationSubmissionContextProto & GetDelegationTokenRequestProto. A job is submitted with a specified cluster id so that the Router knows which cluster to submit the job to. The Router obtains the token from the corresponding RM according to the specified cluster id when the client calls getDelegationToken, and applies a mechanism to cache this token in memory.
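The proto change described in item 3 might look like the fragment below (hedged: the `homeCluster` field number and placement are placeholders chosen for illustration; `renewer = 1` reflects the existing GetDelegationTokenRequestProto, the rest is an assumption, not the attached patch):

```protobuf
// Sketch: carry the target sub-cluster id so the Router can route the token
// request to a single RM instead of broadcasting it to every sub-cluster.
message GetDelegationTokenRequestProto {
  optional string renewer = 1;
  optional string homeCluster = 2; // placeholder field number
}
```

A matching optional `homeCluster` string would be added to ApplicationSubmissionContextProto so that job submission pins the application to one cluster.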