[ https://issues.apache.org/jira/browse/YARN-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295634#comment-17295634 ]

Siddharth Ahuja commented on YARN-10652:
----------------------------------------

Thank you very much for the review, Szilard ([~snemeth]), much appreciated.

Please find my comments below:

{quote}
 As Gergo said, we need to keep consistency. It's one thing that usernames with 
dots are kind of supported, but is it really supported in all parts of the 
system? Definitely not for placement rules as the rule Gergo mentioned 
("root.user.%user") could cause an issue easily. It's okay that some customers 
don't want to use placement rules and your change is not strictly related to 
placement rules. But if we are encouraging using usernames with dots across the 
codebase, we need to handle these usernames in all aspects of the system.
What if some customers are using usernames with dots and placement rules? There 
we have a problem, we need a more complete solution.
{quote}

There is no encouragement from my side :) The customer raised the issue
themselves when trying to use this setting (which doesn't work), which is why
this JIRA was raised.

I understand your and everyone else's concern about consistency when it comes
to using dots in usernames across YARN; it is perfectly valid. However, nothing
stops customers today from using usernames with dots for queue placement,
regardless of the fix in this JIRA. Our software doesn't prevent it, and the
[Capacity Scheduler upstream documentation|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html]
makes no mention of this lack of support, or of the feature behaving
inconsistently depending on where it is used. By contrast, the
[Fair Scheduler upstream documentation|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html]
(and the code, see [1]) at least describes how a "." in usernames and group
names is converted to "_dot_". It does not explicitly say that usernames with
dots/periods are "supported", but the support is implied.

There are two ways to "discourage" customers from using usernames/groups with
dots in them:
        1. Block all creation/use (including placement rules) of
usernames/groups with dots until this feature is robust and fully available,
and/or,
        2. Explicitly state in the upstream documentation that Capacity Scheduler
has known issues with usernames and group names containing dots/periods, and
that using them is therefore strongly discouraged for the time being.

In the absence of 1 and/or 2, *+there is nothing stopping customers from using
this feature today+*, which leads to JIRAs like this one. We should not leave
customers confused or, worse, let them use this functionality at their own
peril. So please let me know if there is an easy way to achieve 1 (the ideal
solution); otherwise, we should at the very least go ahead with 2. I can raise
an upstream JIRA and update the Capacity Scheduler documentation.

{quote}
Support for usernames with dots: Was this documented anywhere, or can this fact
only be dug up from the codebase?
{quote}

As mentioned above, Fair Scheduler already supports it; please see the
[Fair Scheduler documentation|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html]
and the code at [1]. Usernames with dots are valid usernames in Linux and in
AD/LDAP.

        3. Further, it would be good to document for customers how they should
migrate users with dots in their names from Fair Scheduler to Capacity Scheduler.

Please let me know what you think of 1, 2 and 3 [~snemeth].

[1] Since YARN-2669, all user and group names are passed through cleanName()
(https://github.com/apache/hadoop/blob/a89ca56a1b0eb949f56e7c6c5c25fdf87914a02f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/FairQueuePlacementUtils.java#L53),
which replaces each "." with the string "_dot_".
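To make the difference concrete, here is a minimal Java sketch of the behaviour described in [1], set against the "root.user.%user" placement-rule concern quoted in the review. It assumes that cleanName() essentially trims the name and replaces every "." with "_dot_", and that a rule template substitutes %user verbatim before the resulting queue path is split on dots. The class and method names ({{DotCleaningSketch}}, {{queueLevels}}) are hypothetical and are not part of the Hadoop API.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch only: approximates the idea behind FairQueuePlacementUtils.cleanName()
 * (trim, then replace every "." with "_dot_"). Names here are hypothetical,
 * not the Hadoop API.
 */
public class DotCleaningSketch {
  public static final String DOT = ".";
  public static final String DOT_REPLACEMENT = "_dot_";

  /** Trims surrounding whitespace and replaces each literal "." with "_dot_". */
  public static String cleanName(String name) {
    String trimmed = name.trim();
    // String.replace works on literal text, so "." is not treated as a regex.
    return trimmed.replace(DOT, DOT_REPLACEMENT);
  }

  /**
   * Hypothetical helper: substitutes %user into a rule template verbatim and
   * splits the result into queue levels, since queue paths are dot-separated.
   */
  public static List<String> queueLevels(String template, String user) {
    String path = template.replace("%user", user);
    return Arrays.asList(path.split("\\."));
  }

  public static void main(String[] args) {
    String user = "firstname.lastname";
    System.out.println(cleanName(user));
    // prints "firstname_dot_lastname"
    System.out.println(queueLevels("root.user.%user", user));
    // prints "[root, user, firstname, lastname]": the dot adds an extra level
    System.out.println(queueLevels("root.user.%user", cleanName(user)));
    // prints "[root, user, firstname_dot_lastname]": a single leaf queue
  }
}
```

Without the substitution, the dotted username turns into an extra queue level ({{root.user.firstname.lastname}}), which is the inconsistency raised in the review; with it, the user maps to a single leaf queue, {{root.user.firstname_dot_lastname}}.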


> Capacity Scheduler fails to handle user weights for a user that has a "." 
> (dot) in it
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-10652
>                 URL: https://issues.apache.org/jira/browse/YARN-10652
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 3.3.0
>            Reporter: Siddharth Ahuja
>            Assignee: Siddharth Ahuja
>            Priority: Major
>         Attachments: Correct user weight of 0.76 picked up for the user with 
> a dot after the patch.png, Incorrect default user weight of 1.0 being picked 
> for the user with a dot before the patch.png, YARN-10652.001.patch
>
>
> AD usernames can contain a "." (dot), i.e. they can be of the format
> {{firstname.lastname}}. However, if you specify a username in this format in
> the Capacity Scheduler setting
> {{yarn.scheduler.capacity.root.default.user-settings.firstname.lastname.weight}},
> it is not applied and the user is instead assigned the default weight of 1.0f.
> This renders the user weight feature (used as a means of setting user
> priorities within a queue) unusable for such users.
> This limitation comes from [1], where only word characters ([a-zA-Z_0-9], see
> [2]) are currently permitted in the username, which does not work for AD names
> that contain a "." (dot).
> Similar discussions have taken place in a few HADOOP JIRAs, e.g. HADOOP-7050
> and HADOOP-15395, and the outcome was to match non-whitespace characters, i.e.
> to use {{\S+}} instead of {{\w+}}.
> We could take a similar approach here and unblock this feature for AD usernames
> with a "." (dot) in them.
> [1] https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java#L1953
> [2] https://docs.oracle.com/javase/tutorial/essential/regex/pre_char_classes.html
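The following minimal Java sketch illustrates the {{\w+}} versus {{\S+}} difference described in the issue, using the user-weight key from the description. It is a simplified assumption of how such a key might be matched (a fixed queue prefix, a username segment, then {{.weight}}), not a copy of the code in CapacitySchedulerConfiguration; the class and method names ({{UserWeightKeySketch}}, {{extractUser}}) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch only: shows why a \w+ username segment cannot match
 * "firstname.lastname" in a user-weight key while \S+ can. The key layout is
 * a simplified assumption, not CapacitySchedulerConfiguration's actual code.
 */
public class UserWeightKeySketch {
  public static final String PREFIX =
      "yarn.scheduler.capacity.root.default.user-settings.";
  public static final String SUFFIX = ".weight";

  /** Builds a full-key regex, using the given regex for the username segment. */
  static Pattern keyPattern(String userSegmentRegex) {
    // Pattern.quote makes the dots in the prefix/suffix literal.
    return Pattern.compile(
        Pattern.quote(PREFIX) + "(" + userSegmentRegex + ")" + Pattern.quote(SUFFIX));
  }

  /** Returns the username captured from the key, or null if the key does not match. */
  public static String extractUser(String key, String userSegmentRegex) {
    Matcher m = keyPattern(userSegmentRegex).matcher(key);
    return m.matches() ? m.group(1) : null;
  }

  public static void main(String[] args) {
    String key = PREFIX + "firstname.lastname" + SUFFIX;
    System.out.println(extractUser(key, "\\w+")); // prints "null"
    System.out.println(extractUser(key, "\\S+")); // prints "firstname.lastname"
  }
}
```

With {{\w+}} the key does not match at all, which is consistent with the user falling back to the default weight of 1.0f; with {{\S+}} the full {{firstname.lastname}} is captured. Because {{\S+}} is greedy, the regex engine backtracks just enough for the trailing {{.weight}} to match, so the dots inside the username end up in the capture group.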



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
