On Wed, 2012-04-04 at 19:52 +0200, Pavel Březina wrote:
> First of all I have never thought of the in-memory cache as a
> performance issue (we are only sorting them), but as a nice way to
> handle rules expiration.
> 
> I will try to explain why the sysdb is not enough in this case.
> 
> Because it is a security issue,

Ok Pavel, stop right there.
Explain exactly what would be the security issue and why sysdb implies
it while a second in-memory cache wouldn't.

>  when user runs SUDO, we want to always
> go to an LDAP server and download current rules - not all of them, just
> the ones that apply to him to minimize the traffic.

No we do not always want to, that's not necessarily the case. I can
well see admins decide that updating every X minutes is good enough;
it's a balance between performance and freshness of rules.

> But running SUDO multiple times in a short period of time is not an
> exception, therefore we cannot afford this approach because it would
> be too slow. It would make SUDO unusable.

Correct, a period of time within which sudo rules are considered valid
is quite ok.

> Therefore we need to implement expiration mechanism so we can reuse the
> rules in a short period (currently set to 180 seconds).

Why do we need a new expiration mechanism? In sysdb we already store
an "expiration time" for the entries normally. At least we do that for
users/groups. If it is not done for sudo rules it is an error, as you
wouldn't be able to assess rule validity on sssd process restart.

> The question is *how*?
> 
> 1. *Download every time all rules instead of per user*
>     This would be probably too costly to do, similar to users and groups
>     enumeration. There is already implemented an option that enables
>     periodical update of all rules.
> 
>     I admit I have never seen an enterprise sudoers. I assume that it
>     can get very big with hundreds of rules. If this is not a valid
>     assumption, just tell me that I am stupid and we can do it this
>     way :-)

sudo rules can get very large indeed; whether it is ok to enumerate
them or not is a different thing. Enumerating a few hundred rules is
not a big deal.
For users it is different because we expect up to hundreds of thousands
users/groups.

Perhaps we should have a settable threshold below which we do enumerate
frequently and above which we automatically stop doing that unless
forced to.

> 2. *Store the rules in sysdb per user*
>     There would be many data duplication and the database may get very
>     huge.

Why per-user ? Doesn't make sense to me.

> 3. Store the expiration time with each rule
>     *update only those rules that are expired*

I totally expect this to be done already, if it is not done I consider
it a bug.

>     We wouldn't be able to decide whether a new rule was created without
>     downloading all rules that apply for the user.

This assumption and the logic are both incorrect.
The assumption is incorrect because we already do smart updates for
users/groups enumeration by downloading only entries that were changed
since the last time they were enumerated. So you can easily download
only newer rules by using the changetime of the newest rule you have in
sysdb.
But here the logic seems incorrect too. The main issue I see is not so
much in finding whether new rules have been added, but in invalidating
rules that have been removed. However, this could also be done in a
smart way. First refresh only for new rules. Then apply rule matching.
For the handful of rules that match the sudo request you check how old
they are. For each rule older than X minutes you do active validation
by trying to fetch it individually. This way you refresh only the
matching rules. If any rule is changed/missing you update/remove it
from the evaluation set.

>     (And we might be delivering to SUDO half of the updated rules and
>     half of the old rules.)

I do not see how this would happen.

> 4. Store the expiration time with each rule
>     *if one rule has different timestamp than others, perform an update*

I do not think this makes much sense. You want to be able to update
only individual rules; always fetching all rules that apply to a user
seems a waste of bandwidth and of time spent waiting by the user.

>     Here we may end up in a not complete set of rules and a race
>     condition when the cache will not be used at all. Details are below
>     (*).
> 
> 5. *Storing set of rules per user in in-memory cache* (current approach)
> 
>     Once the rules that apply for the user are downloaded, they are
>     stored in the sysdb for the offline times but they are also stored
>     in a hash table. We set a tevent timer that will remove them from
>     the hash table after the expiration time exceeds.

What's the point here?
You can achieve exactly the same result by setting an expiration date
on each sysdb entry.

>     We will refresh the rules in case of in-memory cache miss.

You can do the same with sysdb rules, just consider any expired one as
missing.

>     There is a possible duplication of the rules in the memory, but only
>     for a short time and for a small amount of users (I don't expect
>     many administrators to be using sudo at one moment).

I wouldn't count on such assumptions; in any case I do not see how a
duplicated in-memory cache helps here.
You can do exactly the same operations with exactly the same effects
using sysdb entries, so why is it not done that way?

> 6. If there is some approach I have not thought of, please tell me.

See above.

> =======================================================================
> (*)
> 
> **Rules in LDAP**
> 
> cn=rule1
> sudoUser: A
> 
> cn=rule2
> sudoUser: B
> 
> cn=rule3
> sudoUser: A
> sudoUser: B
> 
> **Timeline**
> 
> *Timestamp: 0*
> user A does: sudo ls
> 
> Downloaded rules: rule1 and rule3
> Sysdb contains:
> 
> cn=rule1
> expire: 180
> 
> cn=rule3
> expire: 180
> 
> *Timestamp: 10*
> user B does: sudo ls
> 
> rule1 and rule3 are found in sysdb
> expiration time equals
> return them
> 
> !!! rule2 is missing !!!
> 
> *Timestamp: 190*
> user B does: sudo ls
> 
> rule3 in sysdb is expired
> perform an LDAP search for user B
> sysdb contains:
> 
> cn=rule1
> expire: 180
> 
> cn=rule2
> expire: 370
> 
> cn=rule3
> expire: 370
> 
> *Timestamp: 200*
> user A does: sudo ls
> 
> rule1 and rule3 have different expiration time
> perform an LDAP search for user A
> sysdb contains:
> 
> cn=rule1
> expire: 380
> 
> cn=rule2
> expire: 370
> 
> cn=rule3
> expire: 380
> 
> *Timestamp: 210*
> user B does: sudo ls
> 
> rule2 and rule3 have different expiration time
> perform an LDAP search for user B
> sysdb contains:
> 
> cn=rule1
> expire: 380
> 
> cn=rule2
> expire: 390
> 
> cn=rule3
> expire: 390
> 
> !!! always performing an LDAP search !!!

The error seems to me to be in trying to download rules per-user. I do
not think you can ever attain decent performance this way, and it will
always be fraught with errors. For example, if sssd goes offline right
after the first step, then userB will always miss rules. This is not
expected, very difficult to explain to admins, and frankly probably
unacceptable, as it is very non-deterministic.

What you really want to do is to enumerate sudo rules that apply to the
machine. You effectively want to cache them all if possible and have a
reasonable tunable expiration time (5/10/15 minutes ...).
The reason is quite simple: sudo rules change very rarely, but you want
them all even when you are offline. You do not want a normal user to be
missing the sudo rules for his laptop only because he rarely needs
them.

Assume the case where sudo is used to run some VPN software only when
user X is out of the office. Now user X never used the VPN before and
therefore never needed to run sudo. Because you never downloaded rules
he goes home, opens the machine, runs the command to bring up the VPN
and ... it doesn't work.

I think the nature of sudo requires you to download all the rules that
apply to a specific machine. If the number of rules is a concern, the
admin can easily cut it down by properly restricting the set of rules
used on specific machines.

But in general I do not think it will be a big deal; I do not expect to
see cases where multiple thousands of rules apply to the same host.
That would be unmanageable anyway.

So I think that the problem, in the end, is the approach where you
decided to download rules per-user instead of per-machine; it leads you
into all sorts of corner cases.

The in-memory cache at this point is just a red herring, but I am glad
I dug in here, as it uncovered a bigger design issue I hadn't noticed
before.

The in-memory cache remains an additional layer that should simply be
avoided, and it becomes completely useless if you download rules
per-machine instead of per-user.

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York

_______________________________________________
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/sssd-devel
