-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Chris and I have been wrangling this afternoon over the way the Pyramid security machinery interacts with SQL-based user / group stores. As a result, I did a little investigation, and wanted to share the results.
Rationale - --------- The current "stock" pyramid authentiation policies take a callback which is used from two points in the framework / application: - - The 'authenticated_userid' API calls the callback to determine whether a user with authenticated credentials still exists in the persistent store; if the callback returns None, the user is no longer present. Note that the pyramid framework itself does not call 'authenticated_userid', but most apps will end up doing so. The API could also be called by stuff like view or route predicates. - - The 'effective_principals' API calls the callback to compute the set of groups to which the user belongs. Here, in addition to returning None for no-longer-in-the-building users, the callback is expected to return a sequence of "groups" (typically strings, but could be other stuff, depending on the application). These groups are stitched into a sequence of princiapls, including also the userid and some special pseudo-users. For developers who use SQL to store their user and group data, this design pretty much mandates that four SQL requests get made for each protected / authenticated request: the framework will call the callback once via 'effective_principals' (for any "protected" resource), while the application will often call 'authenticated_userid' (maybe more than once!) Since the callback doesn't know which API called it, it has to do both parts of the job, which in the examples documented to date requires going over the wire for two SQL queries. Ideally, the policy object would cache the result of the query(ies) on the request. Howver, the policy object cannot safely cache the results returned from the callback on the request, because the contract allows the callback to return arbitrary objects, some of which might themselves hold references to the request (hmm, I guess it could at least cache the fact that the callback returned None safely). The callback might be able to arrange such caching itself, if it knew that the results were safe. Proposed Mitigation - ------------------- To help mitigate that without changing the design of the API, I'd like to propose that we document the following example SQL query, which allows making a single query to answer both questions: This query is made against a fairly typicial user-gruops schema, which I bashed togehter using sqlite338:: - ------------------------- %< -------------------------------- sqlite> .dump BEGIN TRANSACTION; CREATE TABLE users(id integer primary key, login varchar[256] unique, password varchar[256]); INSERT INTO "users" VALUES(1, 'phred', 'w1lm4'); INSERT INTO "users" VALUES(2, 'bharney', 'b3tty'); INSERT INTO "users" VALUES(3, 'pebbyls', 'd1n0'); INSERT INTO "users" VALUES(4, 'dhino', 'b0n3s'); INSERT INTO "users" VALUES(5, 'bambam', 'r0cks'); CREATE TABLE groups(id integer primary key, name varchar[256] unique not null); INSERT INTO "groups" VALUES(1, 'flintstones'); INSERT INTO "groups" VALUES(2, 'rubbles'); INSERT INTO "groups" VALUES(3, 'males'); INSERT INTO "groups" VALUES(4, 'females'); INSERT INTO "groups" VALUES(5, 'humans'); CREATE TABLE user_groups(user_id integer not null, group_id integer not null, primary key(user_id, group_id), foreign key(user_id) references users, foreign key(group_id) references groups); INSERT INTO "user_groups" VALUES(1, 1); INSERT INTO "user_groups" VALUES(2, 2); INSERT INTO "user_groups" VALUES(1, 3); INSERT INTO "user_groups" VALUES(1, 5); INSERT INTO "user_groups" VALUES(2, 3); INSERT INTO "user_groups" VALUES(2, 5); INSERT INTO "user_groups" VALUES(3, 1); INSERT INTO "user_groups" VALUES(3, 4); INSERT INTO "user_groups" VALUES(3, 5); INSERT INTO "user_groups" VALUES(5, 2); INSERT INTO "user_groups" VALUES(5, 3); INSERT INTO "user_groups" VALUES(5, 5); COMMIT; - ------------------------- %< -------------------------------- The query itself relies on left outer joins to ensure that valid user records show up in the result set, even if they have no groups:: - ------------------------- %< -------------------------------- sqlite> select g.name ...> from users u ...> left outer join user_groups ug on ug.user_id = u.id ...> left outer join groups g on ug.group_id = g.id ...> where u.id = 1; flintstones males humans sqlite> select g.name ...> from users u ...> left outer join user_groups ug on ug.user_id = u.id ...> left outer join groups g on ug.group_id = g.id ...> where u.id = 4; sqlite> select g.name ...> from users u ...> left outer join user_groups ug on ug.user_id = u.id ...> left outer join groups g on ug.group_id = g.id ...> where u.id =13; sqlite> - ------------------------- %< -------------------------------- The callback would issue this query, and return None if the result set is empty. If the result set has a single record, and the 'g.name' value is Null, then the callback returns an empty sequence. Otherwise, the callback returns the sequence of group names in the result set. Something like:: - ------------------------- %< -------------------------------- SQL = """\ select g.name from users u left outer join user_groups ug on ug.user_id = u.id left outer join groups g on ug.group_id = g.id where u.id = % """ def callback(request, userid): dbconn = request.registry.settings['dbconn'] groups = [x.name for x in dbconn.execute(SQL, userid).fetchall()] if groups: if groups == [None]: return () return groups return None - ------------------------- %< -------------------------------- I realize that many folks expect their ORM to do the SQL heavy lifting for them; unfortunately, I don't know any ORM well enough to suggest how to get this same effect. Note that this approach also sacrifices the convenience in the name of efficiency: a callback implemented this way doesn't make pre-populated "user object" available (or at least not easily). Summary - ------- Cutting the number of SQL queries by half helps some. Doing more requires either changing the contract for the callback (mandating that it return only items which can safely be cached on the request) or else making the callback responsible for such caching. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tsea...@palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk0mW18ACgkQ+gerLs4ltQ54xwCgoTY+ajtC0EF6u8Bo/Wt+WYnV BG4AnR6AVK+2SAhf/3zkUPKZ/91Avc7M =quiW -----END PGP SIGNATURE----- _______________________________________________ Repoze-dev mailing list Repoze-dev@lists.repoze.org http://lists.repoze.org/listinfo/repoze-dev