Solr JOINs are a way to enforce simple document security, as explained
by Yonik Seeley at
http://lucene.472066.n3.nabble.com/document-level-security-filter-solution-for-Solr-tp4126992p4126994.html

I'm trying to tweak this pattern so that I don't have to keep the
security information in each of my primary Solr documents.

I just posted a gist at
https://gist.github.com/pdurbin/4d27fea7b431ef3bf4f9 as an example of
my working Solr JOIN, based on the data in `before.json`. Per-user
permissions are embedded in the primary documents like this:

    {
        "id": "dataset_3",
        "perms_ss": [
            "alice",
            "bob"
        ]
    },
    {
        "id": "dataset_4",
        "perms_ss": [
            "alice",
            "bob",
            "public"
        ]
    },

User documents have been created to do the JOIN on:

    {
        "id": "alice",
        "groups_s": "alice"
    },

The JOIN looks like this:

    {!join+from=groups_s+to=perms_ss}id:public+OR+{!join+from=groups_s+to=perms_ss}id:alice
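
As a rough illustration of how that filter is put together, here is a minimal Python sketch that emits one JOIN clause per user or group ID and then encodes spaces as `+`. The function name `build_permission_filter` is hypothetical and is not part of the gist; only the field names `groups_s` and `perms_ss` come from the documents above.

```python
from urllib.parse import quote_plus


def build_permission_filter(ids):
    """Build one {!join} clause per user/group document ID, OR-ed together.

    `ids` are the IDs of the user/group documents to join from,
    for example ["public", "alice"].
    """
    clauses = ["{!join from=groups_s to=perms_ss}id:" + i for i in ids]
    return " OR ".join(clauses)


raw = build_permission_filter(["public", "alice"])

# Keep the local-params punctuation readable and turn spaces into "+",
# which matches the form of the query string shown above.
encoded = quote_plus(raw, safe="{}!=:")
print(encoded)
```

Every user gets the `public` clause plus one clause for their own ID, so the query grows by one clause for each additional group a user belongs to.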

Because indexing the primary documents (datasets) takes a while, I'm
interested in exploring the idea of introducing a third type of
document that contains the permission information. `after.json` is an
example, with documents that look like this:

    {
        "id": "dataset_3"
    },
    {
        "id": "dataset_4"
    },
    {
        "id": "public",
        "groups_s": "public"
    },
    {
        "id": "alice",
        "groups_s": "alice"
    },
    {
        "id": "bob",
        "groups_s": "bob"
    },
    {
        "id": "charlie",
        "groups_s": "charlie"
    },
    {
        "id": "dataset_1_perms",
        "definition_point_s": "dataset_1",
        "role_assignee_ss": [
            "alice"
        ]
    },
    {
        "id": "dataset_2_perms",
        "definition_point_s": "dataset_2",
        "role_assignee_ss": [
            "bob"
        ]
    },
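
To make the intended semantics concrete, the lookup these documents imply can be modeled in plain Python as two hops: from user documents to permission documents, and from permission documents to datasets. This is only an in-memory sketch of the desired result, not Solr syntax. The function name `visible_datasets` is hypothetical, and only the two permission documents shown above are included.

```python
# User/group documents, as in after.json.
user_docs = [
    {"id": "public", "groups_s": "public"},
    {"id": "alice", "groups_s": "alice"},
    {"id": "bob", "groups_s": "bob"},
    {"id": "charlie", "groups_s": "charlie"},
]

# Permission documents (only the two shown in the excerpt above).
perms_docs = [
    {"id": "dataset_1_perms", "definition_point_s": "dataset_1",
     "role_assignee_ss": ["alice"]},
    {"id": "dataset_2_perms", "definition_point_s": "dataset_2",
     "role_assignee_ss": ["bob"]},
]


def visible_datasets(ids, user_docs, perms_docs):
    """Return the dataset IDs visible to the given user/group document IDs."""
    # Hop 1: user/group documents -> their groups_s values.
    groups = {d["groups_s"] for d in user_docs if d["id"] in ids}
    # Hop 2: permission documents whose role_assignee_ss overlaps those
    # groups -> the dataset named in definition_point_s.
    return sorted({
        d["definition_point_s"]
        for d in perms_docs
        if groups & set(d["role_assignee_ss"])
    })


print(visible_datasets({"public", "alice"}, user_docs, perms_docs))  # ['dataset_1']
print(visible_datasets({"public", "bob"}, user_docs, perms_docs))    # ['dataset_2']
print(visible_datasets({"public", "charlie"}, user_docs, perms_docs))  # []
```

The second hop is the new part: in `before.json` the `perms_ss` field lives on the dataset itself, so a single JOIN is enough, whereas here the match has to be carried from the permission document over to the dataset it points at.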

The question is whether it's possible to construct a Solr JOIN against
the `after.json` documents such that the same permissions are enforced
and the same documents are returned per user. The gist contains the
expected output and test runners for anyone who can figure out the
syntax of the JOIN. The idea is that silence is golden: no output means
the tests passed. For example, with the "before" data:

    murphy:4d27fea7b431ef3bf4f9 pdurbin$ ./delete
    {"responseHeader":{"status":0,"QTime":8}}
    murphy:4d27fea7b431ef3bf4f9 pdurbin$ ./load.before
    {"responseHeader":{"status":0,"QTime":12}}
    murphy:4d27fea7b431ef3bf4f9 pdurbin$ ./test.before.all
    murphy:4d27fea7b431ef3bf4f9 pdurbin$

What do people think? Can anyone load up `after.json`, update the
FIXMEs, and get `test.after.all` to pass? Thanks in advance!

And thanks again for the original JOIN tip, Yonik!

Phil

-- 
Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin
