Hello all,

Situation: We have a collection of files in Solr with ACLs applied. Each file has a multi-valued field that contains the list of user IDs that can read it.
Here is sample data:

Id | content   | userId
1  | text text | 4,5,6,2
2  | text text | 4,5,9
3  | text text | 4,2

Problem: When the ACL is changed for a big folder, we compute the ACL for all child items and reindex them in Solr using atomic updates (updating only the 'userId' field). Because an atomic update deletes and reindexes the whole record, performance is very poor.

Question: I suppose the delete/reindex approach will not change soon (it is probably due to the current Solr architecture), is that right?

Possible solution: Assuming atomic updates will be very fast on an index without full text, keep a separate ACLIndex and FullTextIndex and use pseudo-joins.

Example: searching for 'foo' as user '999':

/solr/FullTextIndex/select/?q=foo&fq={!join fromIndex=ACLIndex from=Id to=Id}userId:999

Question: What about performance here? What if the index has 100,000 records? Note that the worst case is when everyone has access to all the files, which means the initial filtered set of IDs will be the full index.

I would be happy to get any links that deal with pseudo-join performance for large datasets (i.e. a large initial filtered set of IDs).

Regards,
Oleg

P.S. We found that storing the list of all users that have access to each record is better overall, because there are many more read requests (people accessing the library) than write requests (a user is added or removed).
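
For anyone who wants to reproduce the setup, here is a minimal Python sketch of the two requests described above: the atomic update that replaces only the userId values, and the join-filtered search. It only builds the request body and URL and does not talk to a server. The helper names are made up for illustration, and it assumes 'Id' is the unique key in both cores and that the update body would be POSTed to the ACLIndex update handler.

```python
import json
from urllib.parse import urlencode


def atomic_acl_update(doc_id, user_ids):
    """Build the JSON body for an atomic update touching only userId.

    Hypothetical helper. Assumes 'Id' is the unique key field, as in the
    sample data. "set" is the atomic-update modifier that replaces all
    existing values of the field.
    """
    return json.dumps([{"Id": str(doc_id), "userId": {"set": list(user_ids)}}])


def join_search_url(text_query, user_id, base="/solr/FullTextIndex/select"):
    """Build the search URL that filters FullTextIndex hits through ACLIndex.

    Hypothetical helper. The core names and the from/to fields are the
    ones used in the example query in this post.
    """
    params = {
        "q": text_query,
        # Join filter: keep only documents whose Id appears in ACLIndex
        # on a record readable by this user.
        "fq": "{!join fromIndex=ACLIndex from=Id to=Id}userId:%d" % user_id,
    }
    # urlencode percent-escapes the braces, '!' and '=' inside the filter.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(atomic_acl_update(1, [4, 5, 6, 2]))
    print(join_search_url("foo", 999))
```

Encoding the fq value matters in practice, since the raw braces and spaces in the local-params syntax are not safe to paste into a URL as-is.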