Here is a good presentation on search security from the Infonortics
Search Conference that was held a few weeks ago.
http://www.infonortics.com/searchengines/sh09/slides/kehoe.pdf
The approach you are using is called early binding. As Jay mentioned,
one of the downsides is that you have to update the documents each
time an ACL changes. You could instead use the late-binding approach,
which checks each result after the query runs but before it is
displayed to the user. I don't recommend it, though: having to check
whether the user can access each individual result will strain your
security infrastructure.
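
To make the cost of late binding concrete, here is a minimal Python
sketch. Everything in it is hypothetical: CountingAcl stands in for
whatever external security system you have, and the document ids are
made up. The only point is that the security system gets one call per
hit returned by the search.

```python
from typing import Callable, Dict, List


def late_binding_filter(
    results: List[Dict[str, str]],
    uid: str,
    is_authorized: Callable[[str, str], bool],
) -> List[Dict[str, str]]:
    """Keep only the hits the user may see, asking the ACL once per hit."""
    return [doc for doc in results if is_authorized(uid, doc["id"])]


class CountingAcl:
    """Hypothetical stand-in for an external security system."""

    def __init__(self, allowed):
        self.allowed = allowed  # set of (uid, doc_id) pairs
        self.calls = 0          # number of access checks performed

    def is_authorized(self, uid: str, doc_id: str) -> bool:
        self.calls += 1
        return (uid, doc_id) in self.allowed


# Five made-up search hits, two of which user "u42" may see.
results = [{"id": f"doc{i}"} for i in range(1, 6)]
acl = CountingAcl({("u42", "doc1"), ("u42", "doc4")})
visible = late_binding_filter(results, "u42", acl.is_authorized)
```

With early binding the same restriction is part of the query itself,
so no per-result call is needed at display time.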
Good luck.
Thanks,
Matt Weber
eSr Technologies
http://www.esr-technologies.com
On May 12, 2009, at 1:21 PM, Jay Hill wrote:
The only downside would be that you would have to update a document
anytime a user was granted or denied access. You would have to query
before the update to get the current values for grantedUID and
deniedUID, remove/add values, and update the index. If you don't have
a lot of changes in the system that wouldn't be a big deal, but if a
lot of changes are happening throughout the day you might have to
queue requests and batch them.
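
A rough Python sketch of the queue-and-batch idea, under some
assumptions: a plain dict stands in for the Solr index, AclUpdateQueue
is a made-up helper (not a Solr API), the field names are spelled
grantedUid and deniedUid, and a grant is assumed to clear an earlier
deny for the same user (and vice versa). Each document is read once
and written once per flush, however many changes were queued for it.

```python
from collections import defaultdict


class AclUpdateQueue:
    """Hypothetical helper that collects ACL changes and applies them in batches."""

    def __init__(self):
        self._pending = defaultdict(list)  # doc_id -> [(action, uid), ...]

    def grant(self, doc_id, uid):
        self._pending[doc_id].append(("grant", uid))

    def deny(self, doc_id, uid):
        self._pending[doc_id].append(("deny", uid))

    def flush(self, index):
        """Apply queued changes; returns the number of documents rewritten."""
        updated = 0
        for doc_id, changes in self._pending.items():
            doc = index[doc_id]  # the "query before the update" step
            granted = set(doc.get("grantedUid", []))
            denied = set(doc.get("deniedUid", []))
            for action, uid in changes:  # replay changes in arrival order
                if action == "grant":
                    granted.add(uid)
                    denied.discard(uid)
                else:
                    denied.add(uid)
                    granted.discard(uid)
            # One write per document; in a real setup this would be a Solr update.
            index[doc_id] = {
                **doc,
                "grantedUid": sorted(granted),
                "deniedUid": sorted(denied),
            }
            updated += 1
        self._pending.clear()
        return updated


# Made-up data: the dict plays the role of the index.
index = {"doc1": {"id": "doc1", "ownerUid": "u1",
                  "grantedUid": ["u2"], "deniedUid": []}}
queue = AclUpdateQueue()
queue.grant("doc1", "u3")
queue.deny("doc1", "u2")
rewritten = queue.flush(index)
```

How often to flush is a trade-off between index churn and how long a
revoked user can still see a document.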
-Jay
On Tue, May 12, 2009 at 1:05 PM, Matt Weber <m...@mattweber.org> wrote:
I also work with the FAST Enterprise Search engine and this is exactly
how their Security Access Module works. They actually use a modified
base-32 encoded value for indexing, but that is because they don't
have the luxury of untokenized/un-processed String fields like Solr.
Thanks,
Matt Weber
eSr Technologies
http://www.esr-technologies.com
On May 12, 2009, at 12:26 PM, Terence Gannon wrote:
Paul -- thanks for the reply, I appreciate it. That's a very practical
approach, and is worth taking a closer look at. Actually, taking your
idea one step further, perhaps three fields: 1) ownerUid (uid of the
document's owner), 2) grantedUid (uids of users who have been granted
access), and 3) deniedUid (uids of users specifically denied access to
the document). These fields, coupled with some business rules around
how they are populated, should cover all possibilities, I think.
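
A small Python sketch of how the three fields might be turned into a
filter for a given user. The business rule here is an assumption, not
something settled in this thread: a deny wins over everything,
including ownership. The helper names are made up; the filter string
uses ordinary Lucene/Solr query syntax and would be attached to the
request by the application (for example as an fq parameter), never by
the end user.

```python
def quote_term(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_acl_filter(uid: str) -> str:
    """Filter query: user owns or was granted the doc, and is not denied."""
    term = quote_term(uid)
    return (
        f"(ownerUid:{term} OR grantedUid:{term}) "
        f"AND NOT deniedUid:{term}"
    )


def can_access(doc: dict, uid: str) -> bool:
    """Same rule evaluated in plain Python, handy for checking the logic."""
    if uid in doc.get("deniedUid", []):
        return False  # assumed rule: deny overrides owner and grants
    return uid == doc.get("ownerUid") or uid in doc.get("grantedUid", [])


# Made-up document for illustration.
doc = {"id": "doc1", "ownerUid": "u1",
       "grantedUid": ["u2", "u3"], "deniedUid": ["u3"]}
acl_filter = build_acl_filter("u42")
```

If the owner should never be locked out, the NOT clause would need to
apply only to the grantedUid branch.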
Access to the Solr instance would have to be tightly controlled, but
that's something that should be done anyway. You sure wouldn't want
end users preparing their own XML and throwing it at Solr -- it would
be pretty easy to figure out how to get around the access/denied
fields and get at stuff the owner didn't intend.
This approach mimics to some degree what is being done in the
operating system, but it's still elegant and provides the level of
control required. Anybody else have any thoughts in this regard? Has
anybody implemented anything similar, and if so, how did it work?
Thanks, and best regards...
Terence