Hi,

Talk to the ManifoldCF guys. They have successfully implemented document-level
security for many repositories, including CMS/ECM systems, and may have some
hints for writing your own Authority connector for your system, which will
fetch the ACL for each document and index it along with the document itself.
This eliminates long query-time filters.
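
To make the idea concrete, here is a minimal sketch, assuming a hypothetical
multi-valued string field named `acl` that holds the principals (user and group
names) allowed to read each document. The connector fills it at index time, and
at query time the filter lists the principals the user holds instead of every
document id they may read. All function and field names are illustrative.

```python
from typing import Dict, Iterable, List


def quote_term(term: str) -> str:
    # Wrap the value in double quotes so spaces or query-syntax characters
    # in principal names (e.g. "g:finance") are taken literally.
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_doc(doc_id: str, text: str, principals: Iterable[str]) -> Dict[str, object]:
    # Index time: store the ACL next to the content ('acl' is hypothetical).
    return {"id": doc_id, "text": text, "acl": sorted(set(principals))}


def acl_filter(user_principals: Iterable[str], field: str = "acl") -> str:
    # Query time: one clause per principal the user holds (user name plus
    # groups), not one clause per readable document.
    unique: List[str] = sorted(set(user_principals))
    if not unique:
        raise ValueError("user has no principals; nothing is readable")
    return field + ":(" + " OR ".join(quote_term(p) for p in unique) + ")"


if __name__ == "__main__":
    doc = build_doc("42", "Quarterly report", ["g:finance", "u:liam"])
    params = {"q": "report", "fq": acl_filter(["u:canal", "g:finance"])}
    print(doc)
    print(params)
```

A user in a handful of groups gets a filter with a handful of clauses, however
many documents those groups can see.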

Re-indexing content whose ACLs have changed is a very common way of handling
this, and you should not worry too much about the performance implications until
there is a real issue. In the real world, folder permissions don't change very
often, and re-indexing when they do is a cost you'll have to live with. If you
worry that the lag between repository state and index state may let people see
content they are not entitled to, you can also do late-binding filtering of the
result set, but I would avoid that if possible.
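
A minimal sketch of late-binding filtering, assuming a hypothetical
`can_read(user, doc_id)` hook into the repository's live permission check. It
also shows the drawback: only the returned page of hits is trimmed, so Solr's
numFound and facet counts still include the dropped documents, which is why
Liam wants the filtering done inside Solr.

```python
from typing import Callable, Dict, List


def late_binding_filter(
    hits: List[Dict[str, object]],
    user: str,
    can_read: Callable[[str, str], bool],
) -> List[Dict[str, object]]:
    # Re-check each returned hit against the repository's *current* ACL.
    # can_read is a hypothetical hook, not part of any Solr API.
    return [hit for hit in hits if can_read(user, str(hit["id"]))]


if __name__ == "__main__":
    # Stand-in for the live repository: doc id -> users allowed right now.
    live_acl = {"1": {"canal"}, "2": {"liam"}, "3": {"canal", "liam"}}

    def check(user: str, doc_id: str) -> bool:
        return user in live_acl.get(doc_id, set())

    hits = [{"id": "1"}, {"id": "2"}, {"id": "3"}]
    print(late_binding_filter(hits, "canal", check))
    # Solr's numFound and facet counts for this query still include doc "2".
```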

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 11 March 2011, at 06:48, go canal wrote:

> To be fair, I think there is a slight difference between a Content Management
> System and a Search Engine.
> 
> Access control at the per-document level, the per-type level, support for
> dynamic role changes, etc. are more like content management use cases, whereas
> a search solution like Solr focuses on a different set of use cases.
> 
> But in the real world, any content management system needs full-text search,
> so the question is how to support search with permission control.
> 
> Jackrabbit integrates with Lucene/Tika; this could be one solution, but I do
> not know its performance and scalability.
> 
> CouchDB also integrates with Lucene/Tika; is that another option?
> 
> I have yet to see a Search Engine that provides the sort of Content Management
> features we are discussing here (Solr? Elasticsearch?).
> 
> 
> The last option, then, is probably to build an application that combines a
> document repository (with all the necessary content management features) and
> Solr (for search), and to handle the permissions outside Solr?
> thanks,
> canal
> 
> 
> 
> 
> ________________________________
> From: Liam O'Boyle <liam.obo...@intelligencebank.com>
> To: solr-user@lucene.apache.org
> Cc: go canal <goca...@yahoo.com>
> Sent: Fri, March 11, 2011 2:28:19 PM
> Subject: Re: Solr and Permissions
> 
> As Canal points out, grouping into types is not always possible.
> 
> In our case, permissions are not on a per-type level, but either on a per
> "folder" (of which there can be hundreds) or per item in some cases (of
> which there can be... any number at all).
> 
> Reindexing is also too slow to really be an option; some of the items use
> Tika to extract content, which means that we need to re-extract the content
> (a variable length of time; the average is about half a second, but on some
> documents it will sit there until the connection times out). Querying the
> document, modifying it and resubmitting it without rerunning content
> extraction is still faster, but involves sending even more data over the
> network; either way is relatively slow.
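
A sketch of the "query, modify, resubmit" route Liam mentions, assuming every
field needed to rebuild the document is stored in Solr and that permissions
live in a hypothetical `acl` field. It takes a document as returned by a query
and builds the payload to post back, reusing the already-extracted text instead
of running Tika again. The JSON list-of-documents shape is an assumption; use
whichever update format your setup accepts.

```python
import json
from typing import Dict, Iterable


def resubmit_payload(
    stored_doc: Dict[str, object],
    new_acl: Iterable[str],
    acl_field: str = "acl",
    drop: Iterable[str] = ("score",),
) -> str:
    # Start from the document as returned by a query (stored fields only).
    # Anything that is not stored, e.g. unstored extracted text, is absent
    # here and would be lost by resubmitting this document.
    dropped = set(drop)
    doc = {k: v for k, v in stored_doc.items() if k not in dropped}
    # Swap in the new permissions; the content is reused, so no re-extraction.
    doc[acl_field] = sorted(set(new_acl))
    # Serialize as a list of documents for the update request body.
    return json.dumps([doc])
```

This avoids re-extraction but not the network cost: the full stored document
travels out of Solr and back in, as Liam says.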
> 
> Liam
> 
> On 11 March 2011 16:24, go canal <goca...@yahoo.com> wrote:
> 
>> I have similar requirements.
>> 
>> Content type is one solution, but there are also other use cases where this
>> is not enough.
>> 
>> Another requirement is that when an access permission changes, we need to
>> update the field; my understanding is that we cannot do that without
>> re-indexing the whole document. Am I correct?
>> thanks,
>> canal
>> 
>> 
>> 
>> 
>> ________________________________
>> From: Sujit Pal <sujit....@comcast.net>
>> To: solr-user@lucene.apache.org
>> Sent: Fri, March 11, 2011 10:39:27 AM
>> Subject: Re: Solr and Permissions
>> 
>> How about assigning content types to documents in the index, and mapping
>> users to the set of content types they are allowed to access? That way you
>> will pass fewer parameters in the fq.
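
A minimal sketch of Sujit's suggestion, assuming a hypothetical `content_type`
field and a hypothetical mapping from user groups to the content types they may
read. The filter then has one clause per readable type rather than one per
readable document, which only helps when permissions really do line up with
types (Liam's objection above).

```python
from typing import Dict, Iterable, Set


def allowed_types(groups: Iterable[str], group_types: Dict[str, Set[str]]) -> Set[str]:
    # Union of the content types granted to each of the user's groups.
    types: Set[str] = set()
    for group in groups:
        types |= group_types.get(group, set())
    return types


def type_filter(types: Iterable[str], field: str = "content_type") -> str:
    # The fq grows with the number of types, not the number of documents.
    # Assumes type names are simple tokens that need no escaping.
    unique = sorted(set(types))
    if not unique:
        raise ValueError("no readable content types")
    return field + ":(" + " OR ".join(unique) + ")"
```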
>> 
>> -sujit
>> 
>> On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
>>> Morning,
>>> 
>>> We use Solr to index a range of content to which, within our application,
>>> access is restricted by a system of user groups and permissions. In order
>>> to ensure that search results don't reveal information about items which
>>> the user doesn't have access to, we need to somehow filter the results;
>>> this needs to be done within Solr itself, rather than after retrieval, so
>>> that the facet and result counts are correct.
>>> 
>>> Currently we do this by creating a filter query which specifies all of the
>>> items which may be allowed to match (e.g. id:(foo OR bar OR blarg OR ...)),
>>> but this has definite scalability issues: the filter can be a set of ORs of
>>> potentially unlimited size, and in practice we're sometimes hitting the low
>>> thousands. While we can adjust maxBooleanClauses upwards, I understand that
>>> this has performance implications...
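
For reference, a sketch of the approach described above, with a guard against
the clause limit. 1024 is the stock default for maxBooleanClauses (set in
solrconfig.xml); the function name is illustrative. It shows why the filter
scales badly: the clause count equals the number of items the user may see.

```python
from typing import Iterable

# Default value of maxBooleanClauses; it can be raised in solrconfig.xml.
DEFAULT_MAX_BOOLEAN_CLAUSES = 1024


def id_filter(ids: Iterable[str], max_clauses: int = DEFAULT_MAX_BOOLEAN_CLAUSES) -> str:
    # One OR clause per permitted item, so the query grows with the number
    # of documents the user can see.
    unique = sorted(set(ids))
    if not unique:
        raise ValueError("no permitted ids")
    if len(unique) > max_clauses:
        # Splitting into several fq parameters would not help: Solr
        # intersects multiple fq filters, so the result would be an AND.
        raise ValueError(f"{len(unique)} clauses exceed maxBooleanClauses={max_clauses}")
    return "id:(" + " OR ".join(unique) + ")"
```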
>>> 
>>> So, has anyone had to implement something similar in the past? Any
>>> suggestions for a more scalable approach? Any advice on safe and sensible
>>> limits on how far I can push maxBooleanClauses?
>>> 
>>> Thanks for your advice,
>>> 
>>> Liam
>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Liam O'Boyle
> 
> IntelligenceBank Pty Ltd
> Level 1, 31 Coventry Street Southbank, Victoria 3006, Australia
> P:   +613 8618 7810   F:   +613 8618 7899   M: +61 403 88 66 44
> 
> *Awarded 2010 "Best New Business" and "Business of the Year" - Business3000
> Awards*
> 
> 
> 
> 
