Hello,

> When I had that kind of problem (less complex) with Lucene, the
> only idea was to filter from the front end, according to the ACL
> policy. Lucene documents and fields weren't protected, but tagged.
> Searching was always applied with an "audience" field, with
> hierarchical values like "public, reserved, protected, secret", so
> that a "public" document also carries the "secret" value, to be
> found with "audience:secret", according to the rights of the user
> who searches. For the fields, the ones not allowed for some users
> were stripped.
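
For reference, a minimal sketch of that tagging scheme against the
Lucene API could look like the snippet below; the "audience" field
name, the level ordering and the untokenized indexing are my
assumptions, not something confirmed by the setup described above:

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.TermQuery;

  public class AudienceTagging {

      // Hypothetical levels, least to most restricted.
      static final String[] LEVELS =
              {"public", "reserved", "protected", "secret"};

      // Tag a document with its own level and every more restricted
      // one, so a "public" document also carries "secret" and is
      // found by an audience:secret search.
      static void tagAudience(Document doc, int level) {
          for (int i = level; i < LEVELS.length; i++) {
              doc.add(new Field("audience", LEVELS[i],
                      Field.Store.NO, Field.Index.UN_TOKENIZED));
          }
      }

      // The clause to AND into every query for a user with clearance
      // userLevel: it matches exactly the documents that user may see.
      static TermQuery audienceClause(String userLevel) {
          return new TermQuery(new Term("audience", userLevel));
      }
  }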

Yes, I know this is a possibility... but we want our authorisation to be
facet-based. I am attacking the problem by keeping derived data from
Lucene in memory, all translated into byte/int values. The hardest part is
keeping the derived data in sync with Lucene *and* with the different
Jackrabbit users (some have changes in their session but have not yet
saved their data).

Anyway, I can do faceted authorisation + counting in less than 20 ms for
1,000,000 documents (on a normal PC), so hopefully I can succeed. I must
admit, OTOH, that I did not find some sort of ingenious algorithm, but
merely depend on the speed of the processor: doubling the number of
documents means doubling the response time and the memory needed (though
1,000,000 docs fitted in 25 MB, so 40,000,000 in a GB... that is fine by
me).
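
For the curious, the brute-force counting I mean is roughly the sketch
below; the byte-per-document facet array and the allowed-documents
BitSet (derived from the authorisation step) are assumptions on my
part:

  import java.util.BitSet;

  public class FacetCounter {

      // One facet value per Lucene document number, packed into a
      // byte: 1,000,000 documents is about 1 MB per facet. This
      // array must be rebuilt or patched whenever the index changes,
      // which is the hard synchronisation part mentioned above.
      byte[] facetValueByDoc;

      // One linear pass over the documents the user may see; time
      // and memory scale linearly with the document count, matching
      // the doubling behaviour described above.
      int[] count(BitSet allowedDocs) {
          int[] counts = new int[256];
          for (int doc = allowedDocs.nextSetBit(0); doc >= 0;
                   doc = allowedDocs.nextSetBit(doc + 1)) {
              counts[facetValueByDoc[doc] & 0xFF]++;
          }
          return counts;
      }
  }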

> 
> Maybe you can have a look at the XML database eXist? Its search
> engine, XQuery based, is not focused on the same goals as Lucene,
> but I can promise you that queries will never return results from
> documents you are not allowed to read.

I did not look at it, but my feeling is that it would not be fast enough.

Regards, Ard

> 
> 
> -- 
> Frédéric Glorieux
> École nationale des chartes
> direction des nouvelles technologies et de l'informatique
> 
