What I seem to see suggested here is to use different cores for the things you mentioned: different types of documents and Access Control Lists.
I wonder how sharding would work in that scenario?

Me, I plan on:

For security: using a permissions field.
For different schemas: dynamic fields, with enough premade fields to handle it.

The one thing I don't think my approach does well with is statistics.

Dennis Gearon

----- Original Message ----
From: Jonathan Rochkind <rochk...@jhu.edu>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Cc: supersoft <elarab...@gmail.com>
Sent: Mon, January 10, 2011 1:08:00 PM
Subject: Re: Improving Solr performance

I see a lot of people using shards to hold "different types of documents", and it almost always seems to be a bad solution. Shards are intended for distributing a large index over multiple hosts -- that's it. Not for some kind of federated search over multiple schemas, not for access control.

Why not put everything in the same index, without shards, and just use an 'fq' filter to limit a given search to the specific kind of documents you'd like to search over? I think that would achieve your goal a lot more simply than shards -- then you use sharding only if and when your index grows so large that you'd like to distribute it over multiple hosts, and when you do so you choose a shard key that will have more or less equal distribution across shards.

Using shards for access control or schema management just leads to headaches.

[Apparently Solr could use some highlighted documentation on what shards are really for, as it seems to be a very common issue on this list: someone trying to use them for something else and then inevitably finding problems with that approach.]

Jonathan

On 1/7/2011 6:48 AM, supersoft wrote:
> The reason for this distribution is the kind of the documents. In spite of
> having the same schema structure (and Solr conf), each document belongs to 1 of
> 5 different kinds.
>
> Each kind corresponds to a concrete shard and, due to this, the implemented
> client tool avoids searching in all the shards when the user selects just
> one or a few of the kinds. The tool runs a multi-shard query against the proper
> shards. I guess this is the right approach, but correct me if I am wrong.
>
> The real problem of this architecture is the correlation between concurrent
> users and response time:
> 1 query: n seconds
> 2 queries: 2*n seconds each query
> 3 queries: 3*n seconds each query
> and so on...
>
> This is a real headache because a single query has an acceptable
> response time, but when many users are accessing the server the
> performance drops sharply.
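
For reference, the single-index approach suggested in this thread (one index, with 'fq' filters instead of one shard per kind) could look roughly like the sketch below. It only builds the request URL and does not contact a server. The base URL and the field names "kind" and "permissions" are assumptions made up for illustration, as are the example values; none of them come from the poster's actual setup.

```python
from urllib.parse import urlencode

# Assumed for illustration only: the Solr URL and both field names
# ("kind", "permissions") are hypothetical, not taken from the thread.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(user_query, kinds=None, groups=None):
    """Build a select URL that narrows a single index with fq filters.

    Values are assumed to be simple tokens; real code would need to
    escape Solr query syntax characters in them.
    """
    params = [("q", user_query)]
    if kinds:
        # One filter query restricting results to the selected kinds.
        params.append(("fq", "kind:(" + " OR ".join(kinds) + ")"))
    if groups:
        # A second, independent filter query for access control.
        params.append(("fq", "permissions:(" + " OR ".join(groups) + ")"))
    # A list of pairs lets the "fq" parameter appear more than once.
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url("solr performance",
                           kinds=["news", "blog"],
                           groups=["staff"]))
```

Keeping the kind filter and the permissions filter as two separate fq parameters means Solr can cache each one independently in its filter cache, so a filter that repeats across many searches may be reused even when the other one changes.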