What I seem to see suggested here is to use different cores for the things you 
suggested:
  different types of documents
  Access Control Lists

I wonder how sharding would work in that scenario?

Me, I plan on :
  For security:
    Using a permissions field
  For different schmas:
    Dynamic fields with enough premade fields to handle it.


The one thing I don't thing my approach does well with is statistics.

 Dennis Gearon



----- Original Message ----
From: Jonathan Rochkind <rochk...@jhu.edu>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Cc: supersoft <elarab...@gmail.com>
Sent: Mon, January 10, 2011 1:08:00 PM
Subject: Re: Improving Solr performance

I see a lot of people using shards to hold "different types of documents", and 
it almost always seems to be a bad solution. Shards are intended for 
distributing a large index over multiple hosts -- that's it.  Not for some kind 
of federated search over multiple schemas, not for access control.

Why not put everything in the same index, without shards, and just use an 'fq' 
limit in order to limit to the specific document you'd like to search over in a 
given search?    I think that would achieve your goal a lot more simply than 
shards -- then you use sharding only if and when your index grows to be so 
large 
you'd like to distribute it over multiple hosts, and when you do so you choose 
a 
shard key that will have more or less equal distribution accross shards.

Using shards for access control or schema management just leads to headaches.

[Apparently Solr could use some highlighted documentation on what shards are 
really for, as it seems to be a very common issue on this list, someone trying 
to use them for something else and then inevitably finding problems with that 
approach.]

Jonathan

On 1/7/2011 6:48 AM, supersoft wrote:
> The reason of this distribution is the kind of the documents. In spite of
> having the same schema structure (and solr conf), a document belongs to 1 of
> 5 different kinds.
> 
> Each kind corresponds to a concrete shard and due to this, the implemented
> client tool avoids searching in all the shards when the users selects just
> one or a few of kinds. The tool runs a multisharded query of the proper
> shards. I guess this is a right approach but correct me if I am wrong.
> 
> The real problem of this architecture is the correlation between concurrent
> users and response time:
> 1 query: n seconds
> 2 queries: 2*n second each query
> 3 queries: 3*n seconds each query
> and so...
> 
> This is being a real headache because 1 single query has an acceptable
> response time but when many users are accessing to the server the
> performance goes hardly down.

Reply via email to