There is a first-pass query that retrieves all matching document IDs from every shard, along with the relevant sorting information. The document IDs are then sorted and limited to the number needed, and a second query is sent to the shards for the rest of those documents' metadata.
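To make the two stages concrete, here is a rough sketch in Python. It is a toy in-memory model, not Solr's actual code: `Shard`, `first_pass`, `second_pass`, and the `score`/`text` fields are names I made up for illustration, and "first duplicate wins" is modeled simply as shard order, which is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy shard: maps doc ID -> stored fields (hypothetical model)."""
    docs: dict = field(default_factory=dict)

    def match(self, term):
        # Stage 1 on a shard: return only (doc_id, sort_key) pairs.
        return [(doc_id, d["score"]) for doc_id, d in self.docs.items()
                if term in d["text"]]

    def fetch(self, doc_ids):
        # Stage 2 on a shard: look up by ID only; the query is NOT re-checked.
        return {i: self.docs[i] for i in doc_ids if i in self.docs}


def first_pass(shards, term, rows):
    """Collect IDs and sort keys from every shard, then sort and limit."""
    candidates, seen = [], set()
    for shard_no, shard in enumerate(shards):
        for doc_id, score in shard.match(term):
            if doc_id in seen:
                continue  # duplicate ID: keep the first one received
            seen.add(doc_id)
            candidates.append((score, doc_id, shard_no))
    # Highest score first; keep only the rows the client asked for.
    return sorted(candidates, key=lambda c: -c[0])[:rows]


def second_pass(shards, top):
    """Fetch stored fields for the chosen IDs from the shards that own them."""
    results = []
    for _score, doc_id, shard_no in top:
        stored = shards[shard_no].fetch([doc_id])
        if doc_id in stored:
            results.append((doc_id, stored[doc_id]["text"]))
    return results
```

If a document is modified on its shard after `first_pass` returns but before `second_pass` runs, `second_pass` still returns it, because the second stage looks documents up by ID and never re-evaluates the query. That is the situation the wiki note about the index changing "between stages" describes.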

On Sun, Jun 27, 2010 at 7:32 PM, Babak Farhang <farh...@gmail.com> wrote:
> Otis,
>
> Belated thanks for your reply.
>
>>> 2. "The index could change between stages, e.g. a document that matched a
>>> query and was subsequently changed may no longer match but will still be
>>> retrieved."
>
>> 2. This describes the situation where, for instance, a document with ID=10
>> is updated between the 2 calls to the Solr instance/shard where that doc
>> ID=10 lives.
>
> Can you explain why this happens? (I.e. does each query to the sharded
> index somehow involve 2 calls to each shard instance from the base
> instance?)
>
> -Babak
>
> On Thu, Jun 24, 2010 at 10:14 PM, Otis Gospodnetic
> <otis_gospodne...@yahoo.com> wrote:
>> Hi Babak,
>>
>> 1. Yes, you are reading that correctly.
>>
>> 2. This describes the situation where, for instance, a document with ID=10
>> is updated between the 2 calls to the Solr instance/shard where that doc
>> ID=10 lives.
>>
>> 3. Yup, orthogonal. You can have a master with multiple cores for sharded
>> and non-sharded indices and you can have a slave with cores that hold
>> complete indices or just their shards.
>>
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>> ----- Original Message ----
>>> From: Babak Farhang <farh...@gmail.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Thu, June 24, 2010 6:32:54 PM
>>> Subject: questions about Solr shards
>>>
>>> Hi everyone,
>>>
>>> There are a couple of notes on the limitations of this approach at
>>> http://wiki.apache.org/solr/DistributedSearch which I'm having trouble
>>> understanding.
>>>
>>> 1. "When duplicate doc IDs are received, Solr chooses the first doc
>>> and discards subsequent ones"
>>>
>>> "Received" here is from the perspective of the base Solr instance at
>>> query time, right? I.e. if you inadvertently indexed 2 versions of
>>> the document with the same unique ID but different contents to 2
>>> shards, then at query time, the "first" document (putting aside for
>>> the moment what exactly "first" means) would win. Am I reading this
>>> right?
>>>
>>> 2. "The index could change between stages, e.g. a document that matched a
>>> query and was subsequently changed may no longer match but will still be
>>> retrieved."
>>>
>>> I have no idea what this second statement means.
>>>
>>> And one other question about shards:
>>>
>>> 3. The examples I've seen documented do not illustrate sharded,
>>> multicore setups; only sharded monolithic cores. I assume sharding
>>> works with multicore as well (i.e. the two issues are orthogonal). Is
>>> this right?
>>>
>>> Any help on interpreting the above would be much appreciated.
>>>
>>> Thank you,
>>> -Babak