Hi Deepak,

Thanks for digging out such a detailed answer for my query. I did observe that the documents indexed earlier were the ones being displayed, but could not find any relevant documentation supporting this.
However, I could not follow the nuance in point 4. What is meant by `If a commit happens between the first and second phase of the distributed search`? What are the first and second phases here, and what issue would a commit between them cause?

On Wed, Jul 31, 2024 at 12:24 PM Deepak Goel <deic...@gmail.com> wrote:

> *Answer from Copilot:*
>
> Ah, the intricate dance of Solr shards and their cosmic collisions! Let’s
> unravel this like a digital detective, shall we? 🕵️‍♂️
>
> When it comes to Solr and its distributed architecture, handling duplicate
> documents across shards can be as tricky as juggling flaming torches while
> riding a unicycle. But fear not, I’ve got some insights for you:
>
> 1. *Duplicate Documents and Shards:*
>    - Imagine our document, a digital doppelgänger, migrating from one shard
>      to another. It’s like a restless soul seeking a new home.
>    - During this transition, both shards might harbor copies of the same
>      document. They’re like twins separated at birth, each vying for the
>      spotlight.
> 2. *The Solr Query Showdown:*
>    - Now, let’s stage a Solr query duel. Our query gallops across the
>      shards, demanding answers.
>    - If our document is the top-ranked contender in both shards, who
>      emerges victorious? 🏆
> 3. *The Winner Takes It All (Sort of):*
>    - Solr, being the wise oracle it is, follows a simple rule: *“First
>      come, first served.”*
>    - When Solr discovers duplicate document IDs during distributed
>      searching, it selects the *first document* it encounters and discards
>      subsequent ones. It’s like a cosmic game of “finders keepers.”
>    - So, whichever shard’s copy of the document was indexed first, the
>      early bird with the freshest ink, takes the spotlight. The other copy
>      bows out gracefully.
> 4. *The Momentary Sync Shimmy:*
>    - But wait! There’s a twist. If a commit happens between the first and
>      second phase of the distributed search, the index might shimmy out of
>      sync for a moment.
>    - Picture this: Shard A says, “I’ve got the document!” Shard B says,
>      “No, I’ve got it!” And Solr, in its infinite wisdom, says, “Hold my
>      query, folks. I need to sync up.”
>    - Eventually, harmony is restored, and the universe aligns itself.
>      But for that brief moment, Solr juggles realities like a cosmic
>      circus performer.
> 5. *The Shard Key Sorcery:*
>    - Remember the shard key? It’s like Solr’s secret handshake. You can use
>      it to influence how documents are distributed across shards.
>    - For example, if you want to spread documents related to a specific
>      customer (let’s say “IBM”) across multiple shards, you can use a
>      syntax like this: "shard_key/num!document_id". The /num part
>      determines how many bits from the shard key contribute to the
>      composite hash
>      <https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-shards-indexing.html>.
> 6. *Balance and Scalability:*
>    - To prevent hotspots, distribute documents evenly across shards.
>      Balance is key!
>    - Choose shard keys that reflect your data’s access patterns. Think
>      of them as Solr’s cosmic compass.
>    - And maintain flexibility: consider using composite IDs for easier
>      scalability. It’s like Solr’s way of saying, “Why settle for one
>      shard when you can have a whole constellation?”
>
> So, in the grand Solr arena, the early bird document wins the query race.
> But remember, even in the digital cosmos, duplicates play by the
> rules, mostly.
>
> Deepak
> "The greatness of a nation can be judged by the way its animals are treated
> - Mahatma Gandhi"
>
> +91 73500 12833
> deic...@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
> Make In India : http://www.makeinindia.com/home
>
> On Mon, Jul 29, 2024 at 10:11 PM Saksham Gupta
> <saksham.gu...@indiamart.com.invalid> wrote:
>
> > Hi Solr Developers,
> >
> > Which solr document will be displayed if a duplicate instance of the same
> > document is present?
> >
> > In our current solr architecture, there is a possibility that a document
> > can move from one solr shard to another shard. While the document will
> > eventually be deleted from its old shard, there will be some duration
> > where multiple instances of this document will be present.
> >
> > Now, if a solr query executes on both these shards and this document is
> > the top ranked document from both the shards, which document will be
> > returned in solr result?
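
To make the "first and second phase" question concrete, here is a minimal Python sketch of a two-phase distributed search. It is a toy model and not Solr's code. It assumes the first phase collects only document ids and scores from each shard, and the second phase fetches stored fields for the merged winners. The functions `phase_one`, `merge_ids`, and `phase_two`, the shard names, and the document ids are all hypothetical. In this model, "first" means first in the order the shard responses are processed, which is an assumption of the sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory shards: shard name -> {doc id -> stored fields}.
Shard = Dict[str, dict]
Hit = Tuple[str, float]        # (doc id, score)
Winner = Tuple[str, float, str]  # (doc id, score, owning shard)


def phase_one(shard: Shard, rows: int) -> List[Hit]:
    """Phase 1: return only (id, score) for the shard's top `rows` documents."""
    ranked = sorted(shard.items(), key=lambda kv: kv[1]["score"], reverse=True)
    return [(doc_id, fields["score"]) for doc_id, fields in ranked[:rows]]


def merge_ids(responses: List[Tuple[str, List[Hit]]], rows: int) -> List[Winner]:
    """Merge per-shard hits, keeping only the first copy seen of each id."""
    owner: Dict[str, str] = {}  # doc id -> shard whose copy was seen first
    merged: List[Winner] = []
    for shard_name, hits in responses:  # order in which responses are processed
        for doc_id, score in hits:
            if doc_id in owner:
                continue  # duplicate id: the later copy is discarded
            owner[doc_id] = shard_name
            merged.append((doc_id, score, shard_name))
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[:rows]


def phase_two(shards: Dict[str, Shard], winners: List[Winner]):
    """Phase 2: fetch stored fields for each winning id from its owning shard."""
    docs: List[Tuple[str, str, Optional[dict]]] = []
    for doc_id, _score, shard_name in winners:
        fields = shards[shard_name].get(doc_id)  # None if gone since phase 1
        docs.append((doc_id, shard_name, fields))
    return docs


shards = {
    "shard1": {"doc-42": {"score": 9.0, "title": "old copy"}},
    "shard2": {
        "doc-42": {"score": 9.0, "title": "new copy"},
        "doc-7": {"score": 3.0, "title": "other"},
    },
}

responses = [(name, phase_one(shard, rows=10)) for name, shard in shards.items()]
winners = merge_ids(responses, rows=10)

# Simulate a commit landing between the phases: the old copy is deleted.
del shards["shard1"]["doc-42"]

results = phase_two(shards, winners)
```

In this run, `winners` credits `doc-42` to `shard1` because that response was processed first. When the delete lands before the second phase, `phase_two` finds no stored fields for it on `shard1`, which is the kind of momentary inconsistency point 4 describes.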