Hi Deepak,

Thanks for digging out such a detailed answer to my query. I did observe
that the documents indexed earlier were the ones being displayed, but I
could not find any relevant documentation supporting this.

However, I could not understand the nuance pointed out in point 4. What
do we mean by `If a commit happens between the first and second phase of
the distributed search`? What are the first and second phases here, and
what issue will it cause?

On Wed, Jul 31, 2024 at 12:24 PM Deepak Goel <deic...@gmail.com> wrote:

> *Answer from Copilot:*
>
>
> Ah, the intricate dance of Solr shards and their cosmic collisions! Let’s
> unravel this like a digital detective, shall we? 🕵️‍♂️
>
> When it comes to Solr and its distributed architecture, handling duplicate
> documents across shards can be as tricky as juggling flaming torches while
> riding a unicycle. But fear not—I’ve got some insights for you:
>
>    1. *Duplicate Documents and Shards:*
>       - Imagine our document, a digital doppelgänger, migrating from one
>         shard to another. It’s like a restless soul seeking a new home.
>       - During this transition, both shards might harbor copies of the same
>         document. They’re like twins separated at birth, each vying for the
>         spotlight.
>
>    2. *The Solr Query Showdown:*
>       - Now, let’s stage a Solr query duel. Our query gallops across the
>         shards, demanding answers.
>       - If our document is the top-ranked contender in both shards, who
>         emerges victorious? 🏆
>
>    3. *The Winner Takes It All (Sort of):*
>       - Solr, being the wise oracle it is, follows a simple rule: *“First
>         come, first served.”*
>       - When Solr discovers duplicate document IDs during distributed
>         searching, it selects the *first document* it encounters and discards
>         subsequent ones. It’s like a cosmic game of “finders keepers.”
>       - So whichever copy Solr meets first while merging the shard
>         responses (the early bird of the merge, which is not necessarily the
>         copy that was indexed first) takes the spotlight. The other copy
>         bows out gracefully.
>
>    4. *The Momentary Sync Shimmy:*
>       - But wait! There’s a twist. If a commit happens between the first and
>         second phase of the distributed search, the index might shimmy out of
>         sync for a moment.
>       - Picture this: Shard A says, “I’ve got the document!” Shard B says,
>         “No, I’ve got it!” And Solr, in its infinite wisdom, says, “Hold my
>         query, folks, I need to sync up.”
>       - Eventually, harmony is restored, and the universe aligns itself.
>         But for that brief moment, Solr juggles realities like a cosmic
>         circus performer.
>
>    5. *The Shard Key Sorcery:*
>       - Remember the shard key? It’s like Solr’s secret handshake. You can use
>         it to influence how documents are distributed across shards.
>       - For example, if you want to spread documents related to a specific
>         customer (let’s say “IBM”) across multiple shards, you can use a
>         syntax like this: "shard_key/num!document_id". The /num part
>         determines how many bits from the shard key contribute to the
>         composite hash (see
>         https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-shards-indexing.html).
>
>    6. *Balance and Scalability:*
>       - To prevent hotspots, distribute documents evenly across shards.
>         Balance is key!
>       - Choose shard keys that reflect your data’s access patterns. Think
>         of them as Solr’s cosmic compass.
>       - And maintain flexibility: consider using composite IDs for easier
>         scalability. It’s like Solr’s way of saying, “Why settle for one
>         shard when you can have a whole constellation?”
>
> So, in the grand Solr arena, the early bird document wins the query race.
> But remember, even in the digital cosmos, duplicates play by the
> rules—mostly.
>
>
> Deepak
> "The greatness of a nation can be judged by the way its animals are treated
> - Mahatma Gandhi"
>
> +91 73500 12833
> deic...@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
> Make In India : http://www.makeinindia.com/home
>
>
> On Mon, Jul 29, 2024 at 10:11 PM Saksham Gupta
> <saksham.gu...@indiamart.com.invalid> wrote:
>
> > Hi Solr Developers,
> >
> > Which Solr document will be displayed if a duplicate instance of the same
> > document is present?
> >
> > In our current Solr architecture, a document can move from one Solr shard
> > to another. While the document will eventually be deleted from its old
> > shard, there will be a period during which multiple instances of this
> > document are present.
> >
> > Now, if a Solr query executes on both these shards and this document is
> > the top-ranked document from both shards, which copy will be returned in
> > the Solr result?
> >
>
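
Points 3 and 4, and the follow-up question about them, turn on Solr's two-phase distributed search. As we understand it, in the first phase the coordinating node asks every shard only for the unique keys and sort values (for example, scores) of its top documents and merges those lists, keeping the first copy of any duplicate key; in the second phase it asks the shards that supplied the winning keys for their stored fields. The Python below is a toy, in-memory sketch of that flow, not Solr code: `Shard`, `merge_top_ids`, `distributed_search` and the `commit_between` hook are hypothetical names, and "merge order" is simply the order of the list passed in (in a real cluster it may depend on when each shard responds).

```python
# Toy, in-memory model of a two-phase distributed search.
# All names (Shard, merge_top_ids, distributed_search) are hypothetical;
# this is not Solr code and does not call any Solr API.
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    docs: dict = field(default_factory=dict)  # unique key -> stored fields

    def top_ids(self, query, rows):
        # First phase on one shard: return only (id, score) pairs, best first.
        hits = [(doc_id, d["score"]) for doc_id, d in self.docs.items()
                if query in d["text"]]
        hits.sort(key=lambda hit: hit[1], reverse=True)
        return hits[:rows]

    def fetch(self, doc_id):
        # Second phase on one shard: return the stored fields as they are *now*.
        return self.docs.get(doc_id)


def merge_top_ids(responses, rows):
    # Coordinator merge: keep the first copy of each id it encounters and
    # drop later duplicates, even if a later copy has a higher score.
    first_seen = {}
    for shard, hits in responses:  # iteration order = merge order
        for doc_id, score in hits:
            if doc_id not in first_seen:
                first_seen[doc_id] = (shard, score)
    ranked = sorted(first_seen.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(doc_id, shard) for doc_id, (shard, _score) in ranked[:rows]]


def distributed_search(shards, query, rows, commit_between=None):
    # First phase: collect ids + scores from every shard, then merge.
    winners = merge_top_ids([(s, s.top_ids(query, rows)) for s in shards], rows)
    if commit_between is not None:
        commit_between()  # simulate a commit landing between the phases
    # Second phase: fetch stored fields from the shard that supplied each id.
    return [(doc_id, shard.name, shard.fetch(doc_id)) for doc_id, shard in winners]


old = Shard("old_shard", {"doc1": {"text": "solr shards", "score": 2.0}})
new = Shard("new_shard", {"doc1": {"text": "solr shards", "score": 2.0}})

# Same doc on both shards: the copy in the response merged first wins.
print(distributed_search([new, old], "solr", rows=10))


def edit_doc1():
    # A "commit" that changes doc1 so it no longer matches "solr".
    new.docs["doc1"] = {"text": "lucene only", "score": 0.5}


# doc1 was selected in the first phase, so it is still returned with its
# new, non-matching fields.
print(distributed_search([new, old], "solr", rows=10, commit_between=edit_doc1))
```

In this model, passing `[old, new]` instead returns old_shard's copy, so which copy wins follows merge order rather than indexing order. The second call shows the issue point 4 warns about: a document chosen in the first phase is fetched in the second phase after a commit changed it, so a document that no longer matches the query is still returned. Only the request that straddles the commit is affected; the next query sees the updated index.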

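Point 5 describes the "shard_key/num!document_id" syntax only in words. The sketch below illustrates the bit-splitting idea under stated assumptions: `zlib.crc32` is a stand-in for the MurmurHash3 hash that, to our understanding, Solr's compositeId router applies to each part, so the numbers will not match real Solr routing. We also assume that a plain "shard_key!document_id" takes the upper 16 bits from the shard key and that "/num" takes the upper `num` bits instead. The helpers `h32` and `composite_hash` are hypothetical and handle two-part ids only.

```python
# Sketch of splitting a composite id such as "IBM/3!12345" into hash bits.
# ASSUMPTION: zlib.crc32 stands in for the MurmurHash3 that Solr's
# compositeId router uses, so these values will not match real Solr routing.
import zlib

MASK32 = 0xFFFFFFFF


def h32(text):
    # Stand-in 32-bit hash (NOT Solr's hash function).
    return zlib.crc32(text.encode("utf-8")) & MASK32


def composite_hash(doc_id):
    # Handles "shard_key!document_id" and "shard_key/num!document_id" only.
    shard_key, sep, unique = doc_id.partition("!")
    if not sep:
        return h32(doc_id)  # plain id: hash the whole string
    bits = 16  # assumed default: upper 16 bits come from the shard key
    if "/" in shard_key:
        shard_key, num = shard_key.split("/", 1)
        bits = int(num)
    key_mask = (MASK32 << (32 - bits)) & MASK32  # upper `bits` bits set
    return (h32(shard_key) & key_mask) | (h32(unique) & ~key_mask & MASK32)


# Every "IBM/3!..." id shares the same upper 3 bits, i.e. lands in the same
# 1/8 of the 32-bit hash range; the lower 29 bits come from the document id.
upper3 = {composite_hash(f"IBM/3!{i}") >> 29 for i in range(1000)}
print(upper3)  # a set containing a single value
```

Because each shard owns a contiguous slice of the hash range, fixing the upper 3 bits confines a tenant such as "IBM" to roughly 1/8 of that range, while the document-id bits spread its documents over whichever shards cover that slice. Larger `num` values concentrate the tenant on fewer shards; `num=0` ignores the shard key entirely.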