I really think you'll be in a world of hurt if you have the same
ID on different shards. I just wouldn't go there. The statement
"may be non-deterministic" should be taken to mean that this
is just unsupported.

Why is this the case? What is the use-case for putting the
same ID on different shards? Because this seems like
an XY problem...
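
If the underlying goal is simply "prefer the copy on shard 1", one way to
avoid depending on merge order is to resolve duplicates in the client. Below
is a minimal sketch, assuming you query each shard separately and collect
its documents yourself. The function name pick_preferred, the shard labels,
and the "id" key field are illustrative assumptions, not anything Solr
provides.

```python
from typing import Dict, Iterable, List, Tuple

Doc = Dict[str, object]


def pick_preferred(results: Iterable[Tuple[str, List[Doc]]],
                   priority: List[str],
                   key: str = "id") -> List[Doc]:
    """Keep one document per unique key, preferring shards that
    appear earlier in `priority` (hypothetical helper, not a Solr API)."""
    rank = {shard: i for i, shard in enumerate(priority)}
    best: Dict[object, Tuple[int, Doc]] = {}
    for shard, docs in results:
        # Shards missing from `priority` rank after all listed shards.
        r = rank.get(shard, len(priority))
        for doc in docs:
            k = doc[key]
            # Replace the kept document only if this shard ranks strictly higher.
            if k not in best or r < best[k][0]:
                best[k] = (r, doc)
    return [doc for _, doc in best.values()]


# Example: the same id comes back from two shards; shard_1 should win
# regardless of which response arrived first.
results = [
    ("shard_2", [{"id": "a", "src": "old"}]),
    ("shard_1", [{"id": "a", "src": "new"}]),
]
preferred = pick_preferred(results, ["shard_1", "shard_2"])
```

Because the choice depends only on the priority list, the outcome does not
change with response timing, which is the part the wiki calls
non-deterministic.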

Best
Erick

On Wed, Feb 22, 2012 at 4:43 PM, jerry.min...@gmail.com
<jerry.min...@gmail.com> wrote:
> Hi,
>
> I stumbled across this thread after running into the same question. The
> answers presented here seem a little vague and I was hoping to renew the
> discussion.
>
> I am using a branch of Solr 4, with distributed searching over 12 shards.
> I want the documents in the first shard to always be selected over
> documents that appear in the other 11 shards.
>
> The queries to these shards look something like this: "
> http://solrserver/shard_1_app/select?shards=solr_server:9999/shard_1_app/,solr_server:9999/shard_2_app,
> ... ,solr_server:9999/shard_12_app&q=id:xxxxxxxx"
>
> When I execute a query for an ID that I know exists in shard_1 and another
> shard, I do always get the result from shard 1.
>
> Here are some questions that I have:
> 1. Has anyone rigorously tested the comment in the wiki "If docs with
> duplicate unique keys are encountered, Solr will make an attempt to return
> valid results, but the behavior may be non-deterministic"?
>
> 2. Who is relying on this behavior (the document from the first shard is
> returned) today? When do you notice the wrong document is selected? Do you
> have a feeling for how frequently your distributed search returns the
> document from a shard other than the first?
>
> 3. Is there a good web source other than the Solr wiki for information
> about Solr distributed queries?
>
>
> Thanks,
> Jerry M.
>
>
> On Mon, Aug 8, 2011 at 7:41 PM, simon <mtnes...@gmail.com> wrote:
>
>> I think the first one to respond is indeed the way it works, but
>> that's only deterministic up to a point (if your small index is in the
>> throes of a commit and everything required for a response happens to
>> be cached on the larger shard ... who knows?)
>>
>> On Mon, Aug 8, 2011 at 7:10 PM, Shawn Heisey <s...@elyograg.org> wrote:
>> > On 8/8/2011 4:07 PM, simon wrote:
>> >>
>> >> Only one should be returned, but it's non-deterministic. See
>> >>
>> >> http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
>> >
>> > I had heard it was based on which one responded first.  This is part of
>> > why we have a small index that contains the newest content and only
>> > distribute content to the other shards once a day.  The hope is that the
>> > small index (less than 1GB, fits into RAM on that virtual machine) will
>> > always respond faster than the other larger shards (over 18GB each).  Is
>> > this an incorrect assumption on our part?
>> >
>> > The build system does do everything it can to ensure that periods of
>> > overlap are limited to the time it takes to commit a change across all of
>> > the shards, which should amount to just a few seconds once a day.  There
>> > might be situations when the index gets out of whack and we have duplicate
>> > id values for a longer time period, but in practice it hasn't happened yet.
>> >
>> > Thanks,
>> > Shawn
>> >
>> >
>>
